
fix(finetune): apply train.max_norm gradient clipping in finetune scripts#2264

Open
discobot wants to merge 1 commit into Lightning-AI:main from discobot:fix/2191-finetune-max-norm

@discobot

Fixes #2191.

--train.max_norm was rejected by validate_args in all four finetune scripts, even though the
finetune configs in config_hub expose it.
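For context, the rejection comes from an "unsupported arguments" check. The sketch below is a simplified, hypothetical version of that pattern (the TrainArgs fields, the validate_args signature, and the message text are assumptions, not litgpt's actual code): any listed argument that is set (not None) produces a ValueError, so removing max_norm from the list is enough to let --train.max_norm through.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainArgs:
    # Hypothetical, trimmed-down stand-in for the real training-args dataclass.
    epochs: Optional[int] = None
    max_tokens: Optional[int] = None
    max_norm: Optional[float] = None


def validate_args(train: TrainArgs, unsupported: List[str], script: str = "lora.py") -> None:
    """Raise if any argument the script does not support was set (i.e. is not None)."""
    issues = [
        f"{script} doesn't support the {name!r} argument."
        for name in unsupported
        if getattr(train, name) is not None
    ]
    if issues:
        raise ValueError("\n".join(issues))


# Before the fix: max_norm is in the unsupported list, so setting it fails.
try:
    validate_args(TrainArgs(epochs=1, max_norm=1.0), unsupported=["max_tokens", "max_norm"])
except ValueError as err:
    print(err)

# After the fix: max_norm is no longer listed, so the same arguments pass.
validate_args(TrainArgs(epochs=1, max_norm=1.0), unsupported=["max_tokens"])
```

Leaving max_norm unset still passes either way, which is why the null entries in the configs were never a problem on their own.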

This removes max_norm from the unsupported list in lora.py, full.py, adapter.py, and
adapter_v2.py and applies fabric.clip_gradients(model, optimizer, max_norm=train.max_norm)
at optimizer-step boundaries (inside if not is_accumulating:), mirroring pretrain.py.
Clipping at step boundaries rather than after each fabric.backward() keeps it correct under
gradient accumulation: per-micro-batch clipping would act on partial gradients and would
conflict with no_backward_sync, which skips gradient synchronization on accumulating micro-batches.
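To see why the clipping point matters, here is a dependency-free illustration using plain Python lists instead of tensors. clip_by_global_norm applies the usual L2-norm clipping rule (without the small epsilon PyTorch adds), and accumulate is a hypothetical helper that either clips each micro-batch gradient or clips once on the summed gradient, as an optimizer-step boundary would. Neither function is part of litgpt or Fabric.

```python
import math
from typing import List


def clip_by_global_norm(grad: List[float], max_norm: float) -> List[float]:
    """Scale grad so its L2 norm is at most max_norm (epsilon-free norm clipping)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def accumulate(micro_grads: List[List[float]], max_norm: float,
               clip_each_micro_batch: bool) -> List[float]:
    """Sum micro-batch gradients, clipping per micro-batch or once at the step boundary."""
    total = [0.0] * len(micro_grads[0])
    for g in micro_grads:
        if clip_each_micro_batch:
            g = clip_by_global_norm(g, max_norm)  # acts on a partial gradient
        total = [t + x for t, x in zip(total, g)]
    if not clip_each_micro_batch:
        total = clip_by_global_norm(total, max_norm)  # acts on the full accumulated gradient
    return total


# Two micro-batch gradients that partly cancel each other.
micro = [[3.0, 4.0], [-3.0, 0.0]]
print(accumulate(micro, max_norm=1.0, clip_each_micro_batch=False))  # → [0.0, 1.0]
print(accumulate(micro, max_norm=1.0, clip_each_micro_batch=True))
```

Clipping the accumulated gradient keeps its direction and caps its norm at 1.0, while clipping each micro-batch first yields roughly [-0.4, 0.8], a different direction and a different norm.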

Unlike pretrain, max_norm remains optional: when it is unset (the default in every finetune config),
behavior is unchanged, so the existing null max_norm: entries in config_hub/finetune/* now
genuinely mean "no clipping" and need no edits. QLoRA is covered via the same lora.py path.

Adds a regression test per script that runs the tiny-config CPU training loop with
--train.max_norm=1.0 and asserts Fabric.clip_gradients is invoked once per optimizer step
with the configured value (these previously failed with
ValueError: ... doesn't support the 'max_norm' argument).
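The shape of these regression tests can be pictured with the following minimal sketch. Fabric here is a hypothetical stub exposing only backward and clip_gradients (the real class is lightning.fabric.Fabric), and fit is a hypothetical loop, not the finetune scripts' code. It places clipping inside if not is_accumulating: and skips it when max_norm is None. unittest.mock.patch.object replaces clip_gradients so the assertions can count calls and check the max_norm keyword.

```python
from typing import List, Optional
from unittest import mock


class Fabric:
    """Hypothetical stand-in for lightning.fabric.Fabric; only the calls used below."""

    def backward(self, loss: float) -> None:
        pass

    def clip_gradients(self, model, optimizer, max_norm: Optional[float] = None) -> None:
        pass


def fit(fabric: Fabric, micro_batches: List[float], accumulation_iters: int,
        max_norm: Optional[float]) -> int:
    """Hypothetical loop that clips only at optimizer-step boundaries; returns the step count."""
    model, optimizer = object(), object()  # placeholders; the stub never inspects them
    steps = 0
    for iter_num, loss in enumerate(micro_batches, start=1):
        is_accumulating = iter_num % accumulation_iters != 0
        fabric.backward(loss)
        if not is_accumulating:
            if max_norm is not None:
                fabric.clip_gradients(model, optimizer, max_norm=max_norm)
            steps += 1  # optimizer.step() and optimizer.zero_grad() would go here
    return steps


with mock.patch.object(Fabric, "clip_gradients") as clip:
    steps = fit(Fabric(), [0.0] * 8, accumulation_iters=4, max_norm=1.0)
    assert steps == 2
    assert clip.call_count == steps  # once per optimizer step, not per micro-batch
    assert all(c.kwargs["max_norm"] == 1.0 for c in clip.call_args_list)

with mock.patch.object(Fabric, "clip_gradients") as clip:
    fit(Fabric(), [0.0] * 8, accumulation_iters=4, max_norm=None)
    assert clip.call_count == 0  # unset max_norm: no clipping, behavior unchanged
```

The None guard matters for the real API as well: Fabric.clip_gradients expects either clip_val or max_norm to be set, so calling it unconditionally with max_norm=None would not be a no-op.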


The four finetune scripts (lora, full, adapter, adapter_v2) listed
train.max_norm as unsupported in validate_args even though the finetune
configs in config_hub expose it. Remove it from the unsupported lists and
clip gradients at optimizer-step boundaries with fabric.clip_gradients,
mirroring pretrain.py. The argument remains optional, so behavior is
unchanged when it is not set. Add regression tests asserting that
clip_gradients is called once per optimizer step with the configured value.

Development: successfully merging this pull request may close the issue "Gradient Clipping Doesn't Work in Finetuning".