fix(finetune): apply train.max_norm gradient clipping in finetune scripts #2264
Open
discobot wants to merge 1 commit into
The four finetune scripts (lora, full, adapter, adapter_v2) listed train.max_norm as unsupported in validate_args even though the finetune configs in config_hub expose it. Remove it from the unsupported lists and clip gradients at optimizer-step boundaries with fabric.clip_gradients, mirroring pretrain.py. The argument remains optional, so behavior is unchanged when it is not set. Add regression tests asserting that clip_gradients is called once per optimizer step with the configured value.
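Why the clip belongs at the optimizer-step boundary can be shown with a framework-free sketch. It is a hypothetical illustration, not litgpt code: plain Python lists stand in for tensors, `train` and `clip_grad_norm_` are made-up helpers (the latter only follows the return convention of `torch.nn.utils.clip_grad_norm_`), and the optimizer is plain SGD. Micro-batch gradients are accumulated, and the global L2 norm is clipped once per optimizer step, only when `max_norm` is set.

```python
import math
from typing import List, Optional


def clip_grad_norm_(grads: List[float], max_norm: float) -> float:
    """Scale grads in place so their global L2 norm is at most max_norm.

    Returns the pre-clipping norm, following the convention of
    torch.nn.utils.clip_grad_norm_ (hypothetical pure-Python stand-in).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        for i in range(len(grads)):
            grads[i] *= scale
    return total_norm


def train(
    params: List[float],
    micro_batch_grads: List[List[float]],
    accumulation_steps: int,
    lr: float = 0.1,
    max_norm: Optional[float] = None,
) -> int:
    """Plain SGD with gradient accumulation; returns how many times clipping ran."""
    grads = [0.0] * len(params)
    clip_calls = 0
    for iter_num, mb_grads in enumerate(micro_batch_grads, start=1):
        is_accumulating = iter_num % accumulation_steps != 0
        # Stand-in for backward(): add this micro-batch's averaged gradient.
        for i, g in enumerate(mb_grads):
            grads[i] += g / accumulation_steps
        if not is_accumulating:
            # Optimizer-step boundary: the gradient is now complete, so clip here.
            if max_norm is not None:
                clip_grad_norm_(grads, max_norm)
                clip_calls += 1
            for i in range(len(params)):
                params[i] -= lr * grads[i]
            grads = [0.0] * len(params)  # stand-in for zero_grad()
    return clip_calls
```

With micro-batch gradients `[3.0, 0.0]` and `[0.0, 4.0]`, `accumulation_steps=2` and `max_norm=1.0`, the complete gradient `[1.5, 2.0]` (norm 2.5) is clipped once to about `[0.6, 0.8]`. Clipping after each backward pass instead would first shrink the partial `[1.5, 0.0]` to about `[1.0, 0.0]` and then clip `[1.0, 2.0]` to about `[0.45, 0.89]`, a different step direction.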
Fixes #2191.
`--train.max_norm` was rejected by `validate_args` in all four finetune scripts, while the finetune configs in
`config_hub` expose it.

This removes
`max_norm` from the unsupported list in `lora.py`, `full.py`, `adapter.py`, and `adapter_v2.py` and applies `fabric.clip_gradients(model, optimizer, max_norm=train.max_norm)` at optimizer-step boundaries (inside
`if not is_accumulating:`), mirroring `pretrain.py`.

Clipping at step boundaries rather than after each
`fabric.backward()` keeps it correct under gradient accumulation, where per-micro-batch clipping would act on partial gradients and
conflict with `no_backward_sync`.

Unlike pretrain,
`max_norm` remains optional: when unset (the default in every finetune config), behavior is unchanged, so the existing null
`max_norm:` entries in `config_hub/finetune/*` now genuinely mean "no clipping" and need no edits. QLoRA is covered via the same
`lora.py` path.

Adds a regression test per script that runs the tiny-config CPU training loop with
`--train.max_norm=1.0` and asserts `Fabric.clip_gradients` is invoked once per optimizer step with the configured value (these tests previously failed with
`ValueError: ... doesn't support the 'max_norm' argument`).
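The shape of such a regression test can be sketched with `unittest.mock` alone. This is a hypothetical, self-contained illustration rather than the PR's actual test code: `FakeFabric` stands in for `lightning.fabric.Fabric`, `finetune_loop` is a made-up loop that clips only when not accumulating, and how the real tests patch `Fabric.clip_gradients` is an assumption.

```python
from typing import Optional
from unittest import mock


class FakeFabric:
    """Hypothetical stand-in for lightning.fabric.Fabric (only what the loop needs)."""

    def clip_gradients(self, model, optimizer, max_norm: Optional[float] = None):
        # The real method clips gradients; a no-op is enough for call counting.
        return None


def finetune_loop(fabric, model, optimizer, max_norm: Optional[float],
                  max_iters: int, accumulation_steps: int) -> None:
    """Hypothetical loop: clips only at optimizer-step boundaries."""
    for iter_num in range(1, max_iters + 1):
        is_accumulating = iter_num % accumulation_steps != 0
        # ... forward pass and fabric.backward(loss) would go here ...
        if not is_accumulating:
            if max_norm is not None:
                fabric.clip_gradients(model, optimizer, max_norm=max_norm)
            # ... optimizer.step() and optimizer.zero_grad() would go here ...


def test_clip_gradients_called_once_per_optimizer_step() -> None:
    fabric = FakeFabric()
    # autospec=True keeps the real signature and records `self` as the first argument.
    with mock.patch.object(FakeFabric, "clip_gradients", autospec=True) as clip:
        finetune_loop(fabric, object(), object(), max_norm=1.0,
                      max_iters=8, accumulation_steps=4)
    # 8 micro-batches / 4 accumulation steps = 2 optimizer steps.
    assert clip.call_count == 2
    for call in clip.call_args_list:
        assert call.kwargs["max_norm"] == 1.0


def test_no_clipping_when_max_norm_unset() -> None:
    fabric = FakeFabric()
    with mock.patch.object(FakeFabric, "clip_gradients", autospec=True) as clip:
        finetune_loop(fabric, object(), object(), max_norm=None,
                      max_iters=8, accumulation_steps=4)
    clip.assert_not_called()
```

Counting calls on a patched method, rather than inspecting gradient values, keeps the assertion independent of model size and numerics; it checks only where and how often clipping happens.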