[bug] Unable to freeze specific layers of a pretrained model #394

@RaymondLi0

🐞 Describe the Bug

I'm trying to freeze specific layers of a pretrained model (for example, only layer 0).
The problem is that loading a pretrained model like Apriel-Thinker loads its decoder config as a FixedBlockSequenceConfig, whereas I need to pass a block-pattern config to freeze only certain layers, for example:

decoder:
  type: pattern
  pattern:
    - train_block
    - freeze_block
    - freeze_block
  blocks:
    train_block:
      mlp:
        lr_scale: 1.e-12
    freeze_block:
      lr_scale: 1.e-12
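
For a model with many blocks, writing the pattern by hand gets tedious. Below is a minimal Python sketch that builds the same decoder config as a plain dictionary. `build_pattern_decoder` is a hypothetical helper, not part of Fast-LLM, and the `num_blocks` field is taken from the config dump further down in this issue. It emits one pattern entry per block, so it does not assume anything about how a shorter pattern would be expanded.

```python
def build_pattern_decoder(num_blocks, trainable_layers, frozen_lr_scale=1e-12):
    """Build a pattern decoder config as a plain dict (hypothetical helper).

    Blocks whose index is in `trainable_layers` use `train_block`,
    all other blocks use `freeze_block`.
    """
    trainable = set(trainable_layers)
    out_of_range = sorted(i for i in trainable if not 0 <= i < num_blocks)
    if out_of_range:
        raise ValueError(f"Layer indices out of range: {out_of_range}")

    # One explicit entry per block, in order.
    pattern = [
        "train_block" if index in trainable else "freeze_block"
        for index in range(num_blocks)
    ]
    return {
        "type": "pattern",
        "num_blocks": num_blocks,
        "pattern": pattern,
        "blocks": {
            # Mirrors the example above: only the MLP of the trained block is frozen.
            "train_block": {"mlp": {"lr_scale": frozen_lr_scale}},
            # The whole block is frozen.
            "freeze_block": {"lr_scale": frozen_lr_scale},
        },
    }


decoder = build_pattern_decoder(num_blocks=4, trainable_layers=[0])
print(decoder["pattern"])
# ['train_block', 'freeze_block', 'freeze_block', 'freeze_block']
```

The resulting dictionary can be serialized to YAML with any YAML library and placed under `model.base_model.decoder`.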

We currently cannot reconcile these two configs: the fixed config loaded from the checkpoint and the pattern config passed by the user.
The workaround would be to prevent loading the pretrained config with load_config: none, and pass the entire block config again.
However, this does not currently work because some type parameters from the pretrained checkpoint leak into the decoder config:

'!!! block':
  type: decoder
  mixer:
    type: attention
    rotary:
      type: none
  mlp:
    type: mlp
  normalization:
    type: layer_norm
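
As a possible stopgap outside of Fast-LLM, the fixed config could be expanded into a pattern config before it is passed in. The sketch below assumes the fixed decoder config is available as a plain dictionary with `block` and `num_blocks` keys (names taken from the dump in this issue). `deep_merge` and `fixed_to_pattern` are hypothetical helpers, not Fast-LLM APIs.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def fixed_to_pattern(fixed_decoder, pattern, overrides):
    """Expand a single-block decoder config into a pattern config.

    Every named block starts from the shared `block` entry and then
    applies its own overrides (for example an `lr_scale`).
    """
    base_block = fixed_decoder["block"]
    blocks = {
        name: deep_merge(base_block, overrides.get(name, {}))
        for name in sorted(set(pattern))
    }
    # The shared `block` key is intentionally not copied to the result.
    return {
        "type": "pattern",
        "num_blocks": fixed_decoder["num_blocks"],
        "pattern": list(pattern),
        "blocks": blocks,
    }


fixed = {
    "num_blocks": 4,
    "block": {
        "type": "decoder",
        "mixer": {"type": "attention", "rotary": {"type": "none"}},
        "mlp": {"type": "mlp"},
        "normalization": {"type": "layer_norm"},
    },
}
result = fixed_to_pattern(
    fixed,
    pattern=["train_block", "freeze_block", "freeze_block", "freeze_block"],
    overrides={
        "train_block": {"mlp": {"lr_scale": 1e-12}},
        "freeze_block": {"lr_scale": 1e-12},
    },
)
print(result["blocks"]["train_block"]["mlp"])
# {'type': 'mlp', 'lr_scale': 1e-12}
```

This keeps the pretrained block settings while giving each named block its own learning-rate scale, and the output contains no top-level `block` key.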

🔄 Steps to Reproduce

Steps to reproduce the behavior:

  1. Fast-LLM version: https://github.com/ServiceNow/Fast-LLM/tree/b7c0de61662c61e83c617bd8157d0bf9426e3d52
  2. Train with the following config:
pretrained:
  format: mistral
  path: /mnt/checkpoints/upstream/Apriel-Nemotron-15b-Thinker-reinit-attn-layer-0
  load_config: none
model:
  base_model:
    decoder:
        type: pattern
        pattern:
          - train_block
          - freeze_block
          - freeze_block
          - freeze_block
        blocks:
          train_block:
            mlp:
              lr_scale: 1.e-12
          freeze_block:
            lr_scale: 1.e-12
  type: gpt
  3. Fails during config validation with:
fast_llm.config.NestedValidationError: Validation failed for field `model` of type `fast_llm.models.gpt.config.GPTModelConfig` in class fast_llm.models.gpt.config.GPTTrainerConfig:
  Validation failed for field `base_model` of type `fast_llm.models.gpt.config.GPTBaseModelConfig` in class fast_llm.models.gpt.config.GPTModelConfig:
    Validation failed for field `decoder` of type `fast_llm.layers.block.config.BlockSequenceConfig` in class fast_llm.models.gpt.config.GPTBaseModelConfig:
      Unknown field `block` in class fast_llm.layers.block.config.PatternBlockSequenceConfig

The decoder config would look like:

    decoder:
      type: pattern
      blocks:
        train_block:
          [...]
        freeze_block:
          [...]
      pattern:
      - train_block
      - freeze_block
      - freeze_block
      - freeze_block
      num_blocks: 50
      '!!! block':   # <--- undesired entry coming from the pretrained checkpoint
        type: decoder
        mixer:
          type: attention
          rotary:
            type: none
        mlp:
          type: mlp
        normalization:
          type: layer_norm
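
To spot this kind of leak before launching a run, the decoder config can be checked for keys that a pattern config does not expect. The following is a rough Python sketch: the set of allowed fields is inferred from the config shown in this issue and is an assumption, not the authoritative PatternBlockSequenceConfig schema, and `find_unexpected_fields` is a hypothetical helper. The `'!!! '` prefix is stripped because it appears in the dump above next to the offending field.

```python
# Field names taken from the pattern decoder config shown in this issue.
# Assumption: this is not the authoritative PatternBlockSequenceConfig schema.
ASSUMED_PATTERN_FIELDS = {"type", "pattern", "blocks", "num_blocks"}

MARKER = "!!! "


def find_unexpected_fields(decoder):
    """Return the decoder keys that the assumed pattern schema does not list."""
    unexpected = []
    for key in decoder:
        # Drop the marker prefix seen in the dump, if present.
        name = key[len(MARKER):] if key.startswith(MARKER) else key
        if name not in ASSUMED_PATTERN_FIELDS:
            unexpected.append(name)
    return unexpected


decoder = {
    "type": "pattern",
    "blocks": {"train_block": {}, "freeze_block": {}},
    "pattern": ["train_block", "freeze_block", "freeze_block", "freeze_block"],
    "num_blocks": 50,
    "!!! block": {"type": "decoder"},
}
print(find_unexpected_fields(decoder))
# ['block']
```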

🎯 Expected Behavior

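With load_config: none, only the config passed by the user should be loaded. Nothing from the pretrained checkpoint's config should end up in the decoder config.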
