fix: handle KeyError and missing text_config in vision loader#2055

Open
Drifter4242 wants to merge 1 commit into
exo-explore:mainfrom
Drifter4242:pr/fix-vision-loader-errors

Conversation

@Drifter4242
Contributor

When the correct processor_repo is used for vision (see #2054), a few small adjustments are needed when reading its config.json.

The rest is written by Sonnet (reviewed by me):

Two crash bugs when loading vision weights from a vision-only repo (e.g. exolabs/Kimi-K2.6-vision) that has a stripped-down config.json:

  1. KeyError in load_image_processor: mlx_vlm's load_image_processor() looks up config['model_type'] and throws KeyError when the key is absent. Vision-only repos intentionally omit model_type since they only contain projection weights, not a full model config. The except clause only caught ValueError, so KeyError propagated uncaught. Fix: except (ValueError, KeyError) so the fallback module-based loader is used instead.

  2. Projector shape mismatch: TextConfig was constructed from config['text_config'], which is empty in vision-only repos. This caused TextConfig to default hidden_size to a small value (e.g. 0 or 256), so the projector linear layer was built with the wrong output dim and failed at weight load or first forward pass. Fix: when text_config is empty, read text_hidden_size from vision_config (where exolabs repos embed it) and use it as hidden_size.

Motivation

Two crash bugs when loading a vision pipeline from a vision-only weights repo (e.g. exolabs/Kimi-K2.6-vision). These repos contain only projection weights and a stripped-down config.json — they intentionally omit fields that only make sense for a full model checkpoint.

Changes

1. KeyError in load_image_processor

mlx_vlm's load_image_processor() looks up config['model_type'] and raises KeyError when absent. The except clause only caught ValueError, so KeyError propagated uncaught and crashed the vision pipeline before the fallback module-based loader could run.
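
A minimal sketch of why the original handler missed the error. `fake_load_image_processor` and `load_with_fallback` are hypothetical stand-ins, not the mlx_vlm or exo functions; they only mimic the dictionary lookup described above. `KeyError` derives from `LookupError`, not `ValueError`, so `except ValueError` lets it through:

```python
from typing import Optional, Tuple, Type


def fake_load_image_processor(config: dict) -> str:
    """Hypothetical stand-in: mimics a loader that indexes config['model_type']."""
    return f"processor-for-{config['model_type']}"


def load_with_fallback(
    config: dict, caught: Tuple[Type[BaseException], ...]
) -> Optional[str]:
    """Return a processor, or None to signal that the fallback loader should run."""
    try:
        return fake_load_image_processor(config)
    except caught:
        return None


# No model_type key, as in a stripped-down vision-only config.json.
stripped_config = {"vision_config": {}}

# KeyError is a LookupError, not a ValueError, so the old clause cannot catch it.
assert not issubclass(KeyError, ValueError)

try:
    load_with_fallback(stripped_config, (ValueError,))
    old_clause_crashed = False
except KeyError:
    old_clause_crashed = True

# With the widened clause the lookup failure is turned into None.
new_clause_result = load_with_fallback(stripped_config, (ValueError, KeyError))
```

In this sketch, returning `None` plays the role of `image_proc = None`, which is what lets the module-based fallback loader take over.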

2. Projector shape mismatch

TextConfig was constructed from config['text_config'], which is empty in vision-only repos. This caused TextConfig to fall back to its default hidden_size (a small or zero value), so the projector's linear layer was built with the wrong output dimension and failed at weight loading or on the first forward pass.

Some vision repos (e.g. exolabs/Kimi-K2.6-vision) embed text_hidden_size inside vision_config instead. We now fall back to reading it from there.
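
The fallback can be exercised in isolation with stand-ins. In this sketch `TextConfig`, its default `hidden_size` of 256, `filter_config`, and `resolve_text_config` are all hypothetical; they imitate the behaviour described above rather than the real mlx_vlm or exo definitions, and the numeric sizes are illustrative only.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class TextConfig:
    """Hypothetical stand-in; the real default for hidden_size may differ."""
    hidden_size: int = 256


def filter_config(cls: type, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keys that are dataclass fields of cls (assumed behaviour)."""
    allowed = {f.name for f in fields(cls)}
    return {k: v for k, v in raw.items() if k in allowed}


def resolve_text_config(config: Dict[str, Any]) -> TextConfig:
    """Build TextConfig, falling back to vision_config['text_hidden_size']."""
    text_config_dict = dict(config.get("text_config", {}))
    if not text_config_dict:
        vision_cfg = config.get("vision_config", {})
        text_hidden_size = vision_cfg.get("text_hidden_size")
        if text_hidden_size:
            text_config_dict = {"hidden_size": int(text_hidden_size)}
    return TextConfig(**filter_config(TextConfig, text_config_dict))


# Vision-only layout: empty text_config, size embedded in vision_config.
vision_only = {"text_config": {}, "vision_config": {"text_hidden_size": 7168}}
# Full checkpoint layout: text_config present, extra keys are filtered out.
full_model = {"text_config": {"hidden_size": 4096, "vocab_size": 32000}}
# Neither source available: the stand-in default is used.
neither = {"vision_config": {}}

print(resolve_text_config(vision_only).hidden_size)
print(resolve_text_config(full_model).hidden_size)
print(resolve_text_config(neither).hidden_size)
```

When neither source provides a size, the stand-in silently keeps its default, which is the failure mode described above; a stricter variant could raise an error at that point instead.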

Why It Works

# Before
except ValueError:
    image_proc = None

# After
except (ValueError, KeyError):
    # KeyError raised by mlx_vlm when vision-only repo config.json lacks model_type
    image_proc = None

# Before
text_config = config_mod.TextConfig(
    **_filter_config(config_mod.TextConfig, config.get("text_config", {}))
)

# After
text_config_dict = dict(config.get("text_config", {}))
if not text_config_dict:
    text_hidden_size = vision_cfg.get("text_hidden_size")
    if text_hidden_size:
        text_config_dict = {"hidden_size": int(text_hidden_size)}
text_config = config_mod.TextConfig(
    **_filter_config(config_mod.TextConfig, text_config_dict)
)

Test Plan

Manual Testing

Hardware: 2× Mac Studio M3 Ultra 512 GB, Thunderbolt 5 direct bridge, MlxJaccl RDMA tensor-parallel (moonshotai/Kimi-K2.6 + exolabs/Kimi-K2.6-vision).

  • Vision pipeline loads without error.
  • Image requests return correct responses.

Automated Testing

pytest src -m "not slow" --import-mode=importlib — all tests pass.
