fix: handle KeyError and missing text_config in vision loader #2055
Open
Drifter4242 wants to merge 1 commit into
When using the correct `processor_repo` for vision (see #2054), a few small adjustments are needed when reading `config.json`.
The rest was written by Sonnet (reviewed by me):
Two crash bugs when loading vision weights from a vision-only repo (e.g. `exolabs/Kimi-K2.6-vision`) that has a stripped-down `config.json`:

1. `KeyError` in `load_image_processor`: mlx_vlm's `load_image_processor()` looks up `config['model_type']` and raises `KeyError` when the key is absent. Vision-only repos intentionally omit `model_type` because they contain only projection weights, not a full model config. The `except` clause caught only `ValueError`, so the `KeyError` propagated uncaught. Fix: `except (ValueError, KeyError)` so the fallback module-based loader is used instead.
2. Projector shape mismatch: `TextConfig` was constructed from `config['text_config']`, which is empty in vision-only repos. `TextConfig` therefore defaulted `hidden_size` to a small value (e.g. 0 or 256), so the projector linear layer was built with the wrong output dimension and failed at weight load or on the first forward pass. Fix: when `text_config` is empty, read `text_hidden_size` from `vision_config` (where exolabs repos embed it) instead.
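The fallback in item 1 can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code in this PR: `load_processor_with_fallback`, `strict_loader`, and `module_loader` are hypothetical names, with the two loaders standing in for mlx_vlm's `load_image_processor()` and the module-based fallback loader.

```python
from typing import Any, Callable

Config = dict[str, Any]


def load_processor_with_fallback(
    config: Config,
    primary_loader: Callable[[Config], Any],
    fallback_loader: Callable[[Config], Any],
) -> Any:
    """Try the primary loader; fall back only on errors a stripped-down config raises."""
    try:
        return primary_loader(config)
    except (ValueError, KeyError):
        # KeyError: config.json has no "model_type" (vision-only repos omit it).
        # ValueError: the case the original except clause already covered.
        return fallback_loader(config)


def strict_loader(config: Config) -> str:
    # Hypothetical stand-in for load_image_processor(): indexes model_type
    # directly, so a missing key raises KeyError.
    return f"processor for {config['model_type']}"


def module_loader(config: Config) -> str:
    # Hypothetical stand-in for the module-based fallback loader.
    return "module-based processor"


# A vision-only config without model_type now takes the fallback path.
vision_only_config: Config = {"vision_config": {}}
print(load_processor_with_fallback(vision_only_config, strict_loader, module_loader))
# prints "module-based processor"
```

Catching only these two exception types keeps the fallback narrow: any other error raised by the primary loader still surfaces instead of being masked.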
Motivation
This PR fixes two crash bugs when loading a vision pipeline from a vision-only weights repo (e.g. `exolabs/Kimi-K2.6-vision`). These repos contain only projection weights and a stripped-down `config.json`; they intentionally omit fields that only make sense for a full model checkpoint.

Changes
1. `KeyError` in `load_image_processor`

mlx_vlm's `load_image_processor()` looks up `config['model_type']` and raises `KeyError` when it is absent. The `except` clause only caught `ValueError`, so `KeyError` propagated uncaught and crashed the vision pipeline before the fallback module-based loader could run.

2. Projector shape mismatch
`TextConfig` was constructed from `config['text_config']`, which is empty in vision-only repos. This caused `TextConfig` to default `hidden_size` to a small or zero value, so the projector linear layer was built with the wrong output dimension and failed at weight loading or on the first forward pass.

Some vision repos (e.g. `exolabs/Kimi-K2.6-vision`) embed `text_hidden_size` inside `vision_config` instead. We now fall back to reading it from there.

Why It Works
Test Plan
Manual Testing
Hardware: 2× Mac Studio M3 Ultra 512 GB, Thunderbolt 5 direct bridge, MlxJacclRDMA tensor-parallel (`moonshotai/Kimi-K2.6` + `exolabs/Kimi-K2.6-vision`).

Automated Testing

`pytest src -m "not slow" --import-mode=importlib`: all tests pass.
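A regression test for the projector fix could pin down the fallback order. The sketch below assumes a hypothetical helper, `resolve_text_hidden_size`, that mirrors the fallback described above; it is not a helper this PR adds, and the `4096`/`7168` sizes are illustrative placeholders rather than values read from the real repos.

```python
from typing import Any, Optional


def resolve_text_hidden_size(config: dict[str, Any]) -> Optional[int]:
    """Hypothetical helper: pick the projector's output dim from config.json."""
    text_config = config.get("text_config") or {}
    if text_config:
        # Full checkpoint: text_config carries the language model's hidden_size.
        return text_config.get("hidden_size")
    # Vision-only repo: text_config is empty, size is embedded in vision_config.
    vision_config = config.get("vision_config") or {}
    return vision_config.get("text_hidden_size")


def test_full_checkpoint_uses_text_config() -> None:
    config = {"text_config": {"hidden_size": 4096}, "vision_config": {}}
    assert resolve_text_hidden_size(config) == 4096


def test_vision_only_repo_falls_back_to_vision_config() -> None:
    config = {"text_config": {}, "vision_config": {"text_hidden_size": 7168}}
    assert resolve_text_hidden_size(config) == 7168


def test_missing_size_returns_none() -> None:
    # Neither source has a size: surface None instead of a silent small default.
    assert resolve_text_hidden_size({"text_config": {}, "vision_config": {}}) is None
```

Returning `None` when neither source provides a size, rather than a default, lets the caller fail with a clear error instead of silently building a projector with the wrong output dimension.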