fix: handle KeyError and missing text_config in vision loader by Drifter4242 · Pull Request #2055 · exo-explore/exo

Drifter4242 · 2026-05-06T04:24:55Z

When using the correct processor_repo for vision (see #2054), a few small adjustments are needed when reading config.json.

The rest is written by Sonnet (reviewed by me):

Two crash bugs when loading vision weights from a vision-only repo (e.g. exolabs/Kimi-K2.6-vision) that has a stripped-down config.json:

1. KeyError in load_image_processor: mlx_vlm's load_image_processor() looks up config['model_type'] and raises KeyError when the key is absent. Vision-only repos intentionally omit model_type since they only contain projection weights, not a full model config. The except clause only caught ValueError, so KeyError propagated uncaught. Fix: except (ValueError, KeyError) so the fallback module-based loader is used instead.
2. Projector shape mismatch: TextConfig was constructed from config['text_config'], which is empty in vision-only repos. This caused TextConfig to default hidden_size to a small value (e.g. 0 or 256), so the projector linear layer was built with the wrong output dim and failed at weight load or first forward pass. Fix: when text_config is empty, read text_hidden_size from vision_config (where exolabs repos embed it) and pass it to TextConfig as hidden_size.

Motivation

Two crash bugs when loading a vision pipeline from a vision-only weights repo (e.g. exolabs/Kimi-K2.6-vision). These repos contain only projection weights and a stripped-down config.json — they intentionally omit fields that only make sense for a full model checkpoint.

Changes

1. KeyError in load_image_processor

mlx_vlm's load_image_processor() looks up config['model_type'] and raises KeyError when absent. The except clause only caught ValueError, so KeyError propagated uncaught and crashed the vision pipeline before the fallback module-based loader could run.
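
The failure mode can be reproduced without mlx_vlm. The following is a minimal sketch, assuming a hypothetical stand-in `fake_load_image_processor` that indexes `config['model_type']` directly, as the description above says the real loader does. `load_with_fallback` and the returned strings are illustrative names only, not code from this PR.

```python
def fake_load_image_processor(config: dict) -> str:
    """Hypothetical stand-in: a missing model_type key raises KeyError."""
    model_type = config["model_type"]
    if model_type != "known_model":
        raise ValueError(f"unsupported model_type: {model_type}")
    return f"processor-for-{model_type}"


def load_with_fallback(config: dict) -> str:
    """Try the primary loader; on ValueError or KeyError use the fallback path."""
    try:
        image_proc = fake_load_image_processor(config)
    except (ValueError, KeyError):
        # Catching only ValueError here would let the KeyError from a
        # stripped-down config escape before the fallback could run.
        image_proc = None
    if image_proc is None:
        return "fallback-module-loader"
    return image_proc
```

Calling `load_with_fallback({})` takes the fallback branch, which is the case a vision-only config.json triggers.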

2. Projector shape mismatch

TextConfig was constructed from config['text_config'], which is empty in vision-only repos. This caused TextConfig to default hidden_size to a small/zero value, so the projector linear layer was built with the wrong output dimension and failed at weight loading or first forward pass.

Some vision repos (e.g. exolabs/Kimi-K2.6-vision) embed text_hidden_size inside vision_config instead. We now fall back to reading it from there.
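
The lookup order can be sketched in isolation. This is an illustration under stated assumptions, not the loader itself: the `TextConfig` dataclass below is a hypothetical stand-in with an arbitrary default of 256, `resolve_text_config` is an invented name, and the key filtering done in the real code is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class TextConfig:
    # Hypothetical default, chosen only for illustration.
    hidden_size: int = 256


def resolve_text_config(config: dict) -> TextConfig:
    """Prefer text_config; otherwise fall back to vision_config.text_hidden_size."""
    text_config_dict = dict(config.get("text_config") or {})
    if not text_config_dict:
        vision_cfg = config.get("vision_config") or {}
        text_hidden_size = vision_cfg.get("text_hidden_size")
        if text_hidden_size:
            text_config_dict = {"hidden_size": int(text_hidden_size)}
    # With neither source present, the dataclass default applies.
    return TextConfig(**text_config_dict)


# Made-up value for a vision-only config; real repos will differ.
vision_only = {"vision_config": {"text_hidden_size": 4096}}
resolved = resolve_text_config(vision_only)
```

Here `resolved.hidden_size` comes from `vision_config`, so a projector sized from it would match the text model's width rather than the default.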

Why It Works

# Before
except ValueError:
    image_proc = None

# After
except (ValueError, KeyError):
    # KeyError raised by mlx_vlm when vision-only repo config.json lacks model_type
    image_proc = None

# Before
text_config = config_mod.TextConfig(
    **_filter_config(config_mod.TextConfig, config.get("text_config", {}))
)

# After
text_config_dict = dict(config.get("text_config", {}))
if not text_config_dict:
    text_hidden_size = vision_cfg.get("text_hidden_size")
    if text_hidden_size:
        text_config_dict = {"hidden_size": int(text_hidden_size)}
text_config = config_mod.TextConfig(
    **_filter_config(config_mod.TextConfig, text_config_dict)
)
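
The PR does not show `_filter_config`. One plausible reading, offered only as an assumption, is that it drops keys the target config class does not accept, so that unknown fields in config.json do not break construction. A sketch of that idea using a dataclass (`ExampleTextConfig` and `filter_config` are hypothetical names):

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class ExampleTextConfig:
    # Hypothetical fields and defaults, for illustration only.
    hidden_size: int = 256
    vocab_size: int = 32000


def filter_config(cls: type, raw: dict[str, Any]) -> dict[str, Any]:
    """Keep only the keys that are declared as dataclass fields on cls."""
    allowed = {f.name for f in fields(cls)}
    return {k: v for k, v in raw.items() if k in allowed}


# The unknown key is dropped; the missing key falls back to the default.
cfg = ExampleTextConfig(**filter_config(ExampleTextConfig, {"hidden_size": 1024, "unknown": 1}))
```

Under this reading, an empty input dict yields a config built entirely from defaults, which is the situation the fallback above avoids for hidden_size.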

Test Plan

Manual Testing

Hardware: 2× Mac Studio M3 Ultra 512 GB, Thunderbolt 5 direct bridge, MlxJaccl RDMA tensor-parallel (moonshotai/Kimi-K2.6 + exolabs/Kimi-K2.6-vision).

Vision pipeline loads without error.
Image requests return correct responses.

Automated Testing

pytest src -m "not slow" --import-mode=importlib — all tests pass.

Drifter4242 wants to merge 1 commit into exo-explore:main from Drifter4242:pr/fix-vision-loader-errors
