
fix: add Qwen3.5 model context window tokens #12693

Open

alievrusik wants to merge 1 commit into danny-avila:main from alievrusik:fix/add-qwen3.5-context-tokens

Conversation

@alievrusik

Summary

  • Add qwen3.5 (262,144) and qwen3.5-397b (262,144) entries to the qwenModels token map

Problem

Qwen3.5 models (e.g. Qwen/Qwen3.5-397B-A17B-FP8) have a native context window of 262,144 tokens, but were falling back to the generic qwen3 entry (40,960 tokens) via findMatchingPattern fuzzy name matching.
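The fallback can be illustrated with a small sketch. This is not the real findMatchingPattern from LibreChat, just an assumed substring-style lookup showing why Qwen/Qwen3.5-397B-A17B-FP8 matched the generic qwen3 key whenever no more specific qwen3.5 entry existed:

```typescript
// Hypothetical sketch of the token-map lookup (not LibreChat's actual code).
const qwenModels: Record<string, number> = {
  'qwen3.5-397b': 262144, // added by this PR
  'qwen3.5': 262144,      // added by this PR
  'qwen3': 40960,         // generic entry Qwen3.5 previously fell back to
};

function findMatchingPattern(model: string, map: Record<string, number>): number | undefined {
  const name = model.toLowerCase();
  // Check longer (more specific) keys first so 'qwen3.5' beats 'qwen3'.
  const keys = Object.keys(map).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    if (name.includes(key)) {
      return map[key];
    }
  }
  return undefined;
}

console.log(findMatchingPattern('Qwen/Qwen3.5-397B-A17B-FP8', qwenModels)); // 262144
```

Without the two new entries, the same lookup falls through to the qwen3 key and returns 40,960.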

This caused the agent graph's pruneMessages (in @librechat/agents) to aggressively drop messages — including the user query, assistant tool_calls, and tool results — when tool output exceeded the undersized token budget (~36K tokens after the 0.9x scaling formula), leading to the model receiving only the system message and producing broken or empty responses.
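The numbers above work out as follows; the 0.9x scaling factor is taken from this PR description, and the exact formula in the codebase may differ:

```typescript
// Illustrative arithmetic for the token budget (0.9x scaling per the PR description).
const wrongWindow = 40960;    // generic qwen3 fallback
const correctWindow = 262144; // native Qwen3.5 context window

const wrongBudget = Math.floor(wrongWindow * 0.9);     // 36,864 tokens
const correctBudget = Math.floor(correctWindow * 0.9); // 235,929 tokens

// A ~36K-token tool output overflows the undersized budget,
// but fits comfortably once the correct window is used.
console.log(wrongBudget, correctBudget);
```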

Symptoms observed

  • Small tool outputs worked fine (~54KB / ~13K tokens, within the 36,864-token budget)
  • Large tool outputs (~144KB / ~36K tokens) caused the entire conversation to be pruned
  • Model responded without context, ignoring tool results or producing "No user query found" errors
  • Same scenario worked correctly with Anthropic (which has its own token map entry)
  • Direct vLLM API calls with the same large payload worked correctly
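The pruning behaviour behind these symptoms can be sketched as follows. This is a simplified stand-in, not the real pruneMessages from @librechat/agents: it drops the oldest non-system messages until the conversation fits the budget, which with the undersized 36,864-token budget leaves only the system message.

```typescript
// Simplified stand-in for message pruning (not the actual @librechat/agents code).
type Msg = { role: 'system' | 'user' | 'assistant' | 'tool'; tokens: number };

function pruneMessages(msgs: Msg[], budget: number): Msg[] {
  const system = msgs.filter((m) => m.role === 'system');
  const rest = msgs.filter((m) => m.role !== 'system');
  const total = (arr: Msg[]) => arr.reduce((sum, m) => sum + m.tokens, 0);
  // Drop the oldest non-system messages until the conversation fits the budget.
  while (rest.length > 0 && total(system) + total(rest) > budget) {
    rest.shift();
  }
  return [...system, ...rest];
}

const convo: Msg[] = [
  { role: 'system', tokens: 500 },
  { role: 'user', tokens: 200 },      // the user query
  { role: 'assistant', tokens: 300 }, // assistant tool_calls
  { role: 'tool', tokens: 36800 },    // large tool output (~144KB)
];

console.log(pruneMessages(convo, 36864).length);  // 1 — only the system message survives
console.log(pruneMessages(convo, 235929).length); // 4 — everything fits the corrected budget
```

Because the large tool output alone exceeds the 36,864-token budget, pruning cascades through the user query and the assistant tool_calls before finally dropping the tool result itself, matching the "only the system message" failure mode described above.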

Test plan

  • Verified Qwen/Qwen3.5-397B-A17B-FP8 now resolves to 262,144 tokens instead of 40,960
  • Tested with large tool output (~144KB) — all messages preserved, model responds correctly
  • Tested with small tool output — still works as before

Qwen3.5 models (e.g. Qwen/Qwen3.5-397B-A17B-FP8) have a 262,144 token
native context window, but were falling back to the generic `qwen3` entry
(40,960 tokens) via fuzzy name matching. This caused the agent graph's
pruneMessages to aggressively drop messages — including the user query,
assistant tool_calls, and tool results — when tool output exceeded the
undersized token budget, leading to the model receiving only the system
message and producing broken responses.
