
feat: add vision capability flag to modelSpecs configuration #11501

Closed
JumpLink wants to merge 32 commits into danny-avila:main from faktenforum:feat/vision

Conversation


@JumpLink JumpLink commented Jan 24, 2026

Adds an optional vision boolean field to modelSpecs configuration to explicitly declare model vision support. This enables proper UI gating for image upload options based on model capabilities.

Related to: #11418 (partially addresses) and danny-avila/agents#48

Changes

  • Add vision?: boolean field to TModelSpec type and schema
  • Extend validateVisionModel() to check modelSpecs.vision first before falling back to the hardcoded list
  • Create useVisionModel() hook to centralize vision model detection logic
  • Update UI components (DragDropModal, AttachFileMenu) to conditionally show image upload options based on model vision capability
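The lookup order described in the changes above can be sketched as follows. This is an illustrative reconstruction, not the actual LibreChat source: the `TModelSpec` shape is simplified and the fallback list is a stand-in for the real hardcoded one.

```typescript
// Simplified modelSpec shape; the real type has more fields.
type TModelSpec = { name: string; vision?: boolean };

// Illustrative stand-in for the existing hardcoded vision model list.
const VISION_MODELS = ["gpt-4o", "gpt-4-turbo", "claude-3"];

function validateVisionModel(model: string, modelSpecs?: TModelSpec[]): boolean {
  // 1) Prefer an explicit declaration from librechat.yaml, when provided.
  const spec = modelSpecs?.find((s) => s.name === model);
  if (spec?.vision !== undefined) {
    return spec.vision;
  }
  // 2) Backward-compatible fallback to the hardcoded list.
  return VISION_MODELS.some((v) => model.includes(v));
}
```

An explicit `vision: false` in the spec wins over the hardcoded list, which is what makes UI gating reliable for models the list does not know about.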

Benefits

  • Enables proper UI gating: image upload options only appear for vision-capable models
  • Configuration-driven approach: model capabilities declared in librechat.yaml instead of hardcoded
  • Backward compatible: falls back to existing hardcoded list if modelSpecs not provided

Testing

  • Verify image upload options only appear for vision-capable models
  • Verify modelSpecs.vision configuration is respected
  • Verify fallback to hardcoded list works when modelSpecs not provided
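For reference, a hypothetical librechat.yaml excerpt using the new flag might look like this; model names and surrounding fields are illustrative, only the `vision` key is what this PR adds.

```yaml
modelSpecs:
  list:
    - name: "pixtral"
      label: "Pixtral (vision)"
      vision: true          # image upload options shown in the UI
      preset:
        endpoint: "Scaleway"
        model: "pixtral-12b-2409"
    - name: "llama-3.3"
      label: "Llama 3.3 (text only)"
      vision: false         # image upload options hidden
```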

- Add Scaleway to RECOGNIZED_PROVIDERS for improved MCP content formatting
- Add Scaleway detection for proper usage field handling (streamUsage: false, usage: true)
- Scaleway uses standard OpenAI reasoning_content format, no special handling needed

Scaleway custom endpoints are identified by endpoint name or baseURL containing 'scaleway' or 'api.scaleway.ai'.
LangChain may store usage data in response_metadata.usage instead of usage_metadata.
This change checks both locations and converts LangChain format to the expected format
when token data is present.

This improves compatibility with custom endpoints that use LangChain internally.
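The fallback described above can be sketched as a small extraction helper. The type names and field shapes here are assumptions about the LangChain message layout, not the exact LibreChat code.

```typescript
// LangChain-style usage block found under response_metadata.usage.
type LangChainUsage = { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number };
// Canonical shape expected downstream.
type UsageMetadata = { input_tokens: number; output_tokens: number; total_tokens: number };

function extractUsage(msg: {
  usage_metadata?: UsageMetadata;
  response_metadata?: { usage?: LangChainUsage };
}): UsageMetadata | undefined {
  // Prefer the canonical field when present.
  if (msg.usage_metadata) return msg.usage_metadata;
  // Fall back to the LangChain location and convert field names.
  const u = msg.response_metadata?.usage;
  if (u?.prompt_tokens != null && u?.completion_tokens != null) {
    return {
      input_tokens: u.prompt_tokens,
      output_tokens: u.completion_tokens,
      total_tokens: u.total_tokens ?? u.prompt_tokens + u.completion_tokens,
    };
  }
  return undefined;
}
```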
- Generalize custom endpoint detection for usage field handling
  - Replace provider-specific checks with generic isCustomOpenAIEndpoint function
  - Automatically handles all custom endpoints (provider=OPENAI but endpoint name differs)
  - Removes need for explicit provider additions

- Improve MCP content formatting for custom endpoints
  - Add isRecognizedProvider helper function for clarity
  - Custom endpoints automatically recognized since they use 'openai' provider
  - Helps address MCP tool response formatting issues (LibreChat danny-avila#11494)

This change benefits all OpenAI-compatible custom endpoints, not just specific providers,
making the codebase more maintainable and reducing the need for provider-specific additions.
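The generic check described above can be sketched in a few lines; the function name matches the commit message, but the implementation details are an assumption.

```typescript
// A request targets a custom OpenAI-compatible endpoint when the provider
// is "openai" but the endpoint name is something else (e.g. "Scaleway").
function isCustomOpenAIEndpoint(provider: string, endpoint?: string): boolean {
  return (
    provider.toLowerCase() === "openai" &&
    endpoint != null &&
    endpoint.toLowerCase() !== "openai"
  );
}
```

This removes the need to add each new OpenAI-compatible provider explicitly: any endpoint routed through the `openai` provider under a different name is picked up automatically.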
- Removed redundant checks for usage data in LangChain responses, consolidating the logic to directly access usage_metadata.
- This change streamlines the code and improves readability while maintaining functionality.
- Add isCustomOpenAIEndpoint function to automatically detect custom endpoints
  for proper usage field handling (provider=OPENAI but endpoint name differs)
- Add Scaleway to RECOGNIZED_PROVIDERS for MCP content formatting
- Improves handling of MCP tool responses with structured content formatting

This change benefits all OpenAI-compatible custom endpoints by automatically
detecting them for usage field handling, while MCP formatting requires explicit
provider additions since custom endpoints are passed with their endpoint name.
Add `vision` boolean field to modelSpecs configuration to explicitly
declare model vision support. This enables proper filtering of image
artifacts for non-vision models and UI gating for image upload options.

- Add vision field to TModelSpec type/schema
- Extend validateVisionModel() to check modelSpecs first
- Pass modelSpecs from API to agents package
- Update UI components to use vision capability check
- Removed direct calls to validateVisionModel in AttachFileMenu and DragDropModal components.
- Introduced useVisionModel hook to encapsulate vision model validation logic.
- Updated imports to reflect the new hook usage, improving code modularity and readability.
- Remove modelSpecs parameter from createRun() function
- Remove modelSpecs conversion logic (handled by agent-level vision toggle)
- Remove modelSpecs from createRun() call in client.js
- This keeps PR 11501 focused on modelSpecs vision for UI gating only
JumpLink added a commit to faktenforum/LibreChat that referenced this pull request Jan 24, 2026
- Add vision to AgentCapabilities enum and default capabilities
- Add vision?: boolean field to Agent type and validation schema
- Add vision toggle UI component for agents with hover card and info description
- Include vision in agent create/update payload
- Pass vision from agent to AgentInputs in run API

Depends on PR danny-avila#11501 (modelSpecs vision) for validateVisionModel function
@JumpLink JumpLink marked this pull request as ready for review January 24, 2026 18:05
Automatically recognize and format MCP tool responses for all OpenAI-compatible
custom endpoints without requiring explicit additions. Uses negative list
(NON_OPENAI_PROVIDERS) instead of maintaining positive list for each new provider.
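The negative-list approach might look like the following sketch; the set contents are illustrative, only the `NON_OPENAI_PROVIDERS` name comes from the commit message.

```typescript
// Instead of enumerating every OpenAI-compatible provider, list only the
// providers that are NOT OpenAI-compatible and recognize everything else.
const NON_OPENAI_PROVIDERS = new Set(["anthropic", "google", "bedrock"]);

function isRecognizedProvider(provider: string): boolean {
  return !NON_OPENAI_PROVIDERS.has(provider.toLowerCase());
}
```

New custom endpoints then work out of the box, since an unknown provider name is treated as OpenAI-compatible by default.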
@JumpLink JumpLink marked this pull request as draft January 25, 2026 06:25
JumpLink added a commit to faktenforum/LibreChat that referenced this pull request Jan 26, 2026
- Implement automatic detection of vision capability based on model specifications.
- Update agent configuration to auto-set vision based on model changes.
- Introduce artifact processing for MCP tools, ensuring proper handling of image URLs and base64 data.
- Refactor related components to utilize new vision validation logic and improve modularity.
- Update UI elements to reflect changes in vision capability handling and provide clearer user guidance.
- Removed unnecessary blank lines in the createToolEndCallback function to improve code readability and maintainability.
- Remove redundant result processing in MCP.js - formatToolContent already returns correct tuple format
- Add debug logging in run.ts to diagnose vision capability detection issues
- Improve code clarity by removing workaround code
- Change MCP.js to directly return the result from mcpManager.callTool, enhancing clarity.
- Remove console logs in run.ts related to vision capability detection to streamline the code.
- Update AgentClient to conditionally handle image URLs and attachments based on vision capability.
- Modify AssistantService to check for image file types before processing artifact messages.
- Refactor ToolService to improve vision capability validation and ensure proper handling of artifacts for non-vision models.
- Clarify documentation regarding vision capabilities and processing behavior for better understanding.
…processing

- Simplify image URL handling in AgentClient by removing unnecessary checks when vision is disabled.
- Enhance AssistantService to use a boolean flag for determining if file IDs should be attached to artifact messages.
- Add validation for max_tokens in createRun to ensure it is always set to a valid value, preventing potential errors from invalid configurations.
…nt models

- Update loadEphemeralAgent and loadAddedAgent functions to prioritize model specifications for vision and spec attributes.
- Modify determineVisionCapability to incorporate spec-based vision detection, improving clarity and functionality.
- Refactor createRun to ensure valid max_tokens handling, enhancing robustness against invalid configurations.
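The `max_tokens` validation mentioned above could take a shape like this defensive helper; the function name and default value are assumptions for illustration.

```typescript
// Coerce invalid max_tokens values (undefined, NaN, non-positive, or
// non-numeric config input) to a safe fallback before calling the API.
function resolveMaxTokens(value: unknown, fallback = 4096): number {
  const n = typeof value === "number" ? value : Number(value);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : fallback;
}
```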
@JumpLink JumpLink marked this pull request as ready for review February 11, 2026 10:19
…ents

- Enhanced the agent client to validate vision capabilities based on agent settings and model specifications.
- Updated AttachFileMenu and DragDropModal components to utilize the new vision capability checks, ensuring proper handling of image uploads.
- Introduced visionEnabledByAgent in useAgentToolPermissions hook to streamline permission checks across components.
# Conflicts:
#	api/models/Agent.js
#	api/models/loadAddedAgent.js
#	api/server/controllers/agents/client.js
#	api/server/services/MCP.js
#	api/server/services/ToolService.js
#	packages/api/src/agents/run.ts
