feat: add responseModalities support for Gemini image generation by usnavy13 · Pull Request #41 · danny-avila/agents

usnavy13 · 2025-12-13T20:21:37Z

Summary

Add support for the responseModalities parameter in Google and VertexAI LLM classes to enable native Gemini image generation models (gemini-2.5-flash-image, gemini-3-pro-image-preview, etc.) to return images alongside text.

Changes

Add responseModalities?: ('TEXT' | 'IMAGE' | 'AUDIO')[] to GoogleClientOptions and VertexAIClientOptions types
Pass responseModalities to generationConfig in CustomChatGoogleGenerativeAI constructor
Handle inlineData (image) parts in response processing (convertResponseContentToChatGenerationChunk and mapGenerateContentResultToChatResult), converting them to image_url content blocks with base64 data URLs
Add responseModalities to generation config in VertexAI CustomChatConnection.formatData
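
The `inlineData` handling described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ResponsePart` and `ContentBlock` types and the `partToContentBlock` helper are hypothetical, simplified stand-ins, and the nested `image_url: { url }` shape is an assumption about how the content block is laid out.

```typescript
// Hypothetical, simplified shapes; the real SDK and message types carry more fields.
type InlineDataPart = { inlineData: { mimeType: string; data: string } };
type TextPart = { text: string };
type ResponsePart = InlineDataPart | TextPart;

type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

// Convert one response part into a content block.
// Image parts become a base64 data URL built from the MIME type and payload.
function partToContentBlock(part: ResponsePart): ContentBlock {
  if ('inlineData' in part) {
    const { mimeType, data } = part.inlineData;
    return {
      type: 'image_url',
      image_url: { url: `data:${mimeType};base64,${data}` },
    };
  }
  return { type: 'text', text: part.text };
}

// Example: a response with one text part and one (truncated) PNG part.
const parts: ResponsePart[] = [
  { text: 'Here is your image.' },
  { inlineData: { mimeType: 'image/png', data: 'iVBORw0KGgo=' } },
];
const blocks: ContentBlock[] = parts.map(partToContentBlock);
```

Building the data URL from the part's own `mimeType` means the same path would cover both PNG and JPEG output without format-specific branches.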

Usage

const llmConfig = {
  provider: Providers.GOOGLE,
  model: 'gemini-2.5-flash-image',
  responseModalities: ['TEXT', 'IMAGE'],
};
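
On the consuming side, a caller might pull the generated images back out of the message content along these lines. This is a sketch under assumptions: the `ContentBlock` type and the `decodeImageBlocks` helper are hypothetical names, the block shape is assumed to be `{ type: 'image_url', image_url: { url } }`, and `Buffer` assumes a Node.js runtime.

```typescript
// Hypothetical content block shape, assumed for illustration.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

type DecodedImage = { mimeType: string; bytes: Uint8Array };

// Collect every image block whose URL is a base64 data URL and decode it.
// Blocks that are not images, or whose URL is not a base64 data URL, are skipped.
function decodeImageBlocks(content: ContentBlock[]): DecodedImage[] {
  const images: DecodedImage[] = [];
  for (const block of content) {
    if (block.type !== 'image_url') continue;
    const match = /^data:([^;,]+);base64,(.+)$/.exec(block.image_url.url);
    if (!match) continue;
    images.push({
      mimeType: match[1],
      bytes: Buffer.from(match[2], 'base64'), // Node.js only
    });
  }
  return images;
}

// Example content: the payload here is just the string "hello", not a real image.
const content: ContentBlock[] = [
  { type: 'text', text: 'Here is your image.' },
  { type: 'image_url', image_url: { url: 'data:image/png;base64,aGVsbG8=' } },
];
const images = decodeImageBlocks(content);
```

The decoded `mimeType` can then drive the file extension when saving, since the tested models return different formats.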

Tested Models

gemini-2.5-flash-image - Returns text + PNG image
gemini-3-pro-image-preview - Returns JPEG image

Add support for the `responseModalities` parameter in Google and VertexAI LLM classes to enable native Gemini image generation models to return images alongside text.

Changes:
- Add `responseModalities` to `GoogleClientOptions` and `VertexAIClientOptions` types
- Pass `responseModalities` to `generationConfig` in `CustomChatGoogleGenerativeAI`
- Handle `inlineData` (image) parts in response processing, converting to `image_url` content blocks
- Add `responseModalities` to generation config in VertexAI `CustomChatConnection.formatData`

This enables models like `gemini-2.5-flash-image` and `gemini-3-pro-image-preview` to generate and return images when `responseModalities: ['TEXT', 'IMAGE']` is set.

Audit verified: all 5 valid (real bypass shapes / API contract violations / OOM risks).

P1 #37: destructive path normalization. Patterns like `rm -rf $HOME/`, `rm -rf ~/`, `rm -rf "$HOME/"` slipped past the bare + quoted destructive guards because the trailing slash broke the end-anchor / quote-pair shapes. Extracted a shared `DESTRUCTIVE_TARGET` (`(?:\/|~|\$\{?HOME\}?|\.)\/?`) used by both pattern lists so spelling equivalences are kept consistent. 9 tests pinned (all the spelling variants + a benign no-regression).

P2 #38: `resolveWorkspacePathSafe` used host `fs/promises.realpath` instead of the configured `WorkspaceFS.realpath`. On a custom or remote engine the host realpath would fail and silently fall back to lexical containment, leaving the symlink-escape clamp ineffective. `realpathOrSelf` and `realpathOfPathOrAncestor` now take the realpath impl as a parameter; `resolveWorkspacePathSafe` threads `getWorkspaceFS(config).realpath` through both.

P2 #39: direct-path `additionalContexts` were silently swallowed. Hosts that returned `additionalContext` from PreToolUse / PostToolUse / PostToolUseFailure for direct tools (which is every local-engine tool) had their context discarded, breaking the hook API contract. Added `RunToolBatchContext.additionalContextsSink`; `runDirectToolWithLifecycleHooks` pushes hook contexts into it; `run()` materializes the accumulated strings as a single `HumanMessage` appended to outputs, matching the event-driven path's `injected[]` shape. PostToolUseFailure was also changed from fire-and-forget to await so its contexts are captured (the hook is still observational w.r.t. the tool result).

P2 #40: syntax-check probe cache was keyed only on spawn backend. Same shape as P1 #34 (rg cache). Now nested `WeakMap<spawn, Map<envHash, ProbeCache>>`. Stable JSON over sorted env entries.

P2 #41: fallback grep read each candidate file fully via `readFile` then `split('\n')`. The wall-clock budget only checked between files, so a single multi-GB log could OOM the process even with the regex DoS guards in place. Added a per-file `FALLBACK_GREP_MAX_FILE_BYTES = 5 MiB` cap (stat first, skip with sentinel if oversize) plus a deadline re-check after each read. Hosts needing larger files should install ripgrep.

830 tests passing across all suites (was 817), lint baseline unchanged.
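
The nested cache described for P2 #40 could look something like the sketch below. It is only an illustration of the "WeakMap keyed by spawn backend, inner Map keyed by a stable env hash" idea: `SpawnFn`, `ProbeCache`, `envHash`, and `getProbeCache` are hypothetical names and shapes, not the ones used in the commit.

```typescript
// Hypothetical stand-ins for the real spawn backend and cache types.
type SpawnFn = (cmd: string, args: string[]) => unknown;
type ProbeCache = Map<string, boolean>;
type Env = Record<string, string | undefined>;

// Outer key: the spawn backend (held weakly, so unused backends can be collected).
// Inner key: a stable hash of the environment the probe ran under.
const probeCaches = new WeakMap<SpawnFn, Map<string, ProbeCache>>();

// Stable JSON over sorted env entries: insertion order of keys does not matter.
function envHash(env: Env): string {
  const entries = Object.entries(env).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
  return JSON.stringify(entries);
}

// Return the cache for this (spawn, env) pair, creating both levels on demand.
function getProbeCache(spawn: SpawnFn, env: Env): ProbeCache {
  let byEnv = probeCaches.get(spawn);
  if (!byEnv) {
    byEnv = new Map();
    probeCaches.set(spawn, byEnv);
  }
  const key = envHash(env);
  let cache = byEnv.get(key);
  if (!cache) {
    cache = new Map();
    byEnv.set(key, cache);
  }
  return cache;
}
```

Sorting the entries before serializing is what makes two environments with the same variables in a different order share one cache entry, while any changed value yields a separate one.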

usnavy13 mentioned this pull request Dec 13, 2025

🍌 feat: Gemini Image Generation Tool (Nano Banana) danny-avila/LibreChat#10676

Merged

9 tasks

usnavy13 wants to merge 1 commit into danny-avila:main from usnavy13:feat/response-modalities
