feat: add responseModalities support for Gemini image generation #41

Open
usnavy13 wants to merge 1 commit into danny-avila:main from usnavy13:feat/response-modalities

@usnavy13

Summary

Add support for the responseModalities parameter in Google and VertexAI LLM classes to enable native Gemini image generation models (gemini-2.5-flash-image, gemini-3-pro-image-preview, etc.) to return images alongside text.

Changes

  • Add responseModalities?: ('TEXT' | 'IMAGE' | 'AUDIO')[] to GoogleClientOptions and VertexAIClientOptions types
  • Pass responseModalities to generationConfig in CustomChatGoogleGenerativeAI constructor
  • Handle inlineData (image) parts in response processing (convertResponseContentToChatGenerationChunk and mapGenerateContentResultToChatResult), converting them to image_url content blocks with base64 data URLs
  • Add responseModalities to generation config in VertexAI CustomChatConnection.formatData
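
The `inlineData` handling described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `Part` and `ContentBlock` types and the `partToContentBlock` helper are hypothetical simplifications, and the object form of `image_url` (`{ url }`) is an assumption about the block shape.

```typescript
// Hypothetical, simplified shapes; the real SDK and LangChain types carry more fields.
interface InlineDataPart {
  inlineData: { mimeType: string; data: string }; // data is a base64 string
}
interface TextPart {
  text: string;
}
type Part = InlineDataPart | TextPart;

type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

// Convert one response part into a content block.
// Image parts become image_url blocks that carry a base64 data URL.
function partToContentBlock(part: Part): ContentBlock {
  if ('inlineData' in part) {
    const { mimeType, data } = part.inlineData;
    return {
      type: 'image_url',
      image_url: { url: `data:${mimeType};base64,${data}` },
    };
  }
  return { type: 'text', text: part.text };
}

// Example: a response with one text part and one (truncated) PNG part.
const parts: Part[] = [
  { text: 'Here is your image.' },
  { inlineData: { mimeType: 'image/png', data: 'iVBORw0KGgo=' } },
];
const blocks = parts.map((part) => partToContentBlock(part));
```

Keeping the MIME type inside the data URL means downstream consumers do not need a separate field to tell PNG from JPEG output.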

Usage

const llmConfig = {
  provider: Providers.GOOGLE,
  model: 'gemini-2.5-flash-image',
  responseModalities: ['TEXT', 'IMAGE'],
};

Tested Models

  • gemini-2.5-flash-image - Returns text + PNG image
  • gemini-3-pro-image-preview - Returns JPEG image
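
Because the two tested models returned different image formats, a consumer probably should not assume PNG. A small sketch of reading the MIME type back out of a base64 data URL, assuming the `data:<mime>;base64,<payload>` form; `parseBase64DataUrl` and `extensionFor` are hypothetical helpers, not part of this PR:

```typescript
interface ParsedDataUrl {
  mimeType: string;
  base64: string;
}

// Split a base64 data URL into its MIME type and payload.
// Returns null for anything that is not a base64 data URL.
function parseBase64DataUrl(url: string): ParsedDataUrl | null {
  const prefix = 'data:';
  const marker = ';base64,';
  const markerAt = url.indexOf(marker);
  if (!url.startsWith(prefix) || markerAt === -1) return null;
  return {
    mimeType: url.slice(prefix.length, markerAt),
    base64: url.slice(markerAt + marker.length),
  };
}

// Map the MIME types seen in testing to file extensions; fall back to 'bin'.
const EXTENSIONS = new Map<string, string>([
  ['image/png', 'png'],
  ['image/jpeg', 'jpg'],
]);
function extensionFor(mimeType: string): string {
  return EXTENSIONS.get(mimeType) ?? 'bin';
}

// Example with a truncated JPEG payload.
const parsed = parseBase64DataUrl('data:image/jpeg;base64,/9j/4AAQ');
```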

Related

Add support for the `responseModalities` parameter in Google and VertexAI
LLM classes to enable native Gemini image generation models to return
images alongside text.

Changes:
- Add `responseModalities` to `GoogleClientOptions` and `VertexAIClientOptions` types
- Pass `responseModalities` to `generationConfig` in `CustomChatGoogleGenerativeAI`
- Handle `inlineData` (image) parts in response processing, converting to `image_url` content blocks
- Add `responseModalities` to generation config in VertexAI `CustomChatConnection.formatData`

This enables models like `gemini-2.5-flash-image` and `gemini-3-pro-image-preview`
to generate and return images when `responseModalities: ['TEXT', 'IMAGE']` is set.
danny-avila added a commit that referenced this pull request May 5, 2026
Audit verified: all 5 valid (real bypass shapes / API contract
violations / OOM risks).

P1 #37: destructive path normalization. Patterns like `rm -rf
$HOME/`, `rm -rf ~/`, `rm -rf "$HOME/"` slipped past the bare
and quoted destructive guards because the trailing slash broke the
end-anchor / quote-pair shapes. Extracted a shared
`DESTRUCTIVE_TARGET` (`(?:\/|~|\$\{?HOME\}?|\.)\/?`) used
by both pattern lists so spelling equivalences are kept consistent.
9 tests pinned (all the spelling variants + a benign no-regression).
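
A rough sketch of how one shared target fragment can feed both a bare and a quoted guard. Only the `DESTRUCTIVE_TARGET` fragment comes from the commit message (read with one layer of shell escaping removed); the two guard patterns and the `isDestructive` helper are hypothetical and far simpler than real guard lists would be.

```typescript
// Fragment from the commit message: /, ~, $HOME, ${HOME} or ., plus an optional trailing slash.
const DESTRUCTIVE_TARGET = /(?:\/|~|\$\{?HOME\}?|\.)\/?/.source;

// Hypothetical guards built from the shared fragment.
// Bare form: the target must be followed by whitespace or end of input.
const bareGuard = new RegExp('\\brm\\s+-rf\\s+' + DESTRUCTIVE_TARGET + '(?=\\s|$)');
// Quoted form: the target must fill the whole double-quoted argument.
const quotedGuard = new RegExp('\\brm\\s+-rf\\s+"' + DESTRUCTIVE_TARGET + '"');

function isDestructive(command: string): boolean {
  return bareGuard.test(command) || quotedGuard.test(command);
}

// Spelling variants named in the commit message, plus benign commands.
const flagged = ['rm -rf $HOME/', 'rm -rf ~/', 'rm -rf "$HOME/"', 'rm -rf ${HOME}'].map(isDestructive);
const benign = ['rm -rf ./build', 'rm -rf /tmp/scratch'].map(isDestructive);
```

Because both guards interpolate the same fragment, adding a new spelling of the home directory in one place covers the bare and quoted shapes together.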

P2 #38: `resolveWorkspacePathSafe` used host `fs/promises.realpath`
instead of the configured `WorkspaceFS.realpath`. On a custom or
remote engine the host realpath would fail and silently fall back
to lexical containment, leaving the symlink-escape clamp
ineffective. `realpathOrSelf` and `realpathOfPathOrAncestor`
now take the realpath impl as a parameter; `resolveWorkspacePathSafe`
threads `getWorkspaceFS(config).realpath` through both.

P2 #39: direct-path `additionalContexts` were silently swallowed.
Hosts that returned `additionalContext` from PreToolUse /
PostToolUse / PostToolUseFailure for direct tools (which is every
local-engine tool) had their context discarded, which breaks the
hook API contract. Added `RunToolBatchContext.additionalContextsSink`;
`runDirectToolWithLifecycleHooks` pushes hook contexts into it;
`run()` materializes the accumulated strings as a single
`HumanMessage` appended to outputs, matching the event-driven
path's `injected[]` shape. PostToolUseFailure was also changed
from fire-and-forget to await so its contexts are captured (the
hook is still observational w.r.t. the tool result).

P2 #40: syntax-check probe cache was keyed only on spawn backend.
Same shape as P1 #34 (rg cache). Now a nested
`WeakMap<spawn, Map<envHash, ProbeCache>>`, where `envHash` is
stable JSON over sorted env entries.
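
The nested cache shape can be sketched as below. This is an illustration under assumptions: `SpawnFn`, the `ProbeCache` contents, and the `getProbeCache` helper are hypothetical; only the `WeakMap<spawn, Map<envHash, ProbeCache>>` layout and the sorted-entries hash come from the commit message.

```typescript
type Env = Record<string, string | undefined>;
type SpawnFn = (command: string, args: string[]) => unknown; // hypothetical backend shape
interface ProbeCache {
  available: boolean; // hypothetical contents
}

// Stable hash: sort entries by key so insertion order does not change the result.
function envHash(env: Env): string {
  const entries = Object.entries(env).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(entries);
}

// Outer key: the spawn backend (weakly held). Inner key: the env hash.
const probeCaches = new WeakMap<SpawnFn, Map<string, ProbeCache>>();

function getProbeCache(spawn: SpawnFn, env: Env, probe: () => ProbeCache): ProbeCache {
  let byEnv = probeCaches.get(spawn);
  if (!byEnv) {
    byEnv = new Map<string, ProbeCache>();
    probeCaches.set(spawn, byEnv);
  }
  const key = envHash(env);
  let cached = byEnv.get(key);
  if (!cached) {
    cached = probe(); // run the probe once per (backend, env) pair
    byEnv.set(key, cached);
  }
  return cached;
}

// Usage: count how often the probe actually runs.
let probeRuns = 0;
const probe = (): ProbeCache => {
  probeRuns += 1;
  return { available: true };
};
const spawnA: SpawnFn = () => undefined;
const spawnB: SpawnFn = () => undefined;
const runsSeen: number[] = [];
getProbeCache(spawnA, { PATH: '/bin', HOME: '/h' }, probe);
runsSeen.push(probeRuns);
getProbeCache(spawnA, { HOME: '/h', PATH: '/bin' }, probe); // same env, different key order
runsSeen.push(probeRuns);
getProbeCache(spawnA, { PATH: '/usr/bin', HOME: '/h' }, probe); // different env
runsSeen.push(probeRuns);
getProbeCache(spawnB, { PATH: '/bin', HOME: '/h' }, probe); // different backend
runsSeen.push(probeRuns);
```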

P2 #41: fallback grep read each candidate file fully via
`readFile` then `split('\n')`. The wall-clock budget only
checked between files, so a single multi-GB log could OOM the
process even with the regex DoS guards in place. Added a per-file
`FALLBACK_GREP_MAX_FILE_BYTES = 5 MiB` cap (stat first, skip with
sentinel if oversize) plus a deadline re-check after each read.
Hosts needing larger files should install ripgrep.
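
A simplified sketch of the stat-first cap and the post-read deadline check. It is synchronous for brevity and uses a hypothetical `MinimalFS` interface, `FileScan` result type, and `grepOneFile` helper; only the constant name and the 5 MiB value come from the commit message.

```typescript
const FALLBACK_GREP_MAX_FILE_BYTES = 5 * 1024 * 1024; // 5 MiB

// Hypothetical minimal file-system surface, so the sketch runs without real files.
interface MinimalFS {
  statSize(path: string): number;
  readText(path: string): string;
}

type FileScan =
  | { kind: 'skipped'; reason: 'oversize' | 'deadline' }
  | { kind: 'scanned'; matches: string[] };

// Assumes `pattern` has no global/sticky flag, so test() keeps no state between lines.
function grepOneFile(
  fs: MinimalFS,
  path: string,
  pattern: RegExp,
  deadlineMs: number,
  now: () => number,
): FileScan {
  // Stat first: an oversize file is never read into memory.
  if (fs.statSize(path) > FALLBACK_GREP_MAX_FILE_BYTES) {
    return { kind: 'skipped', reason: 'oversize' };
  }
  const text = fs.readText(path);
  // Re-check the wall-clock budget after the read, not only between files.
  if (now() > deadlineMs) {
    return { kind: 'skipped', reason: 'deadline' };
  }
  return { kind: 'scanned', matches: text.split('\n').filter((line) => pattern.test(line)) };
}

// In-memory fake: reading the oversize file throws, which proves the stat runs first.
const files = new Map<string, { size: number; text: string }>([
  ['app.log', { size: 17, text: 'ok\nerror: boom\nok' }],
  ['huge.log', { size: 6 * 1024 * 1024, text: '' }],
]);
const fakeFs: MinimalFS = {
  statSize: (path) => files.get(path)?.size ?? 0,
  readText: (path) => {
    const file = files.get(path);
    if (!file || file.size > FALLBACK_GREP_MAX_FILE_BYTES) throw new Error('should not read ' + path);
    return file.text;
  },
};

const small = grepOneFile(fakeFs, 'app.log', /error/, 1000, () => 0);
const huge = grepOneFile(fakeFs, 'huge.log', /error/, 1000, () => 0);
const late = grepOneFile(fakeFs, 'app.log', /error/, 1000, () => 2000);
```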

830 tests passing across all suites (was 817), lint baseline unchanged.