feat: add responseModalities support for Gemini image generation #41
Open
usnavy13 wants to merge 1 commit into danny-avila:main
Add support for the `responseModalities` parameter in Google and VertexAI LLM classes to enable native Gemini image generation models to return images alongside text.

Changes:
- Add `responseModalities` to `GoogleClientOptions` and `VertexAIClientOptions` types
- Pass `responseModalities` to `generationConfig` in `CustomChatGoogleGenerativeAI`
- Handle `inlineData` (image) parts in response processing, converting them to `image_url` content blocks
- Add `responseModalities` to generation config in VertexAI `CustomChatConnection.formatData`

This enables models like `gemini-2.5-flash-image` and `gemini-3-pro-image-preview` to generate and return images when `responseModalities: ['TEXT', 'IMAGE']` is set.
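The `inlineData` handling described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `GeminiPart` and `ContentBlock` are simplified, assumed shapes modeled on Gemini's `{ text }` / `{ inlineData: { mimeType, data } }` response parts and LangChain-style `text` / `image_url` content blocks, and `partsToContentBlocks` is a hypothetical helper name.

```typescript
// Assumed, simplified shape of a Gemini response part: either text or
// inline binary data carried as a base64 string plus its MIME type.
interface GeminiPart {
  text?: string;
  inlineData?: { mimeType: string; data: string };
}

// Assumed, simplified LangChain-style content blocks.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

// Hypothetical helper: maps response parts to content blocks in order,
// turning each inlineData part into an image_url block with a data URL.
function partsToContentBlocks(parts: GeminiPart[]): ContentBlock[] {
  const blocks: ContentBlock[] = [];
  for (const part of parts) {
    if (typeof part.text === 'string') {
      blocks.push({ type: 'text', text: part.text });
    } else if (part.inlineData) {
      const { mimeType, data } = part.inlineData;
      blocks.push({
        type: 'image_url',
        image_url: { url: `data:${mimeType};base64,${data}` },
      });
    }
    // Other part kinds (function calls, etc.) are skipped in this sketch.
  }
  return blocks;
}
```

Keeping text and image parts as separate blocks preserves their order, so a reply such as a caption followed by a PNG comes back as one `text` block and one `image_url` block.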
danny-avila added a commit that referenced this pull request on May 5, 2026
Audit verified: all 5 valid (real bypass shapes / API contract violations / OOM risks).

P1 #37: destructive path normalization. Patterns like `rm -rf $HOME/`, `rm -rf ~/`, and `rm -rf "$HOME/"` slipped past the bare and quoted destructive guards because the trailing slash broke the end-anchor / quote-pair shapes. Extracted a shared `DESTRUCTIVE_TARGET` (`(?:\/|~|\$\{?HOME\}?|\.)\/?`) used by both pattern lists so spelling equivalences stay consistent. 9 tests pinned (all the spelling variants plus a benign no-regression case).

P2 #38: `resolveWorkspacePathSafe` used the host `fs/promises.realpath` instead of the configured `WorkspaceFS.realpath`. On a custom or remote engine the host realpath would fail and silently fall back to lexical containment, leaving the symlink-escape clamp ineffective. `realpathOrSelf` and `realpathOfPathOrAncestor` now take the realpath implementation as a parameter; `resolveWorkspacePathSafe` threads `getWorkspaceFS(config).realpath` through both.

P2 #39: direct-path `additionalContexts` were silently swallowed. Hosts that returned `additionalContext` from PreToolUse / PostToolUse / PostToolUseFailure for direct tools (which is every local-engine tool) had their context discarded, breaking the hook API contract. Added `RunToolBatchContext.additionalContextsSink`; `runDirectToolWithLifecycleHooks` pushes hook contexts into it, and `run()` materializes the accumulated strings as a single `HumanMessage` appended to outputs, matching the `injected[]` shape of the event-driven path. PostToolUseFailure was also changed from fire-and-forget to awaited so its contexts are captured (the hook is still observational with respect to the tool result).

P2 #40: the syntax-check probe cache was keyed only on the spawn backend, the same shape as P1 #34 (rg cache). It is now a nested `WeakMap<spawn, Map<envHash, ProbeCache>>`, with the env hash computed as stable JSON over sorted env entries.

P2 #41: the fallback grep read each candidate file fully via `readFile`, then called `split('\n')`. The wall-clock budget was only checked between files, so a single multi-GB log could OOM the process even with the regex DoS guards in place. Added a per-file `FALLBACK_GREP_MAX_FILE_BYTES = 5 MiB` cap (stat first; skip with a sentinel if oversize) plus a deadline re-check after each read. Hosts needing larger files should install ripgrep.

830 tests passing across all suites (was 817), lint baseline unchanged.
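The shared-target approach from P1 #37 can be illustrated with the fragment quoted in that commit message (unescaped from its shell-escaped form). This is a sketch only: `BARE_RM`, `QUOTED_RM`, and `isDestructiveRm` are hypothetical stand-ins for the real pattern lists, which the commit does not show, and they only recognize the literal `rm -rf` flag spelling.

```typescript
// Target fragment from the commit message: `/`, `~`, `$HOME`, `${HOME}` or `.`,
// each optionally followed by a trailing slash.
const DESTRUCTIVE_TARGET = '(?:\\/|~|\\$\\{?HOME\\}?|\\.)\\/?';

// Hypothetical bare guard: the target must be the last thing in the command.
const BARE_RM = new RegExp('\\brm\\s+-rf\\s+' + DESTRUCTIVE_TARGET + '\\s*$');

// Hypothetical quoted guard: the same target wrapped in matching quotes.
const QUOTED_RM = new RegExp(
  '\\brm\\s+-rf\\s+(?:"' + DESTRUCTIVE_TARGET + '"|\'' + DESTRUCTIVE_TARGET + '\')\\s*$',
);

// True when the command deletes a home, root, or current directory target.
function isDestructiveRm(command: string): boolean {
  return BARE_RM.test(command) || QUOTED_RM.test(command);
}
```

Because both guards interpolate the same fragment, a spelling such as `${HOME}/` only has to be handled once for the bare and quoted shapes to agree, while a path like `./build` still fails the end anchor.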
Summary
Add support for the `responseModalities` parameter in Google and VertexAI LLM classes to enable native Gemini image generation models (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`, etc.) to return images alongside text.

Changes

- Add `responseModalities?: ('TEXT' | 'IMAGE' | 'AUDIO')[]` to the `GoogleClientOptions` and `VertexAIClientOptions` types
- Pass `responseModalities` to `generationConfig` in the `CustomChatGoogleGenerativeAI` constructor
- Handle `inlineData` (image) parts in response processing (`convertResponseContentToChatGenerationChunk` and `mapGenerateContentResultToChatResult`), converting them to `image_url` content blocks with base64 data URLs
- Add `responseModalities` to the generation config in VertexAI `CustomChatConnection.formatData`

Usage
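A minimal sketch of how an optional `responseModalities` array could be threaded into a request's `generationConfig`, assuming simplified types. `ClientOptionsSketch`, `GenerationConfigSketch`, and `buildGenerationConfig` are hypothetical names, not the real `GoogleClientOptions` type or the `CustomChatGoogleGenerativeAI` constructor.

```typescript
// Modality values accepted by the option, per the type added in this PR.
type ResponseModality = 'TEXT' | 'IMAGE' | 'AUDIO';

// Hypothetical, trimmed-down client options.
interface ClientOptionsSketch {
  model: string;
  temperature?: number;
  maxOutputTokens?: number;
  responseModalities?: ResponseModality[];
}

// Hypothetical subset of the generationConfig sent with each request.
interface GenerationConfigSketch {
  temperature?: number;
  maxOutputTokens?: number;
  responseModalities?: ResponseModality[];
}

// Copies only the fields that were set, so the config carries no
// responseModalities key when the option is omitted.
function buildGenerationConfig(options: ClientOptionsSketch): GenerationConfigSketch {
  const config: GenerationConfigSketch = {};
  if (options.temperature !== undefined) config.temperature = options.temperature;
  if (options.maxOutputTokens !== undefined) config.maxOutputTokens = options.maxOutputTokens;
  if (options.responseModalities && options.responseModalities.length > 0) {
    config.responseModalities = [...options.responseModalities];
  }
  return config;
}

// Example: ask an image-capable model for both text and an image.
const imageConfig = buildGenerationConfig({
  model: 'gemini-2.5-flash-image',
  responseModalities: ['TEXT', 'IMAGE'],
});
```

Leaving the key out entirely when the option is unset keeps requests to text-only models identical to what they were before the option existed.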
Tested Models

- `gemini-2.5-flash-image`: returns text + PNG image
- `gemini-3-pro-image-preview`: returns JPEG image

Related