feat(tts): implement Gemini TTS API by nuthalapativarun · Pull Request #1879 · agentscope-ai/agentscope

nuthalapativarun · 2026-06-15T15:53:20Z

PR Title Format

feat(tts): implement Gemini TTS API

AgentScope Version

2.0.1

Description

This PR adds a Gemini TTS implementation to the new TTS module, following the existing DashScopeTTSModel as a reference.

Changes made:

Added GeminiTTSModel under src/agentscope/tts/_gemini/, subclassing TTSModelBase and reusing the existing GeminiCredential.
The model calls the Gemini generateContent API with responseModalities: ["AUDIO"], using the same lazy-imported google.genai client pattern as GeminiChatModel.
Audio is returned as raw 24kHz/mono/16-bit PCM, wrapped into a self-contained WAV DataBlock with media_type="audio/wav".
Only non-streaming, non-realtime synthesis is supported, which the issue called out as acceptable.
Added model card YAMLs for gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts, including the full list of supported prebuilt voices.
Exported GeminiTTSModel from agentscope.tts.
Added tests/tts_gemini_test.py mirroring tests/tts_dashscope_test.py, skipping cleanly if google-genai is not installed.
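The WAV wrapping described above can be sketched with Python's standard-library `wave` module. This is a minimal illustration, not the code in the PR. `pcm_to_wav_bytes` is a hypothetical helper name, and the sketch assumes the PCM is 16-bit, mono, 24 kHz as the description states. Building the DataBlock from the result is not shown.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm: bytes,
    sample_rate: int = 24000,
    channels: int = 1,
    sample_width: int = 2,
) -> bytes:
    """Wrap raw PCM samples in a self-contained WAV (RIFF) container.

    Hypothetical helper; defaults match the 24kHz/mono/16-bit format
    described for Gemini TTS output.
    """
    buffer = io.BytesIO()
    # wave.open on a file object does not close the BytesIO on exit,
    # so the buffer can still be read after the with-block finishes.
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

For example, one second of silence (`b"\x00\x00" * 24000`) yields a 48,044-byte file: a 44-byte RIFF header plus 48,000 bytes of samples.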

How to test:

pytest tests/tts_gemini_test.py -v
pytest tests -k tts

Closes #1680

Checklist

An issue has been created for this PR
I have read CONTRIBUTING.md
Docstrings are in Google style
Related documentation has been updated (agentscope-ai/docs)
Code is ready for review

qbc2016

Thank you for the PR. Please see the inline comments.

qbc2016 · 2026-06-17T08:40:16Z

@@ -0,0 +1,8 @@
+# -*- coding: utf-8 -*-


Same as in the OpenAI PR, GeminiTTSModel needs to be registered in src/agentscope/credential/_gemini.py:
Without this, GeminiCredential.list_tts_models() returns an empty list.
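To illustrate why an unregistered model leaves `list_tts_models()` empty, here is a toy registry. Everything in it is hypothetical: `HypotheticalCredential`, `register_tts_model`, and `PlaceholderTTSModel` are illustrative names only. The actual registration mechanism in src/agentscope/credential/_gemini.py is not shown in the review and may differ (for instance, it may simply be a class attribute listing model classes).

```python
from typing import ClassVar, List, Type


class HypotheticalCredential:
    """Toy stand-in for a credential class; NOT the real GeminiCredential."""

    # Hypothetical registry of TTS model classes usable with this credential.
    _tts_models: ClassVar[List[Type]] = []

    @classmethod
    def register_tts_model(cls, model_cls: Type) -> Type:
        """Class decorator that records a TTS model class (hypothetical)."""
        if model_cls not in cls._tts_models:
            cls._tts_models.append(model_cls)
        return model_cls

    @classmethod
    def list_tts_models(cls) -> List[Type]:
        """Return a copy of the registered TTS model classes.

        Until a model class is registered, this returns an empty list,
        which is the symptom the reviewer describes.
        """
        return list(cls._tts_models)


@HypotheticalCredential.register_tts_model
class PlaceholderTTSModel:
    """Placeholder standing in for GeminiTTSModel."""
```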

qbc2016 · 2026-06-17T08:46:59Z

+            `None`):
+                The TTS parameters (voice, etc.). When ``None``, the default
+                parameters will be used.
+            stream (`bool`, defaults to `False`):


The Gemini TTS API does support streaming output; please refer to https://ai.google.dev/gemini-api/docs/speech-generation#streaming
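A streaming path could collect audio from each chunk returned by google-genai's `client.models.generate_content_stream(...)`. This is a sketch under an assumption the review does not state: that every streamed chunk has the same `candidates -> content -> parts -> inline_data` shape as the non-streaming response, with `inline_data.data` already as bytes. `iter_audio_chunks` is a hypothetical name.

```python
from typing import Any, Iterable, Iterator


def iter_audio_chunks(stream: Iterable[Any]) -> Iterator[bytes]:
    """Yield raw PCM bytes from each streamed response chunk.

    Chunks, candidates, or parts without audio (e.g. text-only parts or
    an empty payload) are skipped rather than raising.
    """
    for chunk in stream:
        for candidate in getattr(chunk, "candidates", None) or []:
            content = getattr(candidate, "content", None)
            for part in getattr(content, "parts", None) or []:
                inline_data = getattr(part, "inline_data", None)
                if inline_data is not None and inline_data.data:
                    yield inline_data.data
```

Because a WAV header records the total data length, one option is to stream raw PCM chunks and only build a WAV container once all chunks have arrived. Whether the TTS module's streaming interface expects per-chunk blocks is not stated in the review.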

qbc2016 · 2026-06-17T08:50:16Z

+                inline_data = getattr(part, "inline_data", None)
+                if inline_data and inline_data.data:
+                    data = inline_data.data
+                    if isinstance(data, str):


The isinstance(data, str) check for inline_data.data is good defensive coding, but a brief comment explaining why both str and bytes are possible (SDK version differences?) would be helpful for maintainability.
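One way to address this comment is to isolate the check in a small helper whose docstring records the reasoning. This sketch rests on an assumption the review does not confirm: that current google-genai versions already base64-decode `inline_data.data` into `bytes`, so a `str` value is an undecoded base64 string. The helper name `to_audio_bytes` is hypothetical.

```python
import base64
from typing import Union


def to_audio_bytes(data: Union[bytes, str]) -> bytes:
    """Return raw audio bytes from an ``inline_data.data`` value.

    Assumption: recent google-genai versions expose ``Blob.data`` as
    already-decoded bytes. A ``str`` (e.g. from a different SDK version
    or a hand-built response) is treated as base64 that still needs
    decoding.
    """
    if isinstance(data, str):
        return base64.b64decode(data)
    return bytes(data)
```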

qbc2016 · 2026-06-17T08:50:57Z

+_DEFAULT_MEDIA_TYPE = "audio/wav"
+
+
+def _extract_usage(


Other TTS implementations use _parse_usage for the equivalent function. Using _extract_usage here is fine, but for consistency across the TTS module, consider renaming to _parse_usage.

Commit a658a1a: feat(tts): implement Gemini TTS API

nuthalapativarun wants to merge 1 commit into agentscope-ai:main from nuthalapativarun:feat/1680-gemini-tts-model
