feat(tts): implement Gemini TTS API #1879
qbc2016 left a comment
Thank you for the PR. Please see the inline comments.
| @@ -0,0 +1,8 @@ | |||
| # -*- coding: utf-8 -*- | |||
Same as the OpenAI PR — GeminiTTSModel needs to be registered in src/agentscope/credential/_gemini.py:
Without this, GeminiCredential.list_tts_models() returns an empty list.
| `None`): | ||
| The TTS parameters (voice, etc.). When ``None``, the default | ||
| parameters will be used. | ||
| stream (`bool`, defaults to `False`): |
Gemini TTS API does support streaming output. Please refer to https://ai.google.dev/gemini-api/docs/speech-generation#streaming
| inline_data = getattr(part, "inline_data", None) | ||
| if inline_data and inline_data.data: | ||
| data = inline_data.data | ||
| if isinstance(data, str): |
The isinstance(data, str) check for inline_data.data is good defensive coding, but a brief comment explaining why both str and bytes are possible (SDK version differences?) would be helpful for maintainability.
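For instance, the comment could live on a small helper along these lines. This is only a sketch: normalize_audio_payload is a hypothetical name, and it assumes that a str payload is base64-encoded audio, which should be confirmed against the SDK versions in use.

```python
import base64


def normalize_audio_payload(data):
    """Return raw audio bytes regardless of how the payload was delivered."""
    if isinstance(data, (bytes, bytearray)):
        # Payload is already raw audio bytes.
        return bytes(data)
    if isinstance(data, str):
        # Assumption: a str payload is base64 text (as in the JSON wire
        # format), so decode it to get the raw audio bytes.
        return base64.b64decode(data)
    raise TypeError(f"Unsupported audio payload type: {type(data).__name__}")
```

Keeping the branch in one place also gives the explanatory comment a single, obvious home.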
| _DEFAULT_MEDIA_TYPE = "audio/wav" | ||
| def _extract_usage( |
Other TTS implementations use _parse_usage for the equivalent function. Using _extract_usage here is fine, but for consistency across the TTS module, consider renaming to _parse_usage.
PR Title Format
feat(tts): implement Gemini TTS API
AgentScope Version
2.0.1
Description
This PR adds a Gemini TTS implementation to the new TTS module, following the existing DashScopeTTSModel as a reference.
Changes made:
- GeminiTTSModel under src/agentscope/tts/_gemini/, subclassing TTSModelBase and reusing the existing GeminiCredential.
- generateContent API with responseModalities: ["AUDIO"], using the same lazy-imported google.genai client pattern as GeminiChatModel.
- DataBlock with media_type="audio/wav".
- gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts, including the full list of supported prebuilt voices.
- GeminiTTSModel from agentscope.tts.
- tests/tts_gemini_test.py mirroring tests/tts_dashscope_test.py, skipping cleanly if google-genai is not installed.
How to test:
Closes #1680
Checklist