An experimental text editor showcasing Gemini 2.0's Native Audio capabilities. Built on top of Novel, Voice Cursor demonstrates how Gemini's new text-to-speech API can be integrated into a text editor for fluid, in-context voice generation.
Gemini 2.0 introduces multilingual native audio output, a new capability that lets developers generate natural-sounding speech directly from the Gemini API. This project shows how to use this feature in a real application.
🎥 Watch the Gemini 2.0 Native Audio Demo 🔊
- 🎯 Native Gemini Audio: Direct integration with Gemini 2.0's text-to-speech capabilities
- 🎭 Rich Voice Options: 8 different Gemini voices with distinct characteristics
- 😊 Emotional Control: 15 different tones to shape how Gemini expresses the text
- 🎨 Visual Integration: Color-coded highlights show which voice and tone were used
- ⚡ Instant Generation: Quick audio synthesis powered by Gemini's latest model
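Clone the repository and install dependencies:
```bash
git clone https://github.com/googlecreativelab/gemini-demos/voice-cursor
npm install
```
Get your API key from Google AI Studio and add it to your environment file (for a Next.js app like this one, typically `.env.local` in the project root):
```
NEXT_PUBLIC_GEMINI_API_KEY=your_api_key_here
```
Start the development server:
```bash
npm run dev
```
Open http://localhost:3000 and start highlighting text!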
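The app reads the key from `NEXT_PUBLIC_GEMINI_API_KEY`, so if the key is missing, the only symptom is a failed API call. The sketch below shows one way to check the key early, assuming you want a clear error instead. `requireApiKey` is a hypothetical helper and is not part of the Voice Cursor source.

```typescript
// Hypothetical helper (not part of the Voice Cursor source): fail fast with a
// readable message when the Gemini API key has not been configured.
export function requireApiKey(value: string | undefined): string {
  const key = value?.trim();
  // Treat an unset value, an empty string, and the sample placeholder as missing
  if (!key || key === "your_api_key_here") {
    throw new Error(
      "NEXT_PUBLIC_GEMINI_API_KEY is not set. Get a key from Google AI Studio and add it to your environment file.",
    );
  }
  return key;
}
```

Call it as `requireApiKey(process.env.NEXT_PUBLIC_GEMINI_API_KEY)`. The helper takes the value rather than a variable name because Next.js inlines `NEXT_PUBLIC_` variables into client code only when they are referenced literally.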
The magic happens in `src/components/editor/selectors/voice-popover.tsx`. When text is highlighted, we construct a prompt that includes both the text and the desired emotional tone, then send it to Gemini 2.0's API with audio generation enabled.
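The prompt-building step described above can be sketched as a pure function. The sketch assumes a tone is an object with a `transform` function, as in `src/lib/tone-options.ts`. The name `buildSpeechPrompt` and the fallback of sending the raw text when no tone is selected are illustrative assumptions, not the project's exact code.

```typescript
// Assumed shape of a tone entry, mirroring the fields used in tone-options.ts
export interface ToneOption {
  emoji: string;
  name: string;
  transform: (text: string) => string;
}

// Hypothetical helper: combine the highlighted text with an optional tone
// to produce the text prompt that is sent to Gemini.
export function buildSpeechPrompt(selection: string, tone?: ToneOption): string {
  const text = selection.trim();
  if (text.length === 0) {
    throw new Error("Nothing is highlighted");
  }
  // Assumption: with no tone selected, the raw text is sent unchanged
  return tone ? tone.transform(text) : text;
}
```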
The voice cursor supports a range of emotional tones, defined in `src/lib/tone-options.ts`. Each tone has an emoji, a name, and a `transform` function that constructs the prompt.
To edit, add, or remove tones, change the `TONE_OPTIONS` array in `src/lib/tone-options.ts`:
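```typescript
// ToneOption is the project's type for tone entries (emoji, name, transform).
export const TONE_OPTIONS: ToneOption[] = [
  // Example prompt transformation:
  //   Input:  How are you feeling?
  //   Output: Say rapidly and energetically: "How-are-you-feeling?"
  {
    emoji: "🐰",
    name: "Fast",
    // Joins the words with hyphens inside the spoken instruction
    transform: (text) => `Say rapidly and energetically: "${text.split(' ').join('-')}"`
  },
  // ...other tones
];
```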
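To add a tone, append another entry to the array. The sketch below assumes the `ToneOption` shape shown here matches the project's own type. The "Whisper" tone, its prompt wording, and the `findTone` lookup are hypothetical examples, not tones or helpers shipped with Voice Cursor.

```typescript
// Assumed shape of ToneOption; check the project's own type definition.
export interface ToneOption {
  emoji: string;
  name: string;
  transform: (text: string) => string;
}

export const TONE_OPTIONS: ToneOption[] = [
  {
    emoji: "🐰",
    name: "Fast",
    // Same transform as the original Fast tone: words joined with hyphens
    transform: (text) => `Say rapidly and energetically: "${text.split(" ").join("-")}"`,
  },
  {
    // Hypothetical example tone, not part of the original list
    emoji: "🤫",
    name: "Whisper",
    transform: (text) => `Whisper softly and slowly: "${text}"`,
  },
];

// Hypothetical helper: look up a tone by the name shown in the UI
export function findTone(name: string): ToneOption | undefined {
  return TONE_OPTIONS.find((tone) => tone.name === name);
}
```

Because each tone is only a function from text to prompt, a new tone needs no changes to the request code.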
That tone is then used in `src/components/editor/selectors/voice-popover.tsx`, where we make a request to Gemini 2.0 Native Audio:
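```typescript
// textToSpeak: the prompt built from the highlighted text and the selected tone
// voice: the name of the selected prebuilt Gemini voice
// NEXT_PUBLIC_ variables are bundled into client-side code, so this key is visible in the browser.
const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=${process.env.NEXT_PUBLIC_GEMINI_API_KEY}`,
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      contents: [{
        parts: [{ text: textToSpeak }]
      }],
      generationConfig: {
        // Request spoken audio instead of text
        response_modalities: ["AUDIO"],
        // Choose which prebuilt Gemini voice speaks the prompt
        speech_config: {
          voice_config: {
            prebuilt_voice_config: {
              voice_name: voice
            }
          }
        }
      }
    })
  }
);
```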
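The request above only sends the prompt. The sketch below shows one way to pull the audio out of the JSON reply. It assumes the usual `generateContent` response shape, in which binary output arrives base64-encoded in a part's `inlineData` (`mimeType` plus `data`). The helper names are hypothetical, the exact audio `mimeType` may vary, and the sketch does not show how the project itself plays the audio.

```typescript
// Minimal subset of the generateContent response that this sketch relies on
export interface InlineData {
  mimeType: string;
  data: string; // base64-encoded bytes
}
export interface GenerateContentResponse {
  candidates?: {
    content?: { parts?: { text?: string; inlineData?: InlineData }[] };
  }[];
}

// Return the first audio part found in the response, or null if there is none.
export function extractAudio(json: GenerateContentResponse): InlineData | null {
  for (const candidate of json.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      if (part.inlineData?.mimeType.startsWith("audio/")) {
        return part.inlineData;
      }
    }
  }
  return null;
}

const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode standard base64 into raw bytes without relying on atob or Buffer.
export function base64ToBytes(b64: string): Uint8Array {
  const clean = b64.replace(/[^A-Za-z0-9+/]/g, ""); // drop padding and whitespace
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < clean.length; i++) {
    buffer = ((buffer << 6) | B64_ALPHABET.indexOf(clean[i])) & 0xffffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff); // emit each complete byte
    }
  }
  return Uint8Array.from(out);
}
```

With the `response` from the request above, this would look like `const audio = extractAudio(await response.json());`. Depending on the returned `mimeType`, the decoded bytes may need to be wrapped in a playable container before an audio element can play them.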
You can experiment with Gemini 2.0's native audio output in AI Studio:
- Visit AI Studio
- Select "Gemini 2.0 Flash Experimental" model
- Set output format to "Audio"
- Enter your prompt
- Click "Generate"
- Built with Novel, a Notion-style WYSIWYG editor
- Powered by Google's Gemini 2.0 Native Audio
- Code from Trudy Painter, @trudypainter
- Design from Jose Guizar
This is an experiment showcasing Gemini 2.0's Native Audio capabilities, not an official Google product. We'll do our best to support and maintain this experiment but your mileage may vary. We encourage open sourcing projects as a way of learning from each other. Please respect our and other creators' rights, including copyright and trademark rights when present, when sharing these works and creating derivative work. If you want more info on Google's policy, you can find that here.
Licensed under the Apache-2.0 license.