This project demonstrates how to integrate D-ID's API and OpenAI's Whisper in a Next.js application. It lets users interact with AI-powered avatars through typed text or voice input, showcasing the potential of conversational AI interfaces.
- Interactive avatar selection
- Text-to-speech functionality
- Voice input using OpenAI's Whisper for transcription
- Integration with OpenAI's GPT model for conversational responses
- Node.js (v14 or later)
- npm (v6 or later)
- A D-ID API key
- An OpenAI API key
- Clone this repository:
  git clone https://github.com/WillKre/nextjs-whisper-d-id.git
  cd nextjs-whisper-d-id
- Install dependencies:
  npm install
- Set up environment variables. Either run
  npm run setup-env
  or create a .env file in the root directory and add the following:
  D_ID_API_KEY=your_d_id_api_key
  OPENAI_API_KEY=your_openai_api_key
  NEXT_PUBLIC_OPENAI_API_KEY=your_openai_api_key
  Then populate the values:
  - To obtain a D-ID API key, sign up at D-ID's website and navigate to the API section in your account settings.
  - For an OpenAI API key, create an account at OpenAI's website and generate an API key in your account dashboard.
- Start the development server:
  npm run dev
- Open http://localhost:3000 in your browser to see the application.
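A missing or placeholder key from the .env step above tends to surface only as a failed API call at runtime, so it can help to check the variables up front. The following is a minimal sketch, not part of the repository: `findMissingEnvVars` is a hypothetical helper, and treating values that start with `your_` as unset is an assumption based on the placeholder values shown above.

```typescript
// Names of the variables listed in the .env example above.
const REQUIRED_ENV_VARS = [
  "D_ID_API_KEY",
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_OPENAI_API_KEY",
] as const;

type EnvName = (typeof REQUIRED_ENV_VARS)[number];

// Hypothetical helper (not in the repository): returns the names of required
// variables that are unset, empty, or still hold a "your_..." placeholder.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
): EnvName[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || value.startsWith("your_");
  });
}

// Example: OPENAI_API_KEY still holds the placeholder and
// NEXT_PUBLIC_OPENAI_API_KEY is unset, so both names are reported.
const missing = findMissingEnvVars({
  D_ID_API_KEY: "abc123",
  OPENAI_API_KEY: "your_openai_api_key",
});
console.log(missing);
```

In the application itself, the helper could be called with `process.env` in server-side code before any request is made. Note that Next.js inlines variables prefixed with `NEXT_PUBLIC_` into the browser bundle, so that value is visible to clients.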
- Select an avatar from the dropdown menu.
- Choose a voice for the avatar.
- Type text in the "Repeat" input to make the avatar speak that text.
- Use the "Chat" input to have a conversation with the AI-powered avatar:
  - Type your message and press Enter, or
  - Click the microphone icon to use voice input (requires browser permission).
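The "Chat" flow above (a typed or transcribed message is sent to a GPT model, and the reply is spoken by the avatar) can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: the helper names, the model name `gpt-4o-mini`, and the system prompt are all made up for the example. The endpoint and the request and response shapes follow OpenAI's Chat Completions API.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Assumed values for illustration; the repository may use different ones.
const CHAT_URL = "https://api.openai.com/v1/chat/completions";
const MODEL = "gpt-4o-mini";

// Hypothetical helper: builds the HTTP request for one conversational turn.
function buildChatRequest(
  apiKey: string,
  history: ChatMessage[],
  userText: string,
) {
  const messages: ChatMessage[] = [
    { role: "system", content: "You are a friendly avatar. Keep replies short." },
    ...history,
    { role: "user", content: userText.trim() },
  ];
  return {
    url: CHAT_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: MODEL, messages }),
    },
  };
}

// Hypothetical helper: pulls the assistant's text out of a response body.
function extractReply(responseJson: unknown): string {
  const data = responseJson as {
    choices?: { message?: { content?: string | null } }[];
  };
  const reply = data.choices?.[0]?.message?.content;
  if (!reply) throw new Error("No reply text in the chat completion response");
  return reply;
}

// Example using a canned response instead of a live API call.
const request = buildChatRequest("sk-example", [], "  Hello there  ");
console.log(request.init.body);
console.log(extractReply({ choices: [{ message: { content: "Hi!" } }] }));
```

In server-side code, `fetch(request.url, request.init)` followed by `extractReply(await response.json())` would complete the round trip, and the resulting text could then be handed to the avatar in the same way as text typed into the "Repeat" input.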
The project uses ElevenLabs to generate realistic voices. The available voices are fetched dynamically from the ElevenLabs API.
If you want to use voices from other providers or add more options:
- Update the API endpoint in app/api/voices/route.ts:
  const response = await fetch("https://api.elevenlabs.io/v1/voices", {
    // ... existing headers ...
  });
  Replace the URL with the endpoint of your chosen voice provider.
- Ensure that the response is mapped to match the expected format:
  const transformedVoices = data.voices.map((voice) => ({
    voice_id: voice.voice_id,
    name: voice.name,
  }));
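If another provider returns voices in a different shape, the mapping step above could be adapted along these lines. This is only a sketch: `OtherProviderVoice` and its `id` and `displayName` fields are invented for the example and do not describe any real provider's schema. The output shape (`voice_id`, `name`) is the one shown in the mapping above.

```typescript
// Shape the application expects, taken from the mapping shown above.
type VoiceOption = { voice_id: string; name: string };

// Hypothetical response item from another provider (invented field names).
type OtherProviderVoice = { id: string; displayName?: string };

// Converts the provider's items into the expected format, skipping entries
// with an empty id and falling back to the id when no display name is given.
function toVoiceOptions(voices: OtherProviderVoice[]): VoiceOption[] {
  return voices
    .filter((voice) => voice.id.trim() !== "")
    .map((voice) => ({
      voice_id: voice.id,
      name: voice.displayName?.trim() || voice.id,
    }));
}

// Example input: one complete entry, one without a name, one with no id.
const transformedVoices = toVoiceOptions([
  { id: "v1", displayName: "Alice" },
  { id: "v2" },
  { id: "" },
]);
console.log(transformedVoices);
```

Keeping the conversion in one small function means the rest of the application can continue to rely on `voice_id` and `name`, whichever provider supplies the data.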
- app/: Contains the main application code and API routes.
- components/: React components used throughout the application.
- utils/: Utility functions and helper modules.
- types/: Common TypeScript types which are used across the project.
- styles/: Global styles and Tailwind CSS configuration.
- Next.js: React framework for building the application
- D-ID API: For generating interactive avatars
- OpenAI API: For GPT-based conversation and Whisper transcription
- NextUI: UI component library
- Tailwind CSS: For styling
Contributions are welcome! Please feel free to submit a Pull Request.