This sample code demonstrates real-time audio interaction using OpenAI's WebSocket API for the GPT-4o real-time audio streaming preview. It sends an input audio file to the OpenAI server and plays the audio response in real time using Node.js.
- Input Audio: The system reads an input audio file (e.g., `gettysburg.wav`), encodes it into base64 PCM16 format, and sends it to the OpenAI server.
- Real-time Response: The OpenAI server responds with real-time audio chunks, which are played directly from memory using the `speaker` library.
- Real-time Playback: The audio is played as it is received without needing to save the audio to a file.
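The encoding step in the first bullet can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the decoder yields Float32 samples in the range [-1, 1], and the function names `floatTo16BitPCM` and `base64EncodeAudio` are hypothetical.

```javascript
// Sketch: convert Float32 samples (assumed range [-1, 1]) to
// little-endian 16-bit PCM, then base64-encode the bytes.
function floatTo16BitPCM(float32Array) {
  const buffer = Buffer.alloc(float32Array.length * 2); // 2 bytes per sample
  for (let i = 0; i < float32Array.length; i++) {
    // Clamp so out-of-range samples cannot overflow the 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Array[i]));
    // Negative samples scale to -32768, positive samples to 32767.
    const value = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    buffer.writeInt16LE(value, i * 2);
  }
  return buffer;
}

function base64EncodeAudio(float32Array) {
  return floatTo16BitPCM(float32Array).toString("base64");
}
```

The resulting string is what would be placed in the audio field of the message sent to the server.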
First, configure the WebSocket connection to OpenAI's API and set up the speaker to play the real-time audio response.
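import WebSocket from "ws";

// The model is selected through the query string of the realtime endpoint.
const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.OPENAI_API_KEY, // key read from the environment
    "OpenAI-Beta": "realtime=v1",
  },
});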
The `speaker` library is used to play the real-time audio response. The audio chunks received from the OpenAI server are appended to the speaker buffer for live playback.
const speaker = new Speaker({
  channels: numChannels,  // 1 channel (mono)
  bitDepth: 16,           // 16-bit samples
  sampleRate: sampleRate  // 24,000 Hz sample rate
});
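The `message` event handler processes the WebSocket messages received from the OpenAI server. Each `response.audio.delta` event carries a base64-encoded audio chunk, which is appended to the speaker buffer for real-time playback; the `response.audio.done` event ends the speaker stream.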
ws.on("message", function incoming(message) {
  const parsedMessage = JSON.parse(message.toString());
  console.log("Message received from server:", parsedMessage);

  // Handle audio delta events
  if (parsedMessage.type === 'response.audio.delta' && parsedMessage.delta) {
    const base64Audio = parsedMessage.delta;
    appendBase64AudioToSpeaker(base64Audio); // Play audio chunk in real-time
  }

  // Handle the response.audio.done event
  if (parsedMessage.type === 'response.audio.done') {
    console.log("Audio generation done.");
    speaker.end(); // End the speaker stream
  }

  // If the message contains content, print it in detail
  if (parsedMessage.item && parsedMessage.item.content) {
    console.log("Message content:", JSON.stringify(parsedMessage.item.content, null, 2));
  }
});
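The handler above calls `appendBase64AudioToSpeaker`, whose body is not shown here. A minimal sketch of what such a helper might look like follows; unlike the call above, this variant takes the output stream as an explicit second argument (an assumption made so the sketch is self-contained), and it accepts any object with a `write` method, such as a `Speaker` instance.

```javascript
// Sketch of a playback helper (assumed implementation, not the project's code).
// `output` is any writable target with a write(chunk) method, e.g. a Speaker.
function appendBase64AudioToSpeaker(base64Audio, output) {
  // Decode the base64 payload back into raw PCM16 bytes.
  const pcmChunk = Buffer.from(base64Audio, "base64");
  console.log(`Appending ${pcmChunk.length} bytes to speaker...`);
  // Writing the chunk queues it for playback without touching the disk.
  output.write(pcmChunk);
  return pcmChunk.length;
}
```

Because each chunk is written as soon as it arrives, playback can begin before the server has finished generating the full response.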
- Node.js (v14+)
- OpenAI API Key
- An audio file in `.wav` format (mono, 16-bit PCM, sampled at 24,000 Hz)
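To check that a file meets the format requirement before running the app, the WAV header can be inspected directly. The sketch below is not part of the project; `readWavFormat` and `isExpectedFormat` are hypothetical helpers that assume a standard RIFF/WAVE layout with a `fmt ` chunk.

```javascript
// Sketch: read channel count, bit depth, and sample rate from a WAV header.
function readWavFormat(buf) {
  if (buf.length < 12 ||
      buf.toString("ascii", 0, 4) !== "RIFF" ||
      buf.toString("ascii", 8, 12) !== "WAVE") {
    throw new Error("Not a RIFF/WAVE file");
  }
  let offset = 12; // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= buf.length) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "fmt ") {
      if (offset + 24 > buf.length) break; // fmt data is truncated
      return {
        audioFormat: buf.readUInt16LE(offset + 8),  // 1 means uncompressed PCM
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        bitDepth: buf.readUInt16LE(offset + 22),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even sizes
  }
  throw new Error("No fmt chunk found");
}

// True when the file is mono, 16-bit PCM, sampled at 24,000 Hz.
function isExpectedFormat(fmt) {
  return fmt.audioFormat === 1 && fmt.channels === 1 &&
         fmt.bitDepth === 16 && fmt.sampleRate === 24000;
}
```

In practice the buffer would come from reading the input file, for example with `fs.readFileSync("gettysburg.wav")`.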
- Clone the repository or download the project files:

      git clone [email protected]:sajithamma/openai-realtime-nodejs.git
      cd openai-realtime-nodejs

- Install the dependencies:

      npm install

  This will install the following dependencies:
  - `dotenv`: To manage environment variables.
  - `speaker`: To play real-time audio from PCM16 data.
  - `audio-decode`: To decode the input audio file.
  - `ws`: WebSocket client for OpenAI's WebSocket API.
- Add your OpenAI API key to a `.env` file:

      touch .env

  Add the following line to the `.env` file:

      OPENAI_API_KEY=your-openai-api-key-here

- Place your input `.wav` audio file in the project directory. Ensure the file is mono, 16-bit PCM, sampled at 24,000 Hz. For example, `gettysburg.wav`.
- Run the project:

      node app.mjs

- The program will:
  - Read and encode the input audio file.
  - Send the input audio to OpenAI's WebSocket API.
  - Play the response audio in real-time using your system's speaker.
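The "send" step above can be sketched as two JSON events. The event shapes below follow the Realtime API beta (`conversation.item.create` with an `input_audio` content part, then `response.create`); that the app sends exactly these events is an assumption, and the builder function names are hypothetical.

```javascript
// Sketch: build the client events for sending one audio message.
// `base64Audio` is the base64-encoded PCM16 content of the input file.
function buildUserAudioEvent(base64Audio) {
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_audio", audio: base64Audio }],
    },
  };
}

// Asks the server to start generating a response to the conversation so far.
function buildResponseRequest() {
  return { type: "response.create" };
}
```

With an open connection, each event would be serialized before sending, for example `ws.send(JSON.stringify(buildUserAudioEvent(base64Audio)))` followed by `ws.send(JSON.stringify(buildResponseRequest()))`.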
.
├── app.mjs # Main entry point of the app
├── package.json # Project dependencies and scripts
├── .env # OpenAI API key (not included in the repo)
└── gettysburg.wav # Input audio file (add your own)
- Console Output: As the program runs, you will see WebSocket events being printed in the console. Example:

      Connected to server.
      Message received from server: { type: 'response.audio.delta', ... }
      Appending 12345 bytes to speaker...
      Message received from server: { type: 'response.audio.done', ... }
      Audio generation done.

- Real-time Audio Playback: You will hear the real-time response generated by the OpenAI server through your speakers as audio chunks are received.
- Slow Audio: If the audio sounds slowed down, ensure that the sample rate of the input file is 24,000 Hz. The response audio is played assuming this rate.
- No Audio: Make sure your system's audio is working and the correct audio device is selected. Also, ensure the input audio file is in the correct format.
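To see why a sample-rate mismatch changes playback speed, consider the arithmetic for PCM16 audio. The following is an illustrative sketch with hypothetical helper names; it only shows the calculation and is not part of the project.

```javascript
// Duration in seconds of a PCM16 chunk: each sample takes 2 bytes per channel.
function pcm16DurationSeconds(byteLength, sampleRate = 24000, channels = 1) {
  return byteLength / (sampleRate * channels * 2);
}

// Relative playback speed when audio recorded at `actualRate` is played
// by a device configured for `assumedRate`. Values below 1 sound slowed down.
function playbackSpeedFactor(actualRate, assumedRate) {
  return assumedRate / actualRate;
}
```

For instance, 48,000 bytes of mono PCM16 at 24,000 Hz last one second, while 24,000 Hz audio played through a speaker configured for 16,000 Hz runs at about two-thirds speed.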