
Real-time Audio Interaction with OpenAI WebSocket API

This sample code demonstrates real-time audio interaction using OpenAI's WebSocket API for GPT-4o's real-time audio streaming preview. The system sends an input audio file to the OpenAI server and plays the audio response in real time using Node.js.

Overview

  • Input Audio: The system reads an input audio file (e.g., gettysburg.wav), encodes it into base64 PCM16 format, and sends it to the OpenAI server.
  • Real-time Response: The OpenAI server responds with real-time audio chunks, which are played directly from memory using the speaker library.
  • Real-time Playback: The audio is played as it is received, with no need to save it to a file first.
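The encoding step in the first bullet can be sketched as below. This is a minimal sketch, not the exact code in app.mjs; it assumes the decoded samples arrive as a Float32Array (the channel data an audio-decode AudioBuffer exposes):

```javascript
// Convert decoded Float32 samples ([-1, 1]) into little-endian PCM16 bytes,
// then base64-encode them for the WebSocket payload.
function floatTo16BitPCM(float32Array) {
    const buffer = Buffer.alloc(float32Array.length * 2);
    for (let i = 0; i < float32Array.length; i++) {
        const s = Math.max(-1, Math.min(1, float32Array[i])); // clamp
        buffer.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
    }
    return buffer;
}

function base64EncodeAudio(float32Array) {
    return floatTo16BitPCM(float32Array).toString('base64');
}
```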

WebSocket Connection and Audio Setup

First, configure the WebSocket connection to OpenAI's API and set up the speaker to play the real-time audio response.

const url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01";
const ws = new WebSocket(url, {
    headers: {
        "Authorization": "Bearer " + process.env.OPENAI_API_KEY,
        "OpenAI-Beta": "realtime=v1",
    },
});
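Once the connection opens, the session can optionally be configured with a session.update event. The field names below follow the realtime beta protocol and should be treated as an assumption; verify them against the current API reference:

```javascript
// Sketch: build a session.update event pinning the audio formats the
// rest of this README assumes (mono PCM16 at 24 kHz in and out).
function buildSessionUpdate() {
    return {
        type: "session.update",
        session: {
            modalities: ["audio", "text"],
            voice: "alloy",                  // assumed default voice
            input_audio_format: "pcm16",
            output_audio_format: "pcm16",
        },
    };
}
```

It would be sent over the open socket with `ws.send(JSON.stringify(buildSessionUpdate()))`.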

Setting up the live playback

The speaker library is used to play the real-time audio response. The audio chunks received from the OpenAI server are appended to the speaker buffer for live playback.

const speaker = new Speaker({
    channels: numChannels,          // 1 channel (mono)
    bitDepth: 16,                   // 16-bit samples
    sampleRate: sampleRate          // 24,000 Hz sample rate
});
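One possible shape for the appendBase64AudioToSpeaker helper used in the message handler below; this is a hedged sketch (not the exact code in app.mjs), with the output stream passed in explicitly — in the app it could simply close over the speaker instance:

```javascript
// Decode a base64 audio delta and write the raw PCM16 bytes straight
// into a writable stream (the Speaker instance, in this project).
function appendBase64AudioToSpeaker(base64Audio, out) {
    const audioBuffer = Buffer.from(base64Audio, 'base64');
    console.log(`Appending ${audioBuffer.length} bytes to speaker...`);
    out.write(audioBuffer);      // Speaker is a writable stream
    return audioBuffer.length;   // byte count, handy for logging
}
```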

On-Message Event Handler

The message event handler processes the WebSocket messages received from the OpenAI server. Audio chunks are appended to the speaker buffer for real-time playback.

ws.on("message", function incoming(message) {
    const parsedMessage = JSON.parse(message.toString());
    console.log("Message received from server:", parsedMessage);

    // Handle audio delta events
    if (parsedMessage.type === 'response.audio.delta' && parsedMessage.delta) {
        const base64Audio = parsedMessage.delta;
        appendBase64AudioToSpeaker(base64Audio); // Play audio chunk in real-time
    }

    // Handle the response.audio.done event
    if (parsedMessage.type === 'response.audio.done') {
        console.log("Audio generation done.");
        speaker.end(); // End the speaker stream
    }

    // If the message contains content, print it in detail
    if (parsedMessage.item && parsedMessage.item.content) {
        console.log("Message content:", JSON.stringify(parsedMessage.item.content, null, 2));
    }
});
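To actually submit the input audio, the client sends the encoded buffer followed by a response request once the socket opens. A sketch of those client events (the event names follow the realtime beta protocol and are an assumption — check them against app.mjs and the API reference):

```javascript
// Build the three client events that upload the input audio, commit the
// buffer, and ask the model for a response.
function buildAudioEvents(base64Audio) {
    return [
        { type: "input_audio_buffer.append", audio: base64Audio }, // upload chunk
        { type: "input_audio_buffer.commit" },                     // finalize buffer
        { type: "response.create" },                               // request a response
    ];
}
```

On the socket's open event these would be sent with `buildAudioEvents(base64Audio).forEach(e => ws.send(JSON.stringify(e)));`.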

Prerequisites

  • Node.js (v14+)
  • OpenAI API Key
  • An audio file in .wav format (mono, 16-bit PCM, sampled at 24,000 Hz)

Installation

  1. Clone the repository or download the project files:

    git clone [email protected]:sajithamma/openai-realtime-nodejs.git
    cd openai-realtime-nodejs
  2. Install the dependencies:

    npm install

    This will install the following dependencies:

    • dotenv: To manage environment variables.
    • speaker: To play real-time audio from PCM16 data.
    • audio-decode: To decode the input audio file.
    • ws: WebSocket client for OpenAI's WebSocket API.
  3. Add your OpenAI API key to a .env file:

    touch .env

    Add the following line to the .env file:

    OPENAI_API_KEY=your-openai-api-key-here

Running the Project

  1. Place your input .wav audio file in the project directory. Ensure the file is mono, 16-bit PCM, sampled at 24,000 Hz. For example, gettysburg.wav.

  2. Run the project:

    node app.mjs
  3. The program will:

    • Read and encode the input audio file.
    • Send the input audio to OpenAI’s WebSocket API.
    • Play the response audio in real-time using your system's speaker.

Project Structure

.
├── app.mjs               # Main entry point of the app
├── package.json         # Project dependencies and scripts
├── .env                 # OpenAI API key (not included in the repo)
└── gettysburg.wav       # Input audio file (add your own)

Expected Output

  1. Console Output: As the program runs, you will see WebSocket events being printed in the console. Example:

    Connected to server.
    Message received from server: { type: 'response.audio.delta', ... }
    Appending 12345 bytes to speaker...
    Message received from server: { type: 'response.audio.done', ... }
    Audio generation done.
    
  2. Real-time Audio Playback: You will hear the real-time response generated by the OpenAI server through your speakers as audio chunks are received.

Troubleshooting

  1. Slow Audio: If the audio sounds slowed down, ensure that the sample rate of the input file is 24,000 Hz. The response audio is played assuming this rate.
  2. No Audio: Make sure your system's audio is working and the correct audio device is selected. Also, ensure the input audio file is in the correct format.
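To verify the input file's format (troubleshooting item 1), the fmt fields can be read directly from a canonical 44-byte RIFF/WAVE header. This sketch assumes no extra chunks sit before the fmt chunk; real-world files may need a proper WAV parser:

```javascript
// Read channel count, sample rate, and bit depth from a canonical
// RIFF/WAVE header: numChannels at byte 22, sampleRate at 24,
// bitsPerSample at 34.
function readWavFormat(buffer) {
    return {
        channels: buffer.readUInt16LE(22),   // expect 1 (mono)
        sampleRate: buffer.readUInt32LE(24), // expect 24000
        bitDepth: buffer.readUInt16LE(34),   // expect 16
    };
}
```

In this project it could be run against the input file with the buffer returned by fs.readFileSync("gettysburg.wav").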

About

OpenAI realtime API using NodeJS
