Support for audio input in ChatMessageContentPart #292

Open

andhesky opened this issue Nov 15, 2024 · 1 comment
@andhesky

Confirm this is a feature request for the .NET library and not the underlying OpenAI API

  • This is a feature request for the .NET library

Describe the feature or improvement you are requesting

With the announcement of input_audio as a content type for the Chat Completions API, it would be great to add support in the C# SDK for including audio via ChatMessageContentPart. The Realtime API is great, but some applications are better suited to the request/response nature of chat completions; without support in ChatMessageContentPart, however, I don't see a way to send audio input with the C# SDK.
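
For illustration, here is a hypothetical sketch of what the requested API could look like; the CreateInputAudioPart factory method and its "wav" format argument are assumptions for the purposes of this request, not existing SDK members:

// Hypothetical sketch only: CreateInputAudioPart does not exist in the SDK today.
ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// Load the input audio and wrap it as an audio content part (hypothetical factory method).
BinaryData audioBytes = BinaryData.FromBytes(File.ReadAllBytes("audio_user_message.wav"));
ChatMessageContentPart audioPart = ChatMessageContentPart.CreateInputAudioPart(audioBytes, "wav");

// Send the audio as part of a regular user message through the typed convenience API.
ChatCompletion completion = await client.CompleteChatAsync(new UserChatMessage(audioPart));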

Thanks.

Additional context

No response

@joseharriaga
Collaborator

Thank you for reaching out, @andhesky! We are working to add audio support in chat completions soon. In the meantime, if you need a quick solution to get unblocked, you could try using the ChatClient's CompleteChat protocol method as shown below. I will report back here as soon as we have proper support available.

ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// Convert the input audio .wav file to a base64-encoded string.
string inputAudioFilePath = Path.Combine("Assets", "audio_user_message.wav");
using Stream inputAudioStream = File.OpenRead(inputAudioFilePath);
BinaryData inputAudioBytes = BinaryData.FromStream(inputAudioStream);
string base64EncodedInputAudioData = Convert.ToBase64String(inputAudioBytes.ToArray());

// Create and send the chat completion request.
BinaryData input = BinaryData.FromString($$"""
    {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],
        "audio": {
            "voice": "ash",
            "format": "wav"
        },
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "{{base64EncodedInputAudioData}}",
                            "format": "wav"
                        }
                    }
                ]
            }
        ]
    }
    """);
using BinaryContent content = BinaryContent.Create(input);
ClientResult result = await client.CompleteChatAsync(content);
BinaryData output = result.GetRawResponse().Content;

// Parse the JSON response.
using JsonDocument outputAsJson = JsonDocument.Parse(output.ToString());
string data = outputAsJson.RootElement
    .GetProperty("choices"u8)[0]
    .GetProperty("message"u8)
    .GetProperty("audio"u8)
    .GetProperty("data"u8)
    .GetString();

// Save the output audio to a .wav file.
BinaryData outputAudioBytes = BinaryData.FromBytes(Convert.FromBase64String(data));
using FileStream outputAudioStream = File.OpenWrite($"{Guid.NewGuid()}.wav");
outputAudioBytes.ToStream().CopyTo(outputAudioStream);
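
If you also need the text of the spoken reply, the response's message.audio object carries a transcript field alongside the audio data (assuming the current gpt-4o-audio-preview response shape), which can be read the same way:

// Optionally read the text transcript that accompanies the audio reply.
string transcript = outputAsJson.RootElement
    .GetProperty("choices"u8)[0]
    .GetProperty("message"u8)
    .GetProperty("audio"u8)
    .GetProperty("transcript"u8)
    .GetString();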

joseharriaga self-assigned this Dec 14, 2024