Support for audio input in ChatMessageContentPart #292

Open

andhesky opened this issue Nov 15, 2024 · 1 comment
@andhesky

Confirm this is a feature request for the .NET library and not the underlying OpenAI API

  • This is a feature request for the .NET library

Describe the feature or improvement you are requesting

With the announcement of input_audio as a content type for the Chat Completions API, it would be great to add support in the C# SDK for including audio via ChatMessageContentPart. The Realtime API is great, but some applications are better suited to the request/response nature of chat completions; without support in ChatMessageContentPart, however, I don't see a way to send audio input with the C# SDK.
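
For illustration, here is a hypothetical sketch of what the requested API could look like; the CreateInputAudioPart factory method and its "wav" format argument are assumptions for the purposes of this request, not existing SDK members:

// Hypothetical sketch only: CreateInputAudioPart does not exist in the SDK today.
ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// Load the input audio and wrap it as an audio content part (hypothetical factory method).
BinaryData audioBytes = BinaryData.FromBytes(File.ReadAllBytes("audio_user_message.wav"));
ChatMessageContentPart audioPart = ChatMessageContentPart.CreateInputAudioPart(audioBytes, "wav");

// Send the audio as part of a regular user message through the typed convenience API.
ChatCompletion completion = await client.CompleteChatAsync(new UserChatMessage(audioPart));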

Thanks.

Additional context

No response

@joseharriaga
Collaborator

Thank you for reaching out, @andhesky! We are working to add audio support in chat completions soon. In the meantime, if you need a quick solution to get unblocked, you could try using the ChatClient's CompleteChat protocol method as shown below. I will report back here as soon as we have proper support available.

ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// Convert the input audio .wav file to a base64-encoded string.
string inputAudioFilePath = Path.Combine("Assets", "audio_user_message.wav");
using Stream inputAudioStream = File.OpenRead(inputAudioFilePath);
BinaryData inputAudioBytes = BinaryData.FromStream(inputAudioStream);
string base64EncodedInputAudioData = Convert.ToBase64String(inputAudioBytes.ToArray());

// Create and send the chat completion request.
BinaryData input = BinaryData.FromString($$"""
    {
        "model": "gpt-4o-audio-preview",
        "modalities": ["text", "audio"],
        "audio": {
            "voice": "ash",
            "format": "wav"
        },
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "{{base64EncodedInputAudioData}}",
                            "format": "wav"
                        }
                    }
                ]
            }
        ]
    }
    """);
using BinaryContent content = BinaryContent.Create(input);
ClientResult result = await client.CompleteChatAsync(content);
BinaryData output = result.GetRawResponse().Content;

// Parse the JSON response.
using JsonDocument outputAsJson = JsonDocument.Parse(output.ToString());
string data = outputAsJson.RootElement
    .GetProperty("choices"u8)[0]
    .GetProperty("message"u8)
    .GetProperty("audio"u8)
    .GetProperty("data"u8)
    .GetString();

// Save the output audio to a .wav file.
BinaryData outputAudioBytes = BinaryData.FromBytes(Convert.FromBase64String(data));
using FileStream outputAudioStream = File.OpenWrite($"{Guid.NewGuid()}.wav");
outputAudioBytes.ToStream().CopyTo(outputAudioStream);
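
If you also need the text of the spoken reply, the response's message.audio object carries a transcript field alongside the audio data (assuming the current gpt-4o-audio-preview response shape), which can be read the same way:

// Optionally read the text transcript that accompanies the audio reply.
string transcript = outputAsJson.RootElement
    .GetProperty("choices"u8)[0]
    .GetProperty("message"u8)
    .GetProperty("audio"u8)
    .GetProperty("transcript"u8)
    .GetString();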

joseharriaga self-assigned this Dec 14, 2024