feat: add voice input mode via /speak #602


Open
ercbot wants to merge 11 commits into base: main

Conversation


@ercbot ercbot commented Apr 23, 2025

Resolves #418

Adds a /speak command to the CLI that triggers transcription-based input in the text box

Uses the OpenAI Realtime API for transcription of voice input

When /speak is invoked:

  • A red recording indicator appears at the front of the input box
  • Audio is streamed to the OpenAI API, where a VAD (voice activity detection) model detects when you start speaking; once it detects that you have finished, the transcription is streamed from the transcription model into the input box
  • Pressing Enter sends the recorded text and pauses transcription until the input box opens again
  • Pressing any other key automatically stops the recording so you can make edits (useful if you want to edit your responses precisely, or in a noisy environment, for instance)
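The keypress rules above can be sketched as a small pure helper. This is an illustrative sketch only; the type and function names are hypothetical, not the PR's actual implementation in terminal-chat-input.tsx:

```typescript
// Hypothetical sketch of the /speak keypress behavior described above.
type RecordingAction = "send" | "stop-recording";

// Decide what /speak mode does for a keypress while recording:
// Enter sends the transcribed text (and pauses transcription);
// any other key stops recording so the user can edit manually.
function handleKeyWhileRecording(keyName: string): RecordingAction {
  return keyName === "return" ? "send" : "stop-recording";
}
```

Keeping this decision in a pure function (rather than inline in the component) makes the behavior easy to unit-test.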

The default language is 'en' and the default model is 'gpt-4o-mini-transcribe'.

I was originally going to add language config to the command itself, but it might be better suited to the ~/.codex/ config file, so I'm looking for feedback on the best place to add config for model + language.

Tested and working on a MacBook Air M4 with the built-in microphone.
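Streaming mic audio to the Realtime API, as described above, amounts to base64-encoding PCM chunks into `input_audio_buffer.append` events over the WebSocket. A minimal sketch — the event type follows the OpenAI Realtime API, but the helper and chunk size are illustrative, not the PR's code:

```typescript
// Hypothetical helper: split a PCM16 buffer into the base64-encoded
// `input_audio_buffer.append` events the Realtime API accepts.
interface AppendEvent {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded PCM16 audio
}

function toAppendEvents(pcm: Buffer, chunkBytes = 4096): AppendEvent[] {
  const events: AppendEvent[] = [];
  for (let off = 0; off < pcm.length; off += chunkBytes) {
    events.push({
      type: "input_audio_buffer.append",
      audio: pcm.subarray(off, off + chunkBytes).toString("base64"),
    });
  }
  return events;
}
```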


github-actions bot commented Apr 23, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.


ercbot commented Apr 23, 2025

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request on Apr 23, 2025
@ercbot force-pushed the feat/voice-input branch from 223d37b to 86b2426 on April 26, 2025 at 18:38
@ercbot force-pushed the feat/voice-input branch from 86b2426 to a87c617 on April 26, 2025 at 18:42

ercbot commented Apr 26, 2025

I addressed all feedback from the previous review and added tests for both the transcriber.ts module and the terminal-chat-input.tsx component to cover the new transcription functionality.

I also added config settings to customize the transcription model and language through the existing config system. Details for these options are available in the OpenAI docs.

Functionally, everything is working as intended. Let me know if there's anything else needed to complete this PR.


ercbot commented Apr 26, 2025

Example of the .codex/config.yaml file:

# existing config options
model: 4o-mini
approvalMode: suggest 
# ...
# new transcription options (these are the defaults found in src/utils/transcriber.ts)
transcription:
  input_audio_transcription:
    model: gpt-4o-transcribe
    prompt: ""
    language: "en"
  turn_detection:
    type: server_vad
    threshold: 0.6
    prefix_padding_ms: 400
    silence_duration_ms: 500
  input_audio_noise_reduction:
    type: near_field
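As a sketch of how these YAML options could be layered over the defaults in src/utils/transcriber.ts — the type names and merge helper are hypothetical, but the field names and default values mirror the example above:

```typescript
// Shape of the transcription section shown in the config example above.
interface TranscriptionConfig {
  input_audio_transcription: { model: string; prompt: string; language: string };
  turn_detection: { type: string; threshold: number; prefix_padding_ms: number; silence_duration_ms: number };
  input_audio_noise_reduction: { type: string };
}

// User config may specify any subset of any section.
type UserTranscriptionConfig = {
  [K in keyof TranscriptionConfig]?: Partial<TranscriptionConfig[K]>;
};

// Defaults matching the YAML example above.
const DEFAULT_TRANSCRIPTION: TranscriptionConfig = {
  input_audio_transcription: { model: "gpt-4o-transcribe", prompt: "", language: "en" },
  turn_detection: { type: "server_vad", threshold: 0.6, prefix_padding_ms: 400, silence_duration_ms: 500 },
  input_audio_noise_reduction: { type: "near_field" },
};

// Merge section by section: user values win, unspecified fields keep defaults.
function mergeTranscriptionConfig(user: UserTranscriptionConfig): TranscriptionConfig {
  return {
    input_audio_transcription: { ...DEFAULT_TRANSCRIPTION.input_audio_transcription, ...user.input_audio_transcription },
    turn_detection: { ...DEFAULT_TRANSCRIPTION.turn_detection, ...user.turn_detection },
    input_audio_noise_reduction: { ...DEFAULT_TRANSCRIPTION.input_audio_noise_reduction, ...user.input_audio_noise_reduction },
  };
}
```

A per-section spread keeps the behavior predictable: setting only `language: "fr"` in the config leaves the model and VAD settings at their defaults.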

@ercbot ercbot requested a review from benny123tw April 26, 2025 19:02
Development

Successfully merging this pull request may close these issues.

feat: Add a voice input mode