feat: add voice input mode via /speak #602


Open
ercbot wants to merge 11 commits into base: main

Conversation


@ercbot ercbot commented Apr 23, 2025

Resolves #418

Adds a /speak command to the CLI that triggers transcription-based input in the text box

Uses the OpenAI Realtime API for transcription of voice input

When /speak is invoked:

  • A red recording indicator appears at the front of the input box
  • Audio is streamed to the OpenAI API, where a VAD (voice activity detection) model detects when you start speaking; once it detects that you have finished, the transcription is streamed from the transcription model into the input box
  • Pressing Enter sends the recorded text and pauses transcription until the input box opens again
  • Pressing any other key automatically stops the recording so you can make edits (useful if you want to edit your responses precisely, or in a noisy environment, for instance)
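The keypress rules above can be sketched as a small pure helper. This is an illustrative sketch only; the type and function names are hypothetical, not the PR's actual implementation in terminal-chat-input.tsx:

```typescript
// Hypothetical sketch of the /speak keypress behavior described above.
type RecordingAction = "send" | "stop-recording";

// Decide what /speak mode does for a keypress while recording:
// Enter sends the transcribed text (and pauses transcription);
// any other key stops recording so the user can edit manually.
function handleKeyWhileRecording(keyName: string): RecordingAction {
  return keyName === "return" ? "send" : "stop-recording";
}
```

Keeping this decision in a pure function (rather than inline in the component) makes the behavior easy to unit-test.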

The default language is 'en' and the default model is 'gpt-4o-mini-transcribe'.

I was originally going to add language config to the command itself, but it might be better suited to the ~/.codex/ config file, so I'm looking for feedback on the best place to add config for model + language.

Tested and working on a MacBook Air M4 with the built-in microphone.
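Streaming mic audio to the Realtime API, as described above, amounts to base64-encoding PCM chunks into `input_audio_buffer.append` events over the WebSocket. A minimal sketch — the event type follows the OpenAI Realtime API, but the helper and chunk size are illustrative, not the PR's code:

```typescript
// Hypothetical helper: split a PCM16 buffer into the base64-encoded
// `input_audio_buffer.append` events the Realtime API accepts.
interface AppendEvent {
  type: "input_audio_buffer.append";
  audio: string; // base64-encoded PCM16 audio
}

function toAppendEvents(pcm: Buffer, chunkBytes = 4096): AppendEvent[] {
  const events: AppendEvent[] = [];
  for (let off = 0; off < pcm.length; off += chunkBytes) {
    events.push({
      type: "input_audio_buffer.append",
      audio: pcm.subarray(off, off + chunkBytes).toString("base64"),
    });
  }
  return events;
}
```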


github-actions bot commented Apr 23, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.


ercbot commented Apr 23, 2025

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request on Apr 23, 2025
@ercbot force-pushed the feat/voice-input branch from 223d37b to 86b2426 on April 26, 2025 at 18:38
@ercbot force-pushed the feat/voice-input branch from 86b2426 to a87c617 on April 26, 2025 at 18:42

ercbot commented Apr 26, 2025

I addressed all feedback from the previous review and added tests for both the transcriber.ts module and the terminal-chat-input.tsx component to cover the new transcription functionality.

I also added config settings to customize the transcription model and language through the existing config system. Details for these options are available in the OpenAI docs.

Functionally, everything is working as intended. Let me know if there's anything else needed to complete this PR.


ercbot commented Apr 26, 2025

Example of the .codex/config.yaml file:

# existing config options
model: 4o-mini
approvalMode: suggest 
# ...
# new transcription options (these are the defaults found in src/utils/transcriber.ts)
transcription:
  input_audio_transcription:
    model: gpt-4o-transcribe
    prompt: ""
    language: "en"
  turn_detection:
    type: server_vad
    threshold: 0.6
    prefix_padding_ms: 400
    silence_duration_ms: 500
  input_audio_noise_reduction:
    type: near_field
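As a sketch of how these YAML options could be layered over the defaults in src/utils/transcriber.ts — the type names and merge helper are hypothetical, but the field names and default values mirror the example above:

```typescript
// Shape of the transcription section shown in the config example above.
interface TranscriptionConfig {
  input_audio_transcription: { model: string; prompt: string; language: string };
  turn_detection: { type: string; threshold: number; prefix_padding_ms: number; silence_duration_ms: number };
  input_audio_noise_reduction: { type: string };
}

// User config may specify any subset of any section.
type UserTranscriptionConfig = {
  [K in keyof TranscriptionConfig]?: Partial<TranscriptionConfig[K]>;
};

// Defaults matching the YAML example above.
const DEFAULT_TRANSCRIPTION: TranscriptionConfig = {
  input_audio_transcription: { model: "gpt-4o-transcribe", prompt: "", language: "en" },
  turn_detection: { type: "server_vad", threshold: 0.6, prefix_padding_ms: 400, silence_duration_ms: 500 },
  input_audio_noise_reduction: { type: "near_field" },
};

// Merge section by section: user values win, unspecified fields keep defaults.
function mergeTranscriptionConfig(user: UserTranscriptionConfig): TranscriptionConfig {
  return {
    input_audio_transcription: { ...DEFAULT_TRANSCRIPTION.input_audio_transcription, ...user.input_audio_transcription },
    turn_detection: { ...DEFAULT_TRANSCRIPTION.turn_detection, ...user.turn_detection },
    input_audio_noise_reduction: { ...DEFAULT_TRANSCRIPTION.input_audio_noise_reduction, ...user.input_audio_noise_reduction },
  };
}
```

A per-section spread keeps the behavior predictable: setting only `language: "fr"` in the config leaves the model and VAD settings at their defaults.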

@ercbot ercbot requested a review from benny123tw April 26, 2025 19:02
Development

Successfully merging this pull request may close these issues.

feat: Add a voice input mode