
Human Robot Interface via Voice

RAI provides two ROS 2 enabled agents for speech-to-speech communication.

Automatic Speech Recognition Agent

See examples/s2s/asr.py for an example usage.

The agent requires configuration of the sounddevice and ros2 connectors, a voice activity detection model (e.g. SileroVAD), and a transcription model (e.g. LocalWhisper). Optionally, additional models can be configured to decide whether transcription should start (e.g. OpenWakeWord).

The agent publishes information on two topics:

/from_human: rai_interfaces/msg/HRIMessages - containing transcriptions of the recorded speech

/voice_commands: std_msgs/msg/String - containing control commands that inform the consumer whether speech is currently detected ({"data": "pause"}), whether speech was detected and has now stopped ({"data": "play"}), and whether speech was transcribed ({"data": "stop"}).
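A downstream consumer (such as a playback node) can track these commands as a small state machine. The sketch below is illustrative only and is not part of RAI; the VoiceCommandTracker class and its attribute names are hypothetical, and each command string is assumed to arrive in the data field of a std_msgs/msg/String message:

```python
class VoiceCommandTracker:
    """Illustrative consumer state for /voice_commands messages.

    Each incoming std_msgs/msg/String is assumed to carry one
    command in its `data` field: "pause", "play", or "stop".
    """

    def __init__(self):
        self.speech_detected = False
        self.transcription_ready = False

    def handle(self, command: str) -> None:
        if command == "pause":    # speech is currently being detected
            self.speech_detected = True
        elif command == "play":   # speech was detected and has now stopped
            self.speech_detected = False
        elif command == "stop":   # the recorded speech was transcribed
            self.speech_detected = False
            self.transcription_ready = True


tracker = VoiceCommandTracker()
tracker.handle("pause")   # microphone picked up speech
tracker.handle("play")    # speech ended
tracker.handle("stop")    # transcription is available on /from_human
```

In a real node this logic would live inside a ROS 2 subscription callback on /voice_commands.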

The agent utilises the sounddevice module to access the user's microphone; by default, the "default" sound device is used. To get information about the available sound devices, run:

python -c "import sounddevice; print(sounddevice.query_devices())"

The device can be identified by name and passed to the configuration.
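Picking a device out of that listing can be done by matching on the name field. The helper below is a sketch, not part of RAI; find_device is a hypothetical name, and it assumes only that sounddevice.query_devices() yields mapping-like entries with a "name" key (which it does):

```python
def find_device(devices, name_fragment):
    """Return the first device entry whose name contains `name_fragment`.

    `devices` is the sequence returned by sounddevice.query_devices();
    each entry exposes (among other keys) a "name".
    """
    for device in devices:
        if name_fragment.lower() in device["name"].lower():
            return device
    raise ValueError(f"no sound device matching {name_fragment!r}")


# Hand-written stand-in for sounddevice.query_devices() output:
devices = [
    {"name": "HDA Intel PCH: ALC287 Analog"},
    {"name": "USB Headset"},
]
print(find_device(devices, "usb")["name"])  # -> USB Headset
```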

TextToSpeechAgent

See examples/s2s/tts.py for an example usage.

The agent requires configuration of the sounddevice and ros2 connectors, as well as a TextToSpeech model (e.g. OpenTTS). The agent listens for information on two topics:

/to_human: rai_interfaces/msg/HRIMessages - containing responses to be played to the human. These responses are synthesised into audio and put into the playback queue.

/voice_commands: std_msgs/msg/String - containing control commands to pause the current playback ({"data": "pause"}), start/continue playback ({"data": "play"}), or stop playback and drop the current playback queue ({"data": "stop"}).
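The pause/play/stop semantics on the playback side can be sketched as a small queue wrapper. This is an illustration, not RAI's actual implementation; the PlaybackQueue class and its method names are hypothetical:

```python
from collections import deque


class PlaybackQueue:
    """Illustrative playback queue honouring the /voice_commands semantics."""

    def __init__(self):
        self.queue = deque()   # pending audio chunks
        self.playing = False

    def enqueue(self, audio_chunk) -> None:
        # Synthesised responses from /to_human land here.
        self.queue.append(audio_chunk)

    def handle(self, command: str) -> None:
        if command == "pause":    # pause the current playback
            self.playing = False
        elif command == "play":   # start or continue playback
            self.playing = True
        elif command == "stop":   # stop and drop the pending queue
            self.playing = False
            self.queue.clear()
```

Note that "stop" both halts playback and discards everything still queued, whereas "pause" leaves the queue intact so "play" can resume it.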

The agent utilises the sounddevice module to access the user's speaker; by default, the "default" sound device is used. To get a list of the names of available sound devices, run:

python -c 'import sounddevice as sd; print([x["name"] for x in list(sd.query_devices())])'

The device can be identified by name and passed to the configuration.

OpenTTS

To run OpenTTS (and the example), a Docker container serving the model must be running. To start it, run:

docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak
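Once the container is up, it serves speech over a local HTTP API; to my understanding, OpenTTS exposes a GET /api/tts endpoint that returns WAV audio for a given voice and text (the list of installed voices can be queried from the running server). The sketch below only builds the request URL; tts_url is a hypothetical helper and the voice name is just an example:

```python
from urllib.parse import urlencode


def tts_url(text: str, voice: str, host: str = "http://localhost:5500") -> str:
    """Build an OpenTTS synthesis request URL (illustrative helper)."""
    query = urlencode({"voice": voice, "text": text})
    return f"{host}/api/tts?{query}"


url = tts_url("Hello, robot!", "flite:cmu_us_slt")
print(url)
# The WAV bytes could then be fetched with e.g.
#   urllib.request.urlopen(url).read()
# provided the container from the docker command above is running.
```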

Running example

To run the provided example of an S2S configuration with a minimal LLM-based agent, run the following in 4 separate terminals:

$ docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak
$ python ./examples/s2s/asr.py
$ python ./examples/s2s/tts.py
$ python ./examples/s2s/conversational.py