RAI provides two ROS 2 enabled agents for Speech to Speech communication.

See `examples/s2s/asr.py` for example usage. The agent requires configuration of the `sounddevice` and `ros2` connectors, a voice activity detection model (e.g. `SileroVAD`), and a transcription model (e.g. `LocalWhisper`), as well as, optionally, additional models that decide whether transcription should start (e.g. `OpenWakeWord`).
The Agent publishes information on two topics:

-   `/from_human` (`rai_interfaces/msg/HRIMessages`): containing transcriptions of the recorded speech.
-   `/voice_commands` (`std_msgs/msg/String`): containing control commands that inform the consumer whether speech is currently being detected (`{"data": "pause"}`), speech was detected and has now stopped (`{"data": "play"}`), or speech was transcribed (`{"data": "stop"}`).
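The control-command semantics above can be sketched as a small consumer-side state tracker. The command strings come from the documentation; the `VoiceCommandTracker` class itself is a hypothetical illustration, not part of the RAI API.

```python
# Sketch of consumer-side handling of the ASR agent's /voice_commands
# control messages. `data` corresponds to the `data` field of the
# std_msgs/msg/String message; VoiceCommandTracker is hypothetical.

class VoiceCommandTracker:
    """Tracks the speech state reported by the ASR agent."""

    def __init__(self):
        self.speech_active = False       # speech is currently being detected
        self.transcription_ready = False

    def handle(self, data: str) -> None:
        if data == "pause":              # speech detected: consumer should pause
            self.speech_active = True
            self.transcription_ready = False
        elif data == "play":             # speech ended: consumer may resume
            self.speech_active = False
        elif data == "stop":             # a transcription was published
            self.transcription_ready = True


tracker = VoiceCommandTracker()
for command in ["pause", "play", "stop"]:
    tracker.handle(command)

print(tracker.speech_active)        # False
print(tracker.transcription_ready) # True
```

In a real consumer this logic would run inside a ROS 2 subscription callback on `/voice_commands`.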
The Agent uses the `sounddevice` module to access the user's microphone; by default the `"default"` sound device is used. To get information about available sound devices, use:

```
python -c "import sounddevice; print(sounddevice.query_devices())"
```

The device can be identified by name and passed to the configuration.
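Matching a device by name can be done with a small helper over the device list. The `find_device_by_name` function below is a hypothetical example; with the real library, the list would come from `sounddevice.query_devices()`, whose entries carry a `"name"` key. Here a mocked list is used so the sketch is self-contained.

```python
# Hypothetical helper that picks a sound device by (partial) name from a
# list shaped like the one returned by sounddevice.query_devices().
# With the real library you would obtain the list via:
#     import sounddevice as sd
#     devices = sd.query_devices()

def find_device_by_name(devices, name):
    """Return the first device whose name contains `name` (case-insensitive)."""
    for device in devices:
        if name.lower() in device["name"].lower():
            return device
    raise ValueError(f"no sound device matching {name!r}")


# Example with a mocked device list:
devices = [
    {"name": "HDA Intel PCH: ALC287 Analog (hw:0,0)"},
    {"name": "default"},
]
print(find_device_by_name(devices, "default")["name"])  # default
```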
See `examples/s2s/tts.py` for example usage. The agent requires configuration of the `sounddevice` and `ros2` connectors, as well as a TextToSpeech model (e.g. `OpenTTS`).
The Agent listens for information on two topics:

-   `/to_human` (`rai_interfaces/msg/HRIMessages`): containing responses to be played to the human. These responses are synthesized to speech and put into the playback queue.
-   `/voice_commands` (`std_msgs/msg/String`): containing control commands to pause the current playback (`{"data": "pause"}`), start or continue playback (`{"data": "play"}`), or stop the playback and drop the current playback queue (`{"data": "stop"}`).
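The pause/play/stop semantics of the playback queue can be sketched as follows. `PlaybackQueue` is a hypothetical illustration of the described behavior, not RAI code: "pause" halts playback, "play" resumes it, and "stop" halts playback and drops the queued audio.

```python
from collections import deque

# Sketch of the playback-queue control semantics described above.
# PlaybackQueue is a hypothetical helper, not part of the RAI API.

class PlaybackQueue:
    def __init__(self):
        self.queue = deque()
        self.playing = True

    def enqueue(self, audio) -> None:
        self.queue.append(audio)

    def handle(self, data: str) -> None:
        if data == "pause":
            self.playing = False
        elif data == "play":
            self.playing = True
        elif data == "stop":
            self.playing = False
            self.queue.clear()   # drop the current playback queue


pq = PlaybackQueue()
pq.enqueue("chunk-1")
pq.enqueue("chunk-2")
pq.handle("pause")
print(pq.playing, len(pq.queue))  # False 2  (paused, audio kept)
pq.handle("stop")
print(len(pq.queue))              # 0  (queue dropped)
```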
The Agent uses the `sounddevice` module to access the user's speaker; by default the `"default"` sound device is used. To get a list of names of available sound devices, use:

```
python -c 'import sounddevice as sd; print([x["name"] for x in list(sd.query_devices())])'
```

The device can be identified by name and passed to the configuration.
To run OpenTTS (and the example), a Docker container serving the model must be running. To start it, run:

```
docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak
```
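Once the container is listening on port 5500, speech can be requested over HTTP from OpenTTS's `/api/tts` endpoint. The sketch below only builds the request URL so it runs without the server; the voice name is an assumption (query `/api/voices` on the running server for the actual list).

```python
from urllib.parse import urlencode

# Builds an OpenTTS synthesis URL. The default voice name below is an
# assumption for illustration; check /api/voices for available voices.

def tts_url(text, voice="larynx:en-us_mary_ann-glow_tts",
            host="localhost", port=5500):
    query = urlencode({"voice": voice, "text": text})
    return f"http://{host}:{port}/api/tts?{query}"


print(tts_url("Hello from RAI"))
# http://localhost:5500/api/tts?voice=larynx%3Aen-us_mary_ann-glow_tts&text=Hello+from+RAI
```

Fetching that URL (e.g. with `urllib.request` or `requests`) returns WAV audio when the server is up.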
To run the provided example of the S2S configuration with a minimal LLM-based agent, run the following in 4 separate terminals:

```
$ docker run -it -p 5500:5500 synesthesiam/opentts:en --no-espeak
$ python ./examples/s2s/asr.py
$ python ./examples/s2s/tts.py
$ python ./examples/s2s/conversational.py
```