This is a proof of concept demonstrating how you can build a voice-based language tutor using Telegram, ChatGPT, and Google's free speech recognition and text-to-speech libraries. I've seen a few demos highlighting text-based conversations with a LLM language tutor but felt like the experience could be greatly improved if you could actually practice speaking with your LLM tutor. I've been studying Mandarin for a bit and decided I'd try this out as a potential learning tool.
Here's a quick demo where I'm having a basic conversation with the ChatGPT-powered tutor in Mandarin.
demo.MP4
While it's not a substitute for real speaking practice, I find the dynamic nature of the LLM conversations fascinating. The topics tend to vary significantly (particularly if you play with ChatGPT's temperature setting) and often feel very similar to real speaking practice, particularly since you can switch into English if you don't understand a topic (see the 0:25 mark where I requested the tutor repeat the question in English).
pip install -r requirements.txt
(Python 3.6+)- Create a Telegram bot and generate an API key
- Create an OpenAI Platform account and generate an API key
- Populate an
.env
file (or configure environment variables) with your API keys for Telegram and ChatGPT - If you wish to speak with the bot in Mandarin, you're done! Just run
python3 app.py
and start chatting with the bot using the Telegram app.
Yes absolutely, there are just a few things you'll need to change:
- Update the
base_prompt
in chatgpt_agent.py - Update the language parameter in
convert_voice_to_text()
to a supported language:
r = sr.Recognizer()
with sr.AudioFile("voice_message.wav") as source:
audio_data = r.record(source)
text = r.recognize_google(audio_data, language="zh-CN")
print(f"Converted audio to the following text: {text}")
You can find a full list of supported languages for Google's Speech Recognition APIs here
- Update the
lang
parameter in the Google Text to Speech (gTTS) library used byresponse_text_to_audio()
:
# Use Google Text-to-Speech to convert the text to speech
tts = gTTS(text, lang="zh")
tts.save("voice_message.mp3")
Note: The easiest way to get a list of available gTTS languages is to print them with gtts-cli --all
- Add support for multiple languages (ultimately a rather simple mapping exercise across supported languages in the Speech Recognition & TTS libraries)
- Replace gTTS library with a more realistic sounding alternative. gTTS is a fantastic free service but I think I can find fairly inexpensive alternatives that would provide a much more realistic sounding voice for the Mandarin tutor.
- Build conversation caching so the bot can recall recent conversations and hold conversations with multiple users at once