achatbot factory: create chat bots with LLM (tools), ASR, TTS, VAD, OCR, object detection, etc.
Features
- demo
  - podcast: AI Podcast: https://weedge.us.kg/ :)
    ```shell
    # need GOOGLE_API_KEY in environment variables
    # default language: English

    # website
    python -m demo.content_parser_tts instruct-content-tts \
        "https://en.wikipedia.org/wiki/Large_language_model"

    python -m demo.content_parser_tts instruct-content-tts \
        --role-tts-voices zh-CN-YunjianNeural \
        --role-tts-voices zh-CN-XiaoxiaoNeural \
        --language zh \
        "https://en.wikipedia.org/wiki/Large_language_model"

    # pdf
    # https://web.stanford.edu/~jurafsky/slp3/ed3bookaug20_2024.pdf 600 pages is ok~ :)
    python -m demo.content_parser_tts instruct-content-tts \
        "/Users/wuyong/Desktop/Speech and Language Processing.pdf"

    python -m demo.content_parser_tts instruct-content-tts \
        --role-tts-voices zh-CN-YunjianNeural \
        --role-tts-voices zh-CN-XiaoxiaoNeural \
        --language zh \
        "/Users/wuyong/Desktop/Speech and Language Processing.pdf"
    ```
- cmd chat bots:
  - local-terminal-chat (be/fe)
  - remote-queue-chat (be/fe)
  - grpc-terminal-chat (be/fe)
  - grpc-speaker
  - http fastapi_daily_bot_serve (with chat bot pipelines)
  - bots with config, see notebooks
- supported transport connectors:
  - pipe (UNIX socket)
  - grpc
  - queue (redis)
  - websocket
  - TCP/IP socket
- chat bot processors:
  - aggregators (llm use, assistant message)
  - ai_frameworks
    - langchain: RAG
    - llamaindex: RAG
    - autogen: multi-agents
  - realtime voice inference (RTVI)
  - transport:
    - webRTC: (daily, livekit KISS)
    - websocket server
  - ai processors: llm, tts, asr, etc.
    - llm_processor:
      - openai (use openai sdk)
      - google gemini (use google-generativeai sdk)
      - litellm (use openai input/output format proxy sdk)
- core module:
  - local llm:
    - llama-cpp (supports text and vision models with function-call)
    - transformers (manual, pipeline) (supports text; vision, vision+image; speech, voice; vision+voice)
      - llm_transformers_manual_vision_llama
      - llm_transformers_manual_vision_molmo
      - llm_transformers_manual_vision_qwen
      - llm_transformers_manual_vision_deepseek
      - llm_transformers_manual_vision_janus_flow
      - llm_transformers_manual_image_janus_flow
      - llm_transformers_manual_vision_janus
      - llm_transformers_manual_image_janus
      - llm_transformers_manual_speech_llasa
      - llm_transformers_manual_speech_step
      - llm_transformers_manual_voice_glm
      - llm_transformers_manual_vision_voice_minicpmo
      - llm_transformers_manual_voice_minicpmo
      - llm_transformers_manual_audio_minicpmo
      - llm_transformers_manual_text_speech_minicpmo
      - llm_transformers_manual_instruct_speech_minicpmo
      - llm_transformers_manual_vision_minicpmo
  - remote api llm: personal-ai (openai-compatible api, other ai providers)
- AI modules:
  - functions:
    - search: search, search1, serper
    - weather: openweathermap
  - speech:
    - asr:
      - whisper_asr, whisper_timestamped_asr, whisper_faster_asr, whisper_transformers_asr, whisper_mlx_asr
      - whisper_groq_asr
      - sense_voice_asr
      - minicpmo_asr (whisper)
    - audio_stream: daily_room_audio_stream (in/out), pyaudio_stream (in/out)
    - detector: porcupine_wakeword, pyannote_vad, webrtc_vad, silero_vad, webrtc_silero_vad, fsmn_vad
    - player: stream_player
    - recorder: rms_recorder, wakeword_rms_recorder, vad_recorder, wakeword_vad_recorder
    - tts:
      - tts_edge
      - tts_g
      - tts_coqui
      - tts_chat
      - tts_cosy_voice, tts_cosy_voice2
      - tts_f5
      - tts_openvoicev2
      - tts_kokoro, tts_onnx_kokoro
      - tts_fishspeech
      - tts_llasa
      - tts_minicpmo
      - tts_zonos
      - tts_step
    - vad_analyzer:
      - daily_webrtc_vad_analyzer
      - silero_vad_analyzer
  - vision:
    - OCR (Optical Character Recognition)
    - Detector:
      - YOLO (You Only Look Once)
      - RT-DETR v2 (RealTime End-to-End Object Detection with Transformers)
- gen modules config (*.yaml, local/test/prod) from env with the `.env` file; you can also use HfArgumentParser to parse a module's args on the local command line (a minimal sketch follows this features list)
- deploy to cloud serverless:
- vercel (frontend ui pages)
- Cloudflare(frontend ui pages), personal ai workers
- fastapi-daily-chat-bot on cerebrium (provider aws)
- fastapi-daily-chat-bot on leptonai
- aws lambda + api Gateway
- docker -> k8s/k3s
- etc...
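As mentioned in the config item above, module args can also be parsed from the command line with HfArgumentParser; the dataclass below is a hypothetical illustration (achatbot's real arg dataclasses differ):

```python
# hypothetical module-arg dataclass parsed with transformers' HfArgumentParser
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class DemoASRArgs:
    model_name_or_path: str = field(default="FunAudioLLM/SenseVoiceSmall")
    language: str = field(default="zh")


if __name__ == "__main__":
    parser = HfArgumentParser(DemoASRArgs)
    (args,) = parser.parse_args_into_dataclasses()
    print(args)  # e.g. python demo_args.py --language en
```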
Service Deployment Architecture
- ui/web-client-ui: deploy to Cloudflare Pages with Vite; access https://chat-client-weedge.pages.dev/
- ui/educator-client: deploy to Cloudflare Pages with Vite; access https://educator-client.pages.dev/
- chat-bot-rtvi-web-sandbox: use this web sandbox to test config and actions with DailyRTVIGeneralBot
- vite-react-rtvi-web-voice: RTVI web voice chat bots with different CCTV roles, etc.; you can DIY your own role by changing the system prompt of DailyRTVIGeneralBot; deploy to Cloudflare Pages with Vite; access https://role-chat.pages.dev/
- vite-react-web-vision: deploy to Cloudflare Pages with Vite; access https://vision-weedge.pages.dev/
- nextjs-react-web-storytelling: deploy to Cloudflare Pages with Next.js; access https://storytelling.pages.dev/
- websocket-demo: websocket audio chat bot demo
- deploy/modal (KISS)
- deploy/leptonai (KISS)
- deploy/cerebrium/fastapi-daily-chat-bot :)
- deploy/aws/fastapi-daily-chat-bot :|
- deploy/docker/fastapi-daily-chat-bot
> [!NOTE]
> - python --version >= 3.10 with asyncio-task
> - if you install achatbot[tts_openvoicev2], you also need melo-tts: pip install git+https://github.com/myshell-ai/MeloTTS.git
> [!TIP]
> use uv + pip to install the required dependencies quickly, e.g.:
> - uv pip install achatbot
> - uv pip install "achatbot[fastapi_bot_server]"
```shell
# install from pypi
python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
pip install achatbot
# optional-dependencies e.g.
pip install "achatbot[fastapi_bot_server]"
```
```shell
# install from source
git clone --recursive https://github.com/ai-bot-pro/chat-bot.git
cd chat-bot
python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
bash scripts/pypi_achatbot.sh dev
# optional-dependencies e.g.
pip install "dist/achatbot-${version}-py3-none-any.whl[fastapi_bot_server]"
```
| Chat Bot | optional-dependencies | Colab | Device | Pipeline Desc |
|---|---|---|---|---|
| daily_bot, livekit_bot, agora_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, groq \| together api llm(text), tts_edge | | CPU (free, 2 cores) | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> groq \| together (llm) -> edge (tts) -> daily \| livekit room out stream |
| generate_audio2audio | remote_queue_chat_bot_be_worker | | T4 (free) | e.g.: pyaudio in stream -> silero (vad) -> sense_voice (asr) -> qwen (llm) -> cosy_voice (tts) -> pyaudio out stream |
| daily_describe_vision_tools_bot, livekit_describe_vision_tools_bot, agora_describe_vision_tools_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, deepgram_asr, google_gemini, tts_edge | | CPU (free, 2 cores) | e.g.: daily \| livekit room in stream -> silero (vad) -> deepgram (asr) -> google gemini -> edge (tts) -> daily \| livekit room out stream |
| daily_describe_vision_bot, livekit_describe_vision_bot, agora_describe_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, llm_transformers_manual_vision_qwen, tts_edge | achatbot_vision_qwen_vl.ipynb, achatbot_vision_janus.ipynb, achatbot_vision_minicpmo.ipynb | Qwen2-VL-2B-Instruct: T4 (free); Qwen2-VL-7B-Instruct: L4; Llama-3.2-11B-Vision-Instruct: L4; allenai/Molmo-7B-D-0924: A100 | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> qwen-vl (llm) -> edge (tts) -> daily \| livekit room out stream |
| daily_chat_vision_bot, livekit_chat_vision_bot, agora_chat_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, llm_transformers_manual_vision_qwen, tts_edge | | Qwen2-VL-2B-Instruct: T4 (free); Qwen2-VL-7B-Instruct: L4; Llama-3.2-11B-Vision-Instruct: L4; allenai/Molmo-7B-D-0924: A100 | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> llm answer guide qwen-vl (llm) -> edge (tts) -> daily \| livekit room out stream |
| daily_chat_tools_vision_bot, livekit_chat_tools_vision_bot, agora_chat_tools_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, groq api llm(text), tools: llm_transformers_manual_vision_qwen, tts_edge | | Qwen2-VL-2B-Instruct: T4 (free); Qwen2-VL-7B-Instruct: L4; Llama-3.2-11B-Vision-Instruct: L4; allenai/Molmo-7B-D-0924: A100 | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> llm with tools qwen-vl -> edge (tts) -> daily \| livekit room out stream |
| daily_annotate_vision_bot, livekit_annotate_vision_bot, agora_annotate_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, vision_yolo_detector, tts_edge | | T4 (free) | e.g.: daily \| livekit room in stream -> vision_yolo_detector -> edge (tts) -> daily \| livekit room out stream |
| daily_detect_vision_bot, livekit_detect_vision_bot, agora_detect_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, vision_yolo_detector, tts_edge | | T4 (free) | e.g.: daily \| livekit room in stream -> vision_yolo_detector -> edge (tts) -> daily \| livekit room out stream |
| daily_ocr_vision_bot, livekit_ocr_vision_bot, agora_ocr_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, vision_transformers_got_ocr, tts_edge | | T4 (free) | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> vision_transformers_got_ocr -> edge (tts) -> daily \| livekit room out stream |
| daily_month_narration_bot | e.g.: daily_room_audio_stream, groq \| together api llm(text), hf_sd, together api (image), tts_edge | | with sd model via diffusers: T4 (free) cpu+cuda (slow); L4 cpu+cuda; A100 all cuda | e.g.: daily room in stream -> together (llm) -> hf sd gen image model -> edge (tts) -> daily room out stream |
| daily_storytelling_bot | e.g.: daily_room_audio_stream, groq \| together api llm(text), hf_sd, together api (image), tts_edge | | cpu (2 cores); with sd model via diffusers: T4 (free) cpu+cuda (slow); L4 cpu+cuda; A100 all cuda | e.g.: daily room in stream -> together (llm) -> hf sd gen image model -> edge (tts) -> daily room out stream |
| websocket_server_bot, fastapi_websocket_server_bot | e.g.: websocket_server, sense_voice_asr, groq \| together api llm(text), tts_edge | | cpu (2 cores) | e.g.: websocket protocol in stream -> silero (vad) -> sense_voice (asr) -> together (llm) -> edge (tts) -> websocket protocol out stream |
| daily_natural_conversation_bot | e.g.: daily_room_audio_stream, sense_voice_asr, groq \| together api llm(NLP task), gemini-1.5-flash (chat), tts_edge | | cpu (2 cores) | e.g.: daily room in stream -> together (llm NLP task) -> gemini-1.5-flash model (chat) -> edge (tts) -> daily room out stream |
| fastapi_websocket_moshi_bot | e.g.: websocket_server, moshi opus stream voice llm | | L4/A100 | websocket protocol in stream -> silero (vad) -> moshi opus stream voice llm -> websocket protocol out stream |
| daily_asr_glm_voice_bot, daily_glm_voice_bot | e.g.: daily_room_audio_stream, glm voice llm | | T4/L4/A100 | e.g.: daily room in stream -> glm4-voice -> daily room out stream |
| daily_freeze_omni_voice_bot | e.g.: daily_room_audio_stream, freezeOmni voice llm | | L4/A100 | e.g.: daily room in stream -> freezeOmni-voice -> daily room out stream |
| daily_asr_minicpmo_voice_bot, daily_minicpmo_voice_bot, daily_minicpmo_vision_voice_bot | e.g.: daily_room_audio_stream, minicpmo voice llm | | T4: MiniCPM-o-2_6-int4; L4/A100: MiniCPM-o-2_6 | e.g.: daily room in stream -> minicpmo -> daily room out stream |
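Most of the pipelines above start with silero (vad). For reference, the underlying Silero VAD model can be exercised on its own through its documented torch.hub entry point; this is a standalone sketch (not achatbot's detector/vad_analyzer module API) and assumes a local 16 kHz mono WAV named `sample.wav`:

```python
# standalone Silero VAD check via torch.hub (requires torch + torchaudio)
import torch

model, utils = torch.hub.load(
    repo_or_dir="snakers4/silero-vad", model="silero_vad", trust_repo=True
)
get_speech_timestamps, _, read_audio, _, _ = utils

wav = read_audio("sample.wav", sampling_rate=16000)  # assumed local test file
speech_ts = get_speech_timestamps(wav, model, sampling_rate=16000)
print(speech_ts)  # [{'start': ..., 'end': ...}, ...] in samples
```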
Run local chat bots
> [!NOTE]
> - To run from src code, replace achatbot with src and don't set ACHATBOT_PKG=1, e.g.: TQDM_DISABLE=True python -m src.cmd.local-terminal-chat.generate_audio2audio > log/std_out.log
> - PyAudio needs python3-pyaudio; e.g. on Ubuntu: apt-get install python3-pyaudio, on macOS: brew install portaudio; see: https://pypi.org/project/PyAudio/
> - llm llama-cpp-python installs the CPU pre-built wheel by default; if you want to use another backend (e.g. cuda), see: https://github.com/abetlen/llama-cpp-python#installation-configuration
> - Installing pydub requires ffmpeg; see: https://www.ffmpeg.org/download.html
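For example, to rebuild llama-cpp-python with the CUDA backend instead of the default CPU wheel (a hedged example; see the llama-cpp-python installation link above for the authoritative CMake flags for your version):

```shell
# reinstall llama-cpp-python from source with CUDA enabled
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```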
- run `pip install "achatbot[local_terminal_chat_bot]"` to install the dependencies for the local terminal chat bot;
- create the achatbot data dirs in `$HOME`: `mkdir -p ~/.achatbot/{log,config,models,records,videos}`;
- `cp .env.example .env`, then check `.env` and add key/value env params;
- select model ckpts to download:
  - vad model ckpt (the default vad ckpt model is silero vad)

    ```shell
    # vad pyannote segmentation ckpt
    huggingface-cli download pyannote/segmentation-3.0 --local-dir ~/.achatbot/models/pyannote/segmentation-3.0 --local-dir-use-symlinks False
    ```
  - asr model ckpt (the default whisper ckpt model is the base size)

    ```shell
    # asr openai whisper ckpt
    wget https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt -O ~/.achatbot/models/base.pt
    # asr hf openai whisper ckpt for the transformers pipeline to load
    huggingface-cli download openai/whisper-base --local-dir ~/.achatbot/models/openai/whisper-base --local-dir-use-symlinks False
    # asr hf faster whisper (CTranslate2)
    huggingface-cli download Systran/faster-whisper-base --local-dir ~/.achatbot/models/Systran/faster-whisper-base --local-dir-use-symlinks False
    # asr SenseVoice ckpt
    huggingface-cli download FunAudioLLM/SenseVoiceSmall --local-dir ~/.achatbot/models/FunAudioLLM/SenseVoiceSmall --local-dir-use-symlinks False
    ```
  - llm model ckpt (the default llamacpp ckpt (ggml) model is qwen-2 instruct 1.5B; a minimal llama-cpp-python sketch using this ckpt follows these steps)

    ```shell
    # llm llamacpp Qwen2-Instruct
    huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-1_5b-instruct-q8_0.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    # llm llamacpp Qwen1.5-chat
    huggingface-cli download Qwen/Qwen1.5-7B-Chat-GGUF qwen1_5-7b-chat-q8_0.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    # llm llamacpp phi-3-mini-4k-instruct
    huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    ```
  - tts model ckpt

    ```shell
    # tts chatTTS
    huggingface-cli download 2Noise/ChatTTS --local-dir ~/.achatbot/models/2Noise/ChatTTS --local-dir-use-symlinks False
    # tts coquiTTS
    huggingface-cli download coqui/XTTS-v2 --local-dir ~/.achatbot/models/coqui/XTTS-v2 --local-dir-use-symlinks False
    # tts cosy voice
    git lfs install
    git clone https://www.modelscope.cn/iic/CosyVoice-300M.git ~/.achatbot/models/CosyVoice-300M
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git ~/.achatbot/models/CosyVoice-300M-SFT
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git ~/.achatbot/models/CosyVoice-300M-Instruct
    #git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git ~/.achatbot/models/CosyVoice-ttsfrd
    ```
- run the local terminal chat bot with env params; e.g.
  - use default env params to run the local chat bot

    ```shell
    ACHATBOT_PKG=1 TQDM_DISABLE=True \
        python -m achatbot.cmd.local-terminal-chat.generate_audio2audio > ~/.achatbot/log/std_out.log
    ```
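As referenced in the ckpt steps above, the default local llm path goes through llama-cpp with the Qwen2 GGUF ckpt. The sketch below calls llama-cpp-python directly (library usage only, not achatbot's llm module wrapper) and assumes the default `~/.achatbot/models` layout:

```python
# minimal llama-cpp-python chat check against the downloaded Qwen2 GGUF ckpt
from pathlib import Path

from llama_cpp import Llama

model_path = Path.home() / ".achatbot/models/qwen2-1_5b-instruct-q8_0.gguf"

llm = Llama(
    model_path=str(model_path),
    n_ctx=2048,       # context window
    n_gpu_layers=0,   # CPU by default; raise (e.g. 33) when built with CUDA
    verbose=False,
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one short sentence."}]
)
print(resp["choices"][0]["message"]["content"])
```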
Run remote http fastapi daily chat bots
- run `pip install "achatbot[fastapi_daily_bot_server]"` to install the dependencies for the http fastapi daily chat bot;
- run the cmd below to start the http server; see the api docs at http://0.0.0.0:4321/docs

  ```shell
  ACHATBOT_PKG=1 python -m achatbot.cmd.http.server.fastapi_daily_bot_serve
  ```
- run a chat bot processor, e.g.:
  - run a daily langchain rag bot api, with ui/educator-client

    > [!NOTE]
    > You need to process youtube audio and save it to a local file with `pytube`: run `pip install "achatbot[pytube,deep_translator]"` to install dependencies, transcribe/translate to text, then chunk it into the vector store, and run the langchain rag bot api. Run the data process with `ACHATBOT_PKG=1 python -m achatbot.cmd.bots.rag.data_process.youtube_audio_transcribe_to_tidb`, or download processed data from the hf dataset weege007/youtube_videos, then chunk it into the vector store.

    ```shell
    curl -XPOST "http://0.0.0.0:4321/bot_join/chat-bot/DailyLangchainRAGBot" \
        -H "Content-Type: application/json" \
        -d $'{"config":{"llm":{"model":"llama-3.1-70b-versatile","messages":[{"role":"system","content":""}],"language":"zh"},"tts":{"tag":"cartesia_tts_processor","args":{"voice_id":"eda5bbff-1ff1-4886-8ef1-4e69a77640a0","language":"zh"}},"asr":{"tag":"deepgram_asr_processor","args":{"language":"zh","model":"nova-2"}}}}' | jq .
    ```
  - run a simple daily chat bot api, with ui/web-client-ui (default language: zh); an equivalent Python requests call is shown below

    ```shell
    curl -XPOST "http://0.0.0.0:4321/bot_join/DailyBot" \
        -H "Content-Type: application/json" \
        -d '{}' | jq .
    ```
Run remote rpc chat bot worker
- run `pip install "achatbot[remote_rpc_chat_bot_be_worker]"` to install the dependencies for the rpc chat bot BE worker; e.g.:
  - use default env params to run the rpc chat bot BE worker

    ```shell
    ACHATBOT_PKG=1 RUN_OP=be TQDM_DISABLE=True \
        TTS_TAG=tts_edge \
        python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
    ```
- run `pip install "achatbot[remote_rpc_chat_bot_fe]"` to install the dependencies for the rpc chat bot FE;

  ```shell
  ACHATBOT_PKG=1 RUN_OP=fe \
      TTS_TAG=tts_edge \
      python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
  ```
Run remote queue chat bot worker
- run `pip install "achatbot[remote_queue_chat_bot_be_worker]"` to install the dependencies for the queue chat bot BE worker; e.g.:
  - use default env params to run

    ```shell
    ACHATBOT_PKG=1 REDIS_PASSWORD=$redis_pwd RUN_OP=be TQDM_DISABLE=True \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
    ```
  - sense_voice (asr) -> qwen (llm) -> cosy_voice (tts); you can log in to redislabs to create a 30MB free database, then set REDIS_HOST, REDIS_PORT and REDIS_PASSWORD to run (a redis-cli connectivity check is sketched at the end of this section), e.g.:

    ```shell
    ACHATBOT_PKG=1 RUN_OP=be \
        TQDM_DISABLE=True \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        ASR_TAG=sense_voice_asr \
        ASR_LANG=zn \
        ASR_MODEL_NAME_OR_PATH=~/.achatbot/models/FunAudioLLM/SenseVoiceSmall \
        N_GPU_LAYERS=33 FLASH_ATTN=1 \
        LLM_MODEL_NAME=qwen \
        LLM_MODEL_PATH=~/.achatbot/models/qwen1_5-7b-chat-q8_0.gguf \
        TTS_TAG=tts_cosy_voice \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
    ```
- run `pip install "achatbot[remote_queue_chat_bot_fe]"` to install the required packages for the queue chat bot frontend; e.g.:
  - use default env params to run (default vad_recorder)

    ```shell
    ACHATBOT_PKG=1 RUN_OP=fe \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    ```
  - with wake word

    ```shell
    ACHATBOT_PKG=1 RUN_OP=fe \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        RECORDER_TAG=wakeword_rms_recorder \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    ```
  - the default pyaudio player stream uses the tts tag's output sample info (rate, channels, ...), e.g. (the BE uses tts_cosy_voice out-stream info)

    ```shell
    ACHATBOT_PKG=1 RUN_OP=fe \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        TTS_TAG=tts_cosy_voice \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    ```
- remote_queue_chat_bot_be_worker in colab examples:
  - sense_voice (asr) -> qwen (llm) -> cosy_voice (tts)
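As noted above, before launching the BE/FE workers you can confirm the Redis Cloud instance is reachable with the same host/port/password; this is a plain redis-cli check, independent of achatbot (host and port are the example values used above):

```shell
# should print PONG if the credentials and endpoint are correct
redis-cli -h redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com -p 14241 -a $redis_pwd ping
```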
Run remote grpc tts speaker bot
- run `pip install "achatbot[remote_grpc_tts_server]"` to install the dependencies for the grpc tts speaker bot server;

  ```shell
  ACHATBOT_PKG=1 python -m achatbot.cmd.grpc.speaker.server.serve
  ```
- run `pip install "achatbot[remote_grpc_tts_client]"` to install the dependencies for the grpc tts speaker bot client; then run the client with different TTS_TAG values:
  ```shell
  ACHATBOT_PKG=1 TTS_TAG=tts_edge IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_g IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_coqui IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_chat IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_cosy_voice IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_fishspeech IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_f5 IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_openvoicev2 IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_kokoro IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_onnx_kokoro IS_RELOAD=1 KOKORO_ESPEAK_NG_LIB_PATH=/usr/local/lib/libespeak-ng.1.dylib KOKORO_LANGUAGE=cmn python -m src.cmd.grpc.speaker.client
  ACHATBOT_PKG=1 TTS_TAG=tts_cosy_voice2 \
      COSY_VOICE_MODELS_DIR=./models/FunAudioLLM/CosyVoice2-0.5B \
      COSY_VOICE_REFERENCE_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
      IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
  ```
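tts_edge (the TTS tag used in several examples above) wraps Microsoft Edge's online TTS voices. As a standalone sanity check you can synthesize speech directly with the edge-tts package; this is a hedged sketch using one of the zh-CN voices from the podcast demo (the upstream library, not achatbot's tts module API):

```python
# standalone edge-tts synthesis check (pip install edge-tts)
import asyncio

import edge_tts


async def main() -> None:
    tts = edge_tts.Communicate("你好，欢迎使用 achatbot。", voice="zh-CN-XiaoxiaoNeural")
    await tts.save("hello.mp3")  # writes an mp3 next to this script


asyncio.run(main())
```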
Multimodal Interaction
- stream-ocr (realtime-object-detection)
- Embodied Intelligence: Robots that touch the world, perceive and move
achatbot is released under the BSD 3-Clause license. (Additional code in this distribution is covered by the MIT and Apache open source licenses.) However, you may have other legal obligations that govern your use of content, such as the terms of service for third-party models.