An open-source chat bot architecture for voice/vision (and multimodal) assistants, runnable locally (CPU/GPU bound) or remotely (I/O bound).

achatbot

PyPI

achatbot factory: create chat bots with llm (tools), asr, tts, vad, ocr, object detection, etc.

🌲 Project Structure

(project-structure diagram)

🌿 Features


  • demo

    • podcast: AI Podcast demo: https://weedge.us.kg/ :)

      # need GOOGLE_API_KEY in environment variables
      # default use language English
      
      # website
      python -m demo.content_parser_tts instruct-content-tts \
          "https://en.wikipedia.org/wiki/Large_language_model"
      
      python -m demo.content_parser_tts instruct-content-tts \
          --role-tts-voices zh-CN-YunjianNeural \
          --role-tts-voices zh-CN-XiaoxiaoNeural \
          --language zh \
          "https://en.wikipedia.org/wiki/Large_language_model"
      
      # pdf
      # https://web.stanford.edu/~jurafsky/slp3/ed3bookaug20_2024.pdf (600 pages is ok~ :)
      python -m demo.content_parser_tts instruct-content-tts \
          "/Users/wuyong/Desktop/Speech and Language Processing.pdf"
      
      python -m demo.content_parser_tts instruct-content-tts \
          --role-tts-voices zh-CN-YunjianNeural \
          --role-tts-voices zh-CN-XiaoxiaoNeural \
          --language zh \
          "/Users/wuyong/Desktop/Speech and Language Processing.pdf"
  • cmd chat bots:

  • supported transport connectors:

    • pipe (UNIX socket)
    • grpc
    • queue (redis)
    • websocket
    • TCP/IP socket
  • chat bot processors:

    • aggregators (llm user, assistant message)
    • ai_frameworks
    • realtime voice inference (RTVI)
    • transport:
    • ai processors: llm, tts, asr, etc.
      • llm_processor:
  • core module:

    • local llm:
      • llama-cpp (supports text and vision with function-call models)
      • transformers (manual, pipeline) (supports text; vision, vision+image; speech, voice; vision+voice)
        • llm_transformers_manual_vision_llama
        • llm_transformers_manual_vision_molmo
        • llm_transformers_manual_vision_qwen
        • llm_transformers_manual_vision_deepseek
        • llm_transformers_manual_vision_janus_flow
        • llm_transformers_manual_image_janus_flow
        • llm_transformers_manual_vision_janus
        • llm_transformers_manual_image_janus
        • llm_transformers_manual_speech_llasa
        • llm_transformers_manual_speech_step
        • llm_transformers_manual_voice_glm
        • llm_transformers_manual_vision_voice_minicpmo, llm_transformers_manual_voice_minicpmo, llm_transformers_manual_audio_minicpmo, llm_transformers_manual_text_speech_minicpmo, llm_transformers_manual_instruct_speech_minicpmo, llm_transformers_manual_vision_minicpmo
    • remote api llm: personal-ai (OpenAI-like API, other AI providers)
  • AI modules:

    • functions:
      • search: search,search1,serper
      • weather: openweathermap
    • speech:
      • asr:
        • whisper_asr, whisper_timestamped_asr, whisper_faster_asr, whisper_transformers_asr, whisper_mlx_asr
        • whisper_groq_asr
        • sense_voice_asr
        • minicpmo_asr (whisper)
      • audio_stream: daily_room_audio_stream(in/out), pyaudio_stream(in/out)
      • detector: porcupine_wakeword,pyannote_vad,webrtc_vad,silero_vad,webrtc_silero_vad,fsmn_vad
      • player: stream_player
      • recorder: rms_recorder, wakeword_rms_recorder, vad_recorder, wakeword_vad_recorder
      • tts:
        • tts_edge
        • tts_g
        • tts_coqui
        • tts_chat
        • tts_cosy_voice,tts_cosy_voice2
        • tts_f5
        • tts_openvoicev2
        • tts_kokoro,tts_onnx_kokoro
        • tts_fishspeech
        • tts_llasa
        • tts_minicpmo
        • tts_zonos
        • tts_step
      • vad_analyzer:
        • daily_webrtc_vad_analyzer
        • silero_vad_analyzer
    • vision:
      • OCR(Optical Character Recognition):
      • Detector:
        • YOLO (You Only Look Once)
        • RT-DETR v2 (RealTime End-to-End Object Detection with Transformers)
  • gen modules config (*.yaml, local/test/prod) from env with a .env file; you can also use HfArgumentParser to parse this module's args from the local cmd line

  • deploy to cloud ☁️ serverless:

🌻 Service Deployment Architecture


UI (easy to deploy with GitHub Pages or similar static hosting)

Server Deploy (CD)

Install

Note

python --version >= 3.10 (with asyncio task support). If you install achatbot[tts_openvoicev2], you also need to install MeloTTS: pip install git+https://github.com/myshell-ai/MeloTTS.git

Tip

use uv + pip to install the required dependencies faster, e.g.:
uv pip install achatbot
uv pip install "achatbot[fastapi_bot_server]"

pypi

python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
pip install achatbot
# optional-dependencies e.g.
pip install "achatbot[fastapi_bot_server]"

local

git clone --recursive https://github.com/ai-bot-pro/chat-bot.git
cd chat-bot
python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
bash scripts/pypi_achatbot.sh dev
# optional-dependencies e.g.
pip install "dist/achatbot-{$version}-py3-none-any.whl[fastapi_bot_server]"

Run chat bots

πŸ“ Run chat bots with colab notebook

Each entry below lists: Chat Bot, optional-dependencies, Colab, Device, and Pipeline Desc.
daily_bot
livekit_bot
agora_bot
e.g.:
daily_room_audio_stream | livekit_room_audio_stream,
sense_voice_asr,
groq | together api llm(text),
tts_edge
Open In Colab CPU (free, 2 cores) e.g.:
daily | livekit room in stream
-> silero (vad)
-> sense_voice (asr)
-> groq | together (llm)
-> edge (tts)
-> daily | livekit room out stream
generate_audio2audio remote_queue_chat_bot_be_worker Open In Colab T4(free) e.g.:
pyaudio in stream
-> silero (vad)
-> sense_voice (asr)
-> qwen (llm)
-> cosy_voice (tts)
-> pyaudio out stream
daily_describe_vision_tools_bot
livekit_describe_vision_tools_bot
agora_describe_vision_tools_bot
e.g.:
daily_room_audio_stream |livekit_room_audio_stream
deepgram_asr,
google_gemini,
tts_edge
Open In Colab CPU(free, 2 cores) e.g.:
daily |livekit room in stream
-> silero (vad)
-> deepgram (asr)
-> google gemini
-> edge (tts)
-> daily |livekit room out stream
daily_describe_vision_bot
livekit_describe_vision_bot
agora_describe_vision_bot
e.g.:
daily_room_audio_stream | livekit_room_audio_stream
sense_voice_asr,
llm_transformers_manual_vision_qwen,
tts_edge
achatbot_vision_qwen_vl.ipynb:
Open In Colab
achatbot_vision_janus.ipynb:
Open In Colab
achatbot_vision_minicpmo.ipynb:
Open In Colab
- Qwen2-VL-2B-Instruct
T4(free)
- Qwen2-VL-7B-Instruct
L4
- Llama-3.2-11B-Vision-Instruct
L4
- allenai/Molmo-7B-D-0924
A100
e.g.:
daily | livekit room in stream
-> silero (vad)
-> sense_voice (asr)
-> qwen-vl (llm)
-> edge (tts)
-> daily | livekit room out stream
daily_chat_vision_bot
livekit_chat_vision_bot
agora_chat_vision_bot
e.g.:
daily_room_audio_stream |livekit_room_audio_stream
sense_voice_asr,
llm_transformers_manual_vision_qwen,
tts_edge
Open In Colab - Qwen2-VL-2B-Instruct
T4(free)
- Qwen2-VL-7B-Instruct
L4
- Llama-3.2-11B-Vision-Instruct
L4
- allenai/Molmo-7B-D-0924
A100
e.g.:
daily | livekit room in stream
-> silero (vad)
-> sense_voice (asr)
-> llm answer guide qwen-vl (llm)
-> edge (tts)
-> daily | livekit room out stream
daily_chat_tools_vision_bot
livekit_chat_tools_vision_bot
agora_chat_tools_vision_bot
e.g.:
daily_room_audio_stream | livekit_room_audio_stream
sense_voice_asr,
groq api llm(text),
tools:
- llm_transformers_manual_vision_qwen,
tts_edge
Open In Colab - Qwen2-VL-2B-Instruct
T4(free)
- Qwen2-VL-7B-Instruct
L4
- Llama-3.2-11B-Vision-Instruct
L4
- allenai/Molmo-7B-D-0924
A100
e.g.:
daily | livekit room in stream
-> silero (vad)
-> sense_voice (asr)
->llm with tools qwen-vl
-> edge (tts)
-> daily | livekit room out stream
daily_annotate_vision_bot
livekit_annotate_vision_bot
agora_annotate_vision_bot
e.g.:
daily_room_audio_stream | livekit_room_audio_stream
vision_yolo_detector
tts_edge
Open In Colab T4(free) e.g.:
daily | livekit room in stream
vision_yolo_detector
-> edge (tts)
-> daily | livekit room out stream
daily_detect_vision_bot
livekit_detect_vision_bot
agora_detect_vision_bot
e.g.:
daily_room_audio_stream | livekit_room_audio_stream
vision_yolo_detector
tts_edge
Open In Colab T4(free) e.g.:
daily | livekit room in stream
vision_yolo_detector
-> edge (tts)
-> daily | livekit room out stream
daily_ocr_vision_bot
livekit_ocr_vision_bot
agora_ocr_vision_bot
e.g.:
daily_room_audio_stream | livekit_room_audio_stream
sense_voice_asr,
vision_transformers_got_ocr
tts_edge
Open In Colab T4(free) e.g.:
daily | livekit room in stream
-> silero (vad)
-> sense_voice (asr)
vision_transformers_got_ocr
-> edge (tts)
-> daily | livekit room out stream
daily_month_narration_bot e.g.:
daily_room_audio_stream
groq |together api llm(text),
hf_sd, together api (image)
tts_edge
Open In Colab when use sd model with diffusers
T4(free) cpu+cuda (slow)
L4 cpu+cuda
A100 all cuda
e.g.:
daily room in stream
-> together (llm)
-> hf sd gen image model
-> edge (tts)
-> daily room out stream
daily_storytelling_bot e.g.:
daily_room_audio_stream
groq |together api llm(text),
hf_sd, together api (image)
tts_edge
Open In Colab cpu (2 cores)
when use sd model with diffusers
T4(free) cpu+cuda (slow)
L4 cpu+cuda
A100 all cuda
e.g.:
daily room in stream
-> together (llm)
-> hf sd gen image model
-> edge (tts)
-> daily room out stream
websocket_server_bot
fastapi_websocket_server_bot
e.g.:
websocket_server
sense_voice_asr,
groq |together api llm(text),
tts_edge
Open In Colab cpu(2 cores) e.g.:
websocket protocol in stream
-> silero (vad)
-> sense_voice (asr)
-> together (llm)
-> edge (tts)
-> websocket protocol out stream
daily_natural_conversation_bot e.g.:
daily_room_audio_stream
sense_voice_asr,
groq |together api llm(NLP task),
gemini-1.5-flash (chat)
tts_edge
Open In Colab cpu(2 cores) e.g.:
daily room in stream
-> together (llm NLP task)
-> gemini-1.5-flash model (chat)
-> edge (tts)
-> daily room out stream
fastapi_websocket_moshi_bot e.g.:
websocket_server
moshi opus stream voice llm
Open In Colab L4/A100 websocket protocol in stream
-> silero (vad)
-> moshi opus stream voice llm
-> websocket protocol out stream
daily_asr_glm_voice_bot
daily_glm_voice_bot
e.g.:
daily_room_audio_stream
glm voice llm
Open In Colab T4/L4/A100 e.g.:
daily room in stream
->glm4-voice
-> daily room out stream
daily_freeze_omni_voice_bot e.g.:
daily_room_audio_stream
freezeOmni voice llm
Open In Colab L4/A100 e.g.:
daily room in stream
->freezeOmni-voice
-> daily room out stream
daily_asr_minicpmo_voice_bot
daily_minicpmo_voice_bot
daily_minicpmo_vision_voice_bot
e.g.:
daily_room_audio_stream
minicpmo voice llm
Open In Colab T4: MiniCPM-o-2_6-int4
L4/A100: MiniCPM-o-2_6
e.g.:
daily room in stream
->minicpmo
-> daily room out stream
🌑 Run local chat bots


Note

  1. run pip install "achatbot[local_terminal_chat_bot]" to install dependencies to run local terminal chat bot;

  2. create the achatbot data dirs under $HOME: mkdir -p ~/.achatbot/{log,config,models,records,videos};

  3. cp .env.example .env, check .env, and add key/value env params (see the .env sketch after this list);

  4. select a model ckpt to download:

    • vad model ckpt (the default vad ckpt model uses silero vad)
    # vad pyannote segmentation ckpt
    huggingface-cli download pyannote/segmentation-3.0  --local-dir ~/.achatbot/models/pyannote/segmentation-3.0 --local-dir-use-symlinks False
    
    • asr model ckpt (the default whisper ckpt model uses the base size)
    # asr openai whisper ckpt
    wget https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt -O ~/.achatbot/models/base.pt
    
    # asr hf openai whisper ckpt for transformers pipeline to load
    huggingface-cli download openai/whisper-base  --local-dir ~/.achatbot/models/openai/whisper-base --local-dir-use-symlinks False
    
    # asr hf faster whisper (CTranslate2)
    huggingface-cli download Systran/faster-whisper-base  --local-dir ~/.achatbot/models/Systran/faster-whisper-base --local-dir-use-symlinks False
    
    # asr SenseVoice ckpt
    huggingface-cli download FunAudioLLM/SenseVoiceSmall  --local-dir ~/.achatbot/models/FunAudioLLM/SenseVoiceSmall --local-dir-use-symlinks False
    
    • llm model ckpt (the default llama.cpp ckpt (gguf) model uses Qwen2-Instruct 1.5B)
    # llm llamacpp Qwen2-Instruct
    huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-1_5b-instruct-q8_0.gguf  --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    
    # llm llamacpp Qwen1.5-chat
    huggingface-cli download Qwen/Qwen1.5-7B-Chat-GGUF qwen1_5-7b-chat-q8_0.gguf  --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    
    # llm llamacpp phi-3-mini-4k-instruct
    huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    
    
    • tts model ckpt (download the ckpt for the tts engine you plan to use)
    # tts chatTTS
    huggingface-cli download 2Noise/ChatTTS  --local-dir ~/.achatbot/models/2Noise/ChatTTS --local-dir-use-symlinks False
    
    # tts coquiTTS
    huggingface-cli download coqui/XTTS-v2  --local-dir ~/.achatbot/models/coqui/XTTS-v2 --local-dir-use-symlinks False
    
    # tts cosy voice
    git lfs install
    git clone https://www.modelscope.cn/iic/CosyVoice-300M.git ~/.achatbot/models/CosyVoice-300M
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git ~/.achatbot/models/CosyVoice-300M-SFT
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git ~/.achatbot/models/CosyVoice-300M-Instruct
    #git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git ~/.achatbot/models/CosyVoice-ttsfrd
    
    
  5. run local terminal chat bot with env; e.g.

    • use default env params to run the local chat bot
    ACHATBOT_PKG=1 TQDM_DISABLE=True \
        python -m achatbot.cmd.local-terminal-chat.generate_audio2audio > ~/.achatbot/log/std_out.log
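For step 3, a sketch of what the key/value params in .env can look like, using env var names that appear elsewhere in this README; the authoritative list of keys is .env.example, so treat these values as illustrative assumptions:

# .env (illustrative values; check .env.example for the real keys)
ASR_TAG=sense_voice_asr
ASR_MODEL_NAME_OR_PATH=~/.achatbot/models/FunAudioLLM/SenseVoiceSmall
LLM_MODEL_NAME=qwen
LLM_MODEL_PATH=~/.achatbot/models/qwen1_5-7b-chat-q8_0.gguf
TTS_TAG=tts_edge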
    
🌒 Run remote http fastapi daily chat bots


  1. run pip install "achatbot[fastapi_daily_bot_server]" to install dependencies to run http fastapi daily chat bot;

  2. run the cmd below to start the http server; see the API docs at http://0.0.0.0:4321/docs (a quick availability check is sketched at the end of this section)

    ACHATBOT_PKG=1 python -m achatbot.cmd.http.server.fastapi_daily_bot_serve
    
  3. run chat bot processor, e.g.

    • run a daily langchain rag bot api, with ui/educator-client

    Note: you need to process YouTube audio and save it to a local file with pytube; run pip install "achatbot[pytube,deep_translator]" to install the dependencies, transcribe/translate the audio to text, then chunk it into the vector store, and run the langchain RAG bot api; run the data process:

    ACHATBOT_PKG=1 python -m achatbot.cmd.bots.rag.data_process.youtube_audio_transcribe_to_tidb
    

    or download processed data from the HF dataset weege007/youtube_videos, then chunk it into the vector store.

    curl -XPOST "http://0.0.0.0:4321/bot_join/chat-bot/DailyLangchainRAGBot" \
     -H "Content-Type: application/json" \
     -d $'{"config":{"llm":{"model":"llama-3.1-70b-versatile","messages":[{"role":"system","content":""}],"language":"zh"},"tts":{"tag":"cartesia_tts_processor","args":{"voice_id":"eda5bbff-1ff1-4886-8ef1-4e69a77640a0","language":"zh"}},"asr":{"tag":"deepgram_asr_processor","args":{"language":"zh","model":"nova-2"}}}}' | jq .
    
    • run a simple daily chat bot api, with ui/web-client-ui (default language: zh)
    curl -XPOST "http://0.0.0.0:4321/bot_join/DailyBot" \
     -H "Content-Type: application/json" \
     -d '{}' | jq .
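Before joining a bot, you can confirm the server from step 2 is up by hitting the API docs endpoint mentioned above; a minimal check that assumes only the /docs route already referenced:

curl -s -o /dev/null -w "%{http_code}\n" "http://0.0.0.0:4321/docs"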
    
🌓 Run remote rpc chat bot worker


  1. run pip install "achatbot[remote_rpc_chat_bot_be_worker]" to install dependencies to run rpc chat bot BE worker; e.g. :
    • use dufault env params to run rpc chat bot BE worker
ACHATBOT_PKG=1 RUN_OP=be TQDM_DISABLE=True \
    TTS_TAG=tts_edge \
    python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
  1. run pip install "achatbot[remote_rpc_chat_bot_fe]" to install dependencies to run rpc chat bot FE;
ACHATBOT_PKG=1 RUN_OP=fe \
    TTS_TAG=tts_edge \
    python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
🌔 Run remote queue chat bot worker


  1. run pip install "achatbot[remote_queue_chat_bot_be_worker]" to install dependencies to run queue chat bot worker; e.g.:

    • use default env params to run
    ACHATBOT_PKG=1 REDIS_PASSWORD=$redis_pwd RUN_OP=be TQDM_DISABLE=True \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
    
    • sense_voice (asr) -> qwen (llm) -> cosy_voice (tts): you can log in to redislabs and create a 30M free database; set REDIS_HOST, REDIS_PORT and REDIS_PASSWORD to run, e.g.:
     ACHATBOT_PKG=1 RUN_OP=be \
       TQDM_DISABLE=True \
       REDIS_PASSWORD=$redis_pwd \
       REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
       REDIS_PORT=14241 \
       ASR_TAG=sense_voice_asr \
       ASR_LANG=zn \
       ASR_MODEL_NAME_OR_PATH=~/.achatbot/models/FunAudioLLM/SenseVoiceSmall \
       N_GPU_LAYERS=33 FLASH_ATTN=1 \
       LLM_MODEL_NAME=qwen \
       LLM_MODEL_PATH=~/.achatbot/models/qwen1_5-7b-chat-q8_0.gguf \
       TTS_TAG=tts_cosy_voice \
       python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
    
  2. run pip install "achatbot[remote_queue_chat_bot_fe]" to install the required packages to run quueue chat bot frontend; e.g.:

    • use default env params to run (default vad_recorder)
    ACHATBOT_PKG=1 RUN_OP=fe \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    
    • with wake word
    ACHATBOT_PKG=1 RUN_OP=fe \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        RECORDER_TAG=wakeword_rms_recorder \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    
    • the default pyaudio player stream uses the tts tag's output sample info (rate, channels, ...), e.g.: (the BE worker uses tts_cosy_voice output stream info)
     ACHATBOT_PKG=1 RUN_OP=fe \
         REDIS_PASSWORD=$redis_pwd \
         REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
         REDIS_PORT=14241 \
         RUN_OP=fe \
         TTS_TAG=tts_cosy_voice \
         python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    

    remote_queue_chat_bot_be_worker Colab examples: Open In Colab

    • sense_voice(asr) -> qwen (llm) -> cosy_voice (tts)
🌕 Run remote grpc tts speaker bot


  1. run pip install "achatbot[remote_grpc_tts_server]" to install dependencies to run grpc tts speaker bot server;
ACHATBOT_PKG=1 python -m achatbot.cmd.grpc.speaker.server.serve
  1. run pip install "achatbot[remote_grpc_tts_client]" to install dependencies to run grpc tts speaker bot client;
ACHATBOT_PKG=1 TTS_TAG=tts_edge IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_g IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_coqui IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_chat IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_cosy_voice IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_fishspeech IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_f5 IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_openvoicev2 IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_kokoro IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_onnx_kokoro IS_RELOAD=1 KOKORO_ESPEAK_NG_LIB_PATH=/usr/local/lib/libespeak-ng.1.dylib KOKORO_LANGUAGE=cmn python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_cosy_voice2 \
    COSY_VOICE_MODELS_DIR=./models/FunAudioLLM/CosyVoice2-0.5B \
    COSY_VOICE_REFERENCE_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
    IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
📹 Multimodal Interaction


audio (voice)

  • stream-stt (realtime-recorder): audio-text

  • audio-llm (multimode-chat): pipe, queue

  • stream-tts (realtime-(clone)-speaker): text-audio, audio-text, text-audio

vision (CV)

  • stream-ocr (realtime-object-detection)

more

  • Embodied Intelligence: Robots that touch the world, perceive and move

License

achatbot is released under the BSD 3-Clause license. (Additional code in this distribution is covered by the MIT and Apache open source licenses.) However, you may have other legal obligations that govern your use of content, such as the terms of service for third-party models.
