diff --git a/examples/assets/ttfa_2_vs_4_threads_llm_with_stt.png b/examples/assets/ttfa_2_vs_4_threads_llm_with_stt.png new file mode 100644 index 0000000..1591cf2 Binary files /dev/null and b/examples/assets/ttfa_2_vs_4_threads_llm_with_stt.png differ diff --git a/examples/assets/ttfs_2_vs_4_threads_llm_with_tts.png b/examples/assets/ttfs_2_vs_4_threads_llm_with_tts.png new file mode 100644 index 0000000..ebe370e Binary files /dev/null and b/examples/assets/ttfs_2_vs_4_threads_llm_with_tts.png differ diff --git a/examples/experimentals/voice_engine/environment.log b/examples/experimentals/voice_engine/environment.log new file mode 100644 index 0000000..2dde523 --- /dev/null +++ b/examples/experimentals/voice_engine/environment.log @@ -0,0 +1,477 @@ +2024-11-13 14:01:53,569 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:01:58,586 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:01:58,596 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:01:58,598 - __main__ - INFO - STT process initialized with PID: 8415 +2024-11-13 14:01:58,599 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:01:58,602 - __main__ - ERROR - Failed to initialize LLM process: [Errno 13] Permission denied: '/data/data/com.termux/files/home/llama.cpp/llama-server' +2024-11-13 14:04:08,010 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:04:13,023 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:04:13,033 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:04:13,035 - __main__ - INFO - STT process initialized with PID: 8554 +2024-11-13 14:04:13,036 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:04:13,041 - __main__ - ERROR - Failed to initialize LLM process: [Errno 13] Permission denied: '/data/data/com.termux/files/home/llama.cpp/llama-server' +2024-11-13 14:24:33,136 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:24:38,165 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:24:38,175 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:24:38,178 - __main__ - INFO - STT process initialized with PID: 10850 +2024-11-13 14:24:38,178 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:24:48,265 - __main__ - INFO - LLM 
process initialized with PID: 10859 +2024-11-13 14:24:48,293 - __main__ - INFO - Process 10850 terminated gracefully. +2024-11-13 14:24:48,469 - __main__ - INFO - Process 10859 terminated gracefully. +2024-11-13 14:31:24,457 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:31:29,481 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:31:29,494 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:31:29,497 - __main__ - INFO - STT process initialized with PID: 16751 +2024-11-13 14:31:29,497 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:31:39,506 - __main__ - INFO - LLM process initialized with PID: 16760 +2024-11-13 14:31:39,535 - __main__ - INFO - Process 16751 terminated gracefully. +2024-11-13 14:31:39,699 - __main__ - INFO - Process 16760 terminated gracefully. +2024-11-13 14:35:13,569 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:35:18,593 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:35:18,604 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:35:18,606 - __main__ - INFO - STT process initialized with PID: 17760 +2024-11-13 14:35:18,606 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:35:28,622 - __main__ - INFO - LLM process initialized with PID: 17772 +2024-11-13 14:35:33,724 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:35:33,741 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 35 +2024-11-13 14:35:33,757 - __main__ - INFO - Process 17760 terminated gracefully. +2024-11-13 14:35:33,972 - __main__ - INFO - Process 17772 terminated gracefully. +2024-11-13 14:40:59,905 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:41:04,929 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:41:04,943 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:41:04,945 - __main__ - INFO - STT process initialized with PID: 22547 +2024-11-13 14:41:04,945 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:41:14,958 - __main__ - INFO - LLM process initialized with PID: 22563 +2024-11-13 14:41:17,573 - __main__ - INFO - Process 22547 terminated gracefully. 
+2024-11-13 14:41:17,739 - __main__ - INFO - Process 22563 terminated gracefully. +2024-11-13 14:42:04,177 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:42:09,203 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:42:09,211 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:42:09,213 - __main__ - INFO - STT process initialized with PID: 22741 +2024-11-13 14:42:09,213 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:42:19,223 - __main__ - INFO - LLM process initialized with PID: 22750 +2024-11-13 14:42:21,680 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:42:25,983 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:42:25,985 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:42:25,985 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:42:25,988 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:42:25,996 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:42:25,996 - __main__ - DEBUG - LLM response: +2024-11-13 14:42:27,441 - __main__ - INFO - TTFS: 5.778118371963501 +2024-11-13 14:42:35,438 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 14:44:10,904 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:44:15,222 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:44:15,231 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:44:15,236 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:44:15,242 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:44:15,254 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:44:15,261 - __main__ - DEBUG - LLM response: +2024-11-13 14:44:16,192 - __main__ - INFO - TTFS: 5.319575071334839 +2024-11-13 14:44:28,328 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 14:51:00,038 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:51:05,062 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:51:05,072 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:51:05,075 - __main__ - INFO - STT process initialized with PID: 24744 +2024-11-13 14:51:05,076 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 4 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:51:15,083 - __main__ - INFO - LLM process initialized with PID: 24753 +2024-11-13 14:51:16,671 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:51:20,765 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:51:20,766 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:51:20,767 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:51:20,772 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:51:20,781 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:51:20,782 - __main__ - DEBUG - LLM response: +2024-11-13 14:51:21,858 - __main__ - INFO - TTFS: 5.194319725036621 +2024-11-13 14:51:43,463 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 14:52:02,088 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:52:05,715 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:52:05,717 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:52:05,717 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:52:05,720 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:52:05,724 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:52:05,725 - __main__ - DEBUG - LLM response: +2024-11-13 14:52:06,926 - __main__ - INFO - TTFS: 4.842855453491211 +2024-11-13 14:52:29,029 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 14:55:20,037 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:55:25,059 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:55:25,070 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:55:25,073 - __main__ - INFO - STT process initialized with PID: 27355 +2024-11-13 14:55:25,073 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 4 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:55:35,084 - __main__ - INFO - LLM process initialized with PID: 27364 +2024-11-13 14:55:47,163 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:55:51,171 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:55:51,172 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:55:51,173 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:55:51,176 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:55:51,181 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:55:51,182 - __main__ - DEBUG - LLM response: +2024-11-13 14:55:52,203 - __main__ - INFO - TTFS: 5.0501768589019775 +2024-11-13 14:56:24,591 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 14:56:58,785 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:57:02,408 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:57:02,409 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:57:02,411 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:57:02,415 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:57:02,419 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:57:02,421 - __main__ - DEBUG - LLM response: +2024-11-13 14:57:03,427 - __main__ - INFO - TTFS: 4.6472227573394775 +2024-11-13 14:57:26,751 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 14:58:19,877 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 14:58:24,905 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:58:24,912 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 14:58:24,914 - __main__ - INFO - STT process initialized with PID: 28320 +2024-11-13 14:58:24,914 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 14:58:34,917 - __main__ - INFO - LLM process initialized with PID: 28329 +2024-11-13 14:58:44,587 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:58:48,728 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:58:48,733 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:58:48,736 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:58:48,740 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:58:48,745 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:58:48,746 - __main__ - DEBUG - LLM response: +2024-11-13 14:58:51,879 - __main__ - INFO - TTFS: 7.300859451293945 +2024-11-13 14:59:04,608 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 14:59:15,314 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 14:59:18,960 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 14:59:18,961 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 14:59:18,962 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 14:59:18,967 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 14:59:18,971 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 14:59:18,972 - __main__ - DEBUG - LLM response: +2024-11-13 14:59:19,879 - __main__ - INFO - TTFS: 4.58224630355835 +2024-11-13 14:59:33,417 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 15:00:03,367 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:00:07,011 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:00:07,013 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:00:07,014 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:00:07,018 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:00:07,025 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:00:07,026 - __main__ - DEBUG - LLM response: +2024-11-13 15:00:07,923 - __main__ - INFO - TTFS: 4.569139003753662 +2024-11-13 15:00:20,743 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 15:14:50,561 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 15:14:55,586 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:14:55,601 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 15:14:55,603 - __main__ - INFO - STT process initialized with PID: 7594 +2024-11-13 15:14:55,603 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 20 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 15:15:05,613 - __main__ - INFO - LLM process initialized with PID: 7610 +2024-11-13 15:15:11,880 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:15:16,129 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:15:16,133 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:15:16,135 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:15:16,142 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:15:16,168 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:15:16,169 - __main__ - DEBUG - LLM response: +2024-11-13 15:15:19,826 - __main__ - INFO - TTFS: 7.970261096954346 +2024-11-13 15:15:33,055 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 15:15:41,269 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:15:46,513 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:15:46,514 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:15:46,515 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:15:46,529 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:15:46,545 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:15:46,546 - __main__ - DEBUG - LLM response: +2024-11-13 15:15:47,750 - __main__ - INFO - TTFS: 6.495905160903931 +2024-11-13 15:16:07,495 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 15:19:53,329 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 15:19:58,353 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:19:58,369 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 15:19:58,372 - __main__ - INFO - STT process initialized with PID: 17073 +2024-11-13 15:19:58,372 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 15:20:08,382 - __main__ - INFO - LLM process initialized with PID: 17084 +2024-11-13 15:20:13,987 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:20:18,139 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:20:18,143 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:20:18,144 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:20:18,153 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:20:18,357 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:20:18,358 - __main__ - DEBUG - LLM response: +2024-11-13 15:20:19,192 - __main__ - INFO - TTFS: 5.216456651687622 +2024-11-13 15:20:46,262 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 15:20:51,005 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:20:54,997 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:20:55,003 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:20:55,005 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:20:55,010 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:20:55,020 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:20:55,021 - __main__ - DEBUG - LLM response: +2024-11-13 15:20:55,906 - __main__ - INFO - TTFS: 4.929144382476807 +2024-11-13 15:21:16,969 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 15:28:38,763 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 15:28:43,784 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:28:43,797 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 15:28:43,800 - __main__ - INFO - STT process initialized with PID: 19544 +2024-11-13 15:28:43,801 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 15:28:53,809 - __main__ - INFO - LLM process initialized with PID: 19555 +2024-11-13 15:28:53,837 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:28:57,890 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:28:57,891 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:28:57,892 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:28:57,895 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:28:57,899 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:28:57,899 - __main__ - DEBUG - LLM response: +2024-11-13 15:28:58,812 - __main__ - INFO - TTFS: 4.995412588119507 +2024-11-13 15:29:14,216 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 15:29:14,228 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:29:17,867 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:29:17,869 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:29:17,869 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:29:17,874 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:29:17,879 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:29:17,880 - __main__ - DEBUG - LLM response: +2024-11-13 15:29:18,787 - __main__ - INFO - TTFS: 4.568600654602051 +2024-11-13 15:29:30,814 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 15:29:30,827 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:29:34,473 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:29:34,475 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:29:34,476 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:29:34,480 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:29:34,486 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:29:34,487 - __main__ - DEBUG - LLM response: +2024-11-13 15:29:35,392 - __main__ - INFO - TTFS: 4.5740437507629395 +2024-11-13 15:29:46,359 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 15:29:50,053 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:29:54,103 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 46 +2024-11-13 15:29:54,105 - __main__ - DEBUG - STT response: STTResponse(text=' What is the value of 34 plus 53?\n') +2024-11-13 15:29:54,106 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:29:54,108 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:29:54,112 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:29:54,114 - __main__ - DEBUG - LLM response: +2024-11-13 15:29:55,016 - __main__ - INFO - TTFS: 4.9750142097473145 +2024-11-13 15:30:03,720 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 15:30:51,976 - __main__ - INFO - Process 19544 terminated gracefully. +2024-11-13 15:30:52,344 - __main__ - INFO - Process 19555 terminated gracefully. +2024-11-13 15:48:38,415 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 15:48:43,438 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:48:43,446 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 15:48:43,448 - __main__ - INFO - STT process initialized with PID: 31465 +2024-11-13 15:48:43,449 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 15:48:53,456 - __main__ - INFO - LLM process initialized with PID: 31481 +2024-11-13 15:48:56,556 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:49:00,762 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 42 +2024-11-13 15:49:00,766 - __main__ - DEBUG - STT response: STTResponse(text=' Give me 5 examples of colors\n') +2024-11-13 15:49:00,767 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:49:00,773 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:49:00,783 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:49:00,784 - __main__ - DEBUG - LLM response: +2024-11-13 15:49:01,575 - __main__ - INFO - TTFS: 5.037225723266602 +2024-11-13 15:49:27,828 - __main__ - DEBUG - Decode Thread Stopped. 
+2024-11-13 15:57:30,444 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 15:57:35,466 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:57:35,477 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 15:57:35,479 - __main__ - INFO - STT process initialized with PID: 4584 +2024-11-13 15:57:35,480 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 15:57:45,518 - __main__ - INFO - LLM process initialized with PID: 4597 +2024-11-13 15:57:52,592 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 15:57:57,229 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 42 +2024-11-13 15:57:57,235 - __main__ - DEBUG - STT response: STTResponse(text=' Give me 5 examples of colors\n') +2024-11-13 15:57:57,238 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 15:57:57,241 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 15:57:57,250 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 15:57:57,251 - __main__ - DEBUG - LLM response: +2024-11-13 15:57:58,101 - __main__ - INFO - TTFS: 5.519280195236206 +2024-11-13 15:58:20,259 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 16:01:23,932 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 16:56:38,215 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 16:56:43,240 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 16:56:43,249 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 16:56:43,252 - __main__ - INFO - STT process initialized with PID: 994 +2024-11-13 16:56:43,253 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 16:56:53,261 - __main__ - INFO - LLM process initialized with PID: 1033 +2024-11-13 16:57:38,944 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 16:57:42,571 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 42 +2024-11-13 16:57:42,576 - __main__ - DEBUG - STT response: STTResponse(text=' Give me 5 examples of colors\n') +2024-11-13 16:57:42,577 - __main__ - DEBUG - Decode Thread Started. 
+2024-11-13 16:57:42,578 - __main__ - DEBUG - TTS Thread Started +2024-11-13 16:57:42,583 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 16:57:42,596 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 16:57:42,597 - __main__ - DEBUG - LLM response: +2024-11-13 16:57:42,599 - __main__ - DEBUG - PIPER PID, 2253 +2024-11-13 16:57:44,621 - __main__ - INFO - TTFS: 5.689475059509277 +2024-11-13 16:57:45,908 - __main__ - DEBUG - Sending ----> Here are 5 examples of colors +2024-11-13 16:57:46,146 - __main__ - DEBUG - Sending ----> : + +2024-11-13 16:57:46,592 - __main__ - DEBUG - Sending ----> 1. +2024-11-13 16:57:47,474 - __main__ - DEBUG - Sending ----> Red +2. +2024-11-13 16:57:48,316 - __main__ - DEBUG - Sending ----> Blue +3. +2024-11-13 16:57:49,144 - __main__ - DEBUG - Sending ----> Yellow +4. +2024-11-13 16:57:49,958 - __main__ - DEBUG - Sending ----> Green +5. +2024-11-13 16:57:51,727 - __main__ - DEBUG - Sending ----> Purple + +I'm happy to give you more +2024-11-13 16:57:52,121 - __main__ - DEBUG - Sending ----> examples if +2024-11-13 16:57:52,923 - __main__ - DEBUG - Sending ----> you'd like! +2024-11-13 16:57:53,110 - __main__ - DEBUG - Received Stop Event At TTS +2024-11-13 16:57:53,112 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 17:01:28,067 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 17:01:31,681 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 42 +2024-11-13 17:01:31,682 - __main__ - DEBUG - STT response: STTResponse(text=' Give me 5 examples of colors\n') +2024-11-13 17:01:31,683 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 17:01:31,684 - __main__ - DEBUG - TTS Thread Started +2024-11-13 17:01:31,686 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 17:01:31,696 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 17:01:31,697 - __main__ - DEBUG - LLM response: +2024-11-13 17:01:31,697 - __main__ - DEBUG - PIPER PID, 3006 +2024-11-13 17:01:32,538 - __main__ - INFO - TTFS: 4.486324310302734 +2024-11-13 17:01:33,762 - __main__ - DEBUG - Sending ----> Here are 5 examples of colors +2024-11-13 17:01:36,713 - __main__ - DEBUG - Sending ----> +Blue +Red +Yellow +Green +Purple + +Let me know if +2024-11-13 17:01:38,077 - __main__ - DEBUG - Sending ----> you'd like me to provide more +2024-11-13 17:01:38,479 - __main__ - DEBUG - Sending ----> examples. +2024-11-13 17:01:39,057 - __main__ - DEBUG - Sending ----> + +Note: +2024-11-13 17:01:39,651 - __main__ - DEBUG - Sending ----> I can also +2024-11-13 17:01:40,259 - __main__ - DEBUG - Sending ----> give you information +2024-11-13 17:01:40,831 - __main__ - DEBUG - Sending ----> about the colors +2024-11-13 17:01:41,431 - __main__ - DEBUG - Sending ----> you choose if +2024-11-13 17:01:42,220 - __main__ - DEBUG - Sending ----> you'd like. 
+2024-11-13 17:01:42,404 - __main__ - DEBUG - Sending ----> For +2024-11-13 17:01:42,823 - __main__ - DEBUG - Sending ----> example, +2024-11-13 17:01:43,008 - __main__ - DEBUG - Sending ----> if +2024-11-13 17:01:43,811 - __main__ - DEBUG - Sending ----> you choose blue, +2024-11-13 17:01:44,610 - __main__ - DEBUG - Sending ----> I could provide information +2024-11-13 17:01:44,997 - __main__ - DEBUG - Sending ----> about different +2024-11-13 17:01:45,815 - __main__ - DEBUG - Sending ----> shades of blue, +2024-11-13 17:01:46,595 - __main__ - DEBUG - Sending ----> blue pigments, +2024-11-13 17:01:47,402 - __main__ - DEBUG - Sending ----> blue dyes, +2024-11-13 17:01:47,814 - __main__ - DEBUG - Sending ----> blue colors +2024-11-13 17:01:48,412 - __main__ - DEBUG - Sending ----> in nature, +2024-11-13 17:01:48,814 - __main__ - DEBUG - Sending ----> etc. +2024-11-13 17:01:49,622 - __main__ - DEBUG - Sending ----> Let me know if +2024-11-13 17:01:50,432 - __main__ - DEBUG - Sending ----> you'd like that +2024-11-13 17:01:50,651 - __main__ - DEBUG - Sending ----> information +2024-11-13 17:01:50,839 - __main__ - DEBUG - Sending ----> . +2024-11-13 17:01:51,851 - __main__ - DEBUG - Sending ----> + +Let me know if +2024-11-13 17:01:53,069 - __main__ - DEBUG - Sending ----> you have any other questions or +2024-11-13 17:01:53,271 - __main__ - DEBUG - Sending ----> if +2024-11-13 17:01:55,298 - __main__ - DEBUG - Sending ----> there's anything else I can help you with. +2024-11-13 17:01:56,543 - __main__ - DEBUG - Sending ----> + +Here are 5 more +2024-11-13 17:01:57,146 - __main__ - DEBUG - Sending ----> examples of colors +2024-11-13 17:01:57,949 - __main__ - DEBUG - Received Stop Event At TTS +2024-11-13 17:01:57,949 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 17:11:29,744 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-13 17:11:34,758 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 17:11:34,765 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-13 17:11:34,767 - __main__ - INFO - STT process initialized with PID: 9692 +2024-11-13 17:11:34,767 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Llama-3.2-3B-Instruct-Q4_0_4_4.gguf +2024-11-13 17:11:44,778 - __main__ - INFO - LLM process initialized with PID: 9728 +2024-11-13 17:11:47,775 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 17:11:52,281 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 42 +2024-11-13 17:11:52,283 - __main__ - DEBUG - STT response: STTResponse(text=' Give me 5 examples of colors\n') +2024-11-13 17:11:52,284 - __main__ - DEBUG - Decode Thread Started. 
+2024-11-13 17:11:52,285 - __main__ - DEBUG - TTS Thread Started +2024-11-13 17:11:52,287 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 17:11:52,351 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 17:11:52,353 - __main__ - DEBUG - LLM response: +2024-11-13 17:11:52,393 - __main__ - DEBUG - PIPER PID, 9901 +2024-11-13 17:11:54,178 - __main__ - INFO - TTFS: 6.445502042770386 +2024-11-13 17:11:54,559 - __main__ - DEBUG - Sending ----> 1. +2024-11-13 17:11:55,508 - __main__ - DEBUG - Sending ----> Red +2. +2024-11-13 17:11:56,447 - __main__ - DEBUG - Sending ----> Blue +3. +2024-11-13 17:11:57,295 - __main__ - DEBUG - Sending ----> Green +4. +2024-11-13 17:11:58,214 - __main__ - DEBUG - Sending ----> Yellow +5. +2024-11-13 17:12:00,611 - __main__ - DEBUG - Sending ----> Purple + +Here are 5 examples of colors +2024-11-13 17:12:00,814 - __main__ - DEBUG - Sending ----> : + + +2024-11-13 17:12:01,343 - __main__ - DEBUG - Sending ----> 1. +2024-11-13 17:12:02,340 - __main__ - DEBUG - Sending ----> Red +2. +2024-11-13 17:12:03,231 - __main__ - DEBUG - Sending ----> Blue +3. +2024-11-13 17:12:04,225 - __main__ - DEBUG - Sending ----> Green +4. +2024-11-13 17:12:05,043 - __main__ - DEBUG - Sending ----> Yellow +5. +2024-11-13 17:12:07,347 - __main__ - DEBUG - Sending ----> Purple + +The text is written in **bold text**, +2024-11-13 17:12:07,776 - __main__ - DEBUG - Sending ----> indicating that +2024-11-13 17:12:08,601 - __main__ - DEBUG - Sending ----> the examples of colors +2024-11-13 17:12:09,874 - __main__ - DEBUG - Sending ----> are in **bold**. +2024-11-13 17:12:11,158 - __main__ - DEBUG - Sending ----> Here is the revised text: + + + +2024-11-13 17:12:13,072 - __main__ - DEBUG - Sending ----> +1. +2024-11-13 17:12:14,966 - __main__ - DEBUG - Sending ----> Red +2. +2024-11-13 17:12:16,841 - __main__ - DEBUG - Sending ----> Blue +3. +2024-11-13 17:12:18,755 - __main__ - DEBUG - Sending ----> Green +4. +2024-11-13 17:12:20,643 - __main__ - DEBUG - Sending ----> Yellow +5. +2024-11-13 17:12:22,311 - __main__ - DEBUG - Decode Thread Stopped. +2024-11-13 17:12:22,311 - __main__ - DEBUG - Received Stop Event At TTS +2024-11-13 17:12:30,693 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-13 17:12:34,397 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /inference HTTP/11" 200 42 +2024-11-13 17:12:34,398 - __main__ - DEBUG - STT response: STTResponse(text=' Give me 5 examples of colors\n') +2024-11-13 17:12:34,399 - __main__ - DEBUG - Decode Thread Started. +2024-11-13 17:12:34,400 - __main__ - DEBUG - TTS Thread Started +2024-11-13 17:12:34,402 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8081 +2024-11-13 17:12:34,407 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8081 "POST /completion HTTP/11" 200 None +2024-11-13 17:12:34,408 - __main__ - DEBUG - LLM response: +2024-11-13 17:12:34,412 - __main__ - DEBUG - PIPER PID, 10312 +2024-11-13 17:12:35,239 - __main__ - INFO - TTFS: 4.553896427154541 +2024-11-13 17:12:36,490 - __main__ - DEBUG - Sending ----> Here are 5 examples of colors +2024-11-13 17:12:36,692 - __main__ - DEBUG - Sending ----> : + +2024-11-13 17:12:37,099 - __main__ - DEBUG - Sending ----> 1. +2024-11-13 17:12:37,950 - __main__ - DEBUG - Sending ----> Blue +2. +2024-11-13 17:12:38,772 - __main__ - DEBUG - Sending ----> Green +3. 
+2024-11-13 17:12:39,621 - __main__ - DEBUG - Sending ----> Yellow +4. +2024-11-13 17:12:40,442 - __main__ - DEBUG - Sending ----> Red +5. +2024-11-13 17:12:41,690 - __main__ - DEBUG - Sending ----> Purple + +Let me know if +2024-11-26 16:50:29,837 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-26 16:50:34,876 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-26 16:50:34,893 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-26 16:50:34,896 - __main__ - INFO - STT process initialized with PID: 9648 +2024-11-26 16:50:34,896 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Qwen2.5-3.1B-Q4_0_4_4.gguf +2024-11-26 16:50:44,908 - __main__ - INFO - LLM process initialized with PID: 9660 +2024-11-26 16:50:47,295 - __main__ - INFO - Process 9648 terminated gracefully. +2024-11-26 16:50:47,706 - __main__ - INFO - Process 9660 terminated gracefully. +2024-11-26 17:04:19,477 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-26 17:04:24,525 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-26 17:04:24,545 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-26 17:04:24,549 - __main__ - INFO - STT process initialized with PID: 11898 +2024-11-26 17:04:24,550 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Qwen2.5-3.1B-Q4_0_4_4.gguf +2024-11-26 17:04:34,563 - __main__ - INFO - LLM process initialized with PID: 11983 +2024-11-26 17:25:45,391 - __main__ - INFO - Process 11898 terminated gracefully. +2024-11-26 17:25:45,598 - __main__ - INFO - Process 11983 terminated gracefully. +2024-11-26 17:25:49,938 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/whisper.cpp/server -t 4 -p 1 -ng -fa --port 8080 -m /data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin +2024-11-26 17:25:54,983 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): 127.0.0.1:8080 +2024-11-26 17:25:54,998 - urllib3.connectionpool - DEBUG - http://127.0.0.1:8080 "POST /load HTTP/11" 200 43 +2024-11-26 17:25:55,001 - __main__ - INFO - STT process initialized with PID: 13518 +2024-11-26 17:25:55,002 - __main__ - INFO - Initializing environment with command: /data/data/com.termux/files/home/llama.cpp/llama-server -t 2 -b 8192 -ub 512 -n 128 -c 2048 -fa --port 8081 -m /data/data/com.termux/files/home/models/Qwen2.5-3.1B-Q4_0_4_4.gguf +2024-11-26 17:26:05,016 - __main__ - INFO - LLM process initialized with PID: 13573 +2024-11-26 17:28:59,360 - __main__ - INFO - Process 13518 terminated gracefully. +2024-11-26 17:28:59,468 - __main__ - INFO - Process 13573 terminated gracefully. 
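Note on configuration: both the patched `main.py` and the new `main_android.py` point `DEFAULT_CONFIG` at a recipe file (`recipe/rpi5.yaml`) that is not included in this diff. As a rough, hypothetical sketch of the structure that recipe is expected to have (field names are taken from the `EngineEnvironmentConfig`, `STTEnvironmentConfig`, `LLMEnvironmentConfig`, and `TTSEnvironmentConfig` dataclasses in the patch below; every path and value here is a placeholder, not the real recipe contents), the same configuration can be built programmatically via `from_dict`:

```python
# Hypothetical sketch of the recipe structure consumed by the voice engine.
# Field names come from the dataclasses added in this diff; every path and
# numeric value below is a placeholder, not the contents of the real rpi5.yaml.
from main import EngineEnvironmentConfig  # assumes running from the voice_engine directory

recipe = {
    "stt": {  # whisper.cpp server settings (STTEnvironmentConfig)
        "n_threads": 4,
        "n_procs": 1,
        "gpu": False,
        "flash_attn": True,
        "port": 8080,
        "model": "/path/to/ggml-tiny-q4_0.bin",        # placeholder
        "_executable": "/path/to/whisper.cpp/server",  # placeholder
    },
    "llm": {  # llama.cpp server settings (LLMEnvironmentConfig)
        "n_threads": 2,
        "batch_size": 8192,
        "ubatch_size": 512,
        "n_predict": 128,
        "stream": True,
        "port": 8081,
        "model": "/path/to/model.gguf",                    # placeholder
        "_executable": "/path/to/llama.cpp/llama-server",  # placeholder
    },
    "tts": {  # Piper settings (TTSEnvironmentConfig)
        "voice": True,
        "model": "en_US-lessac-medium",
        "length_scale": 1.5,
    },
    "log_path": "environment.log",
}

config = EngineEnvironmentConfig.from_dict(recipe)
```

`EngineEnvironmentConfig.from_yaml` loads the equivalent YAML file, with one top-level key per nested dataclass (`stt`, `llm`, `tts`) plus `log_path`.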
diff --git a/examples/experimentals/voice_engine/main.py b/examples/experimentals/voice_engine/main.py index 1861cf9..208781b 100644 --- a/examples/experimentals/voice_engine/main.py +++ b/examples/experimentals/voice_engine/main.py @@ -13,9 +13,12 @@ import requests import logging import time +import os +import re +import psutil LOGGER = None -DEFAULT_CONFIG = "recipe/default.yaml" +DEFAULT_CONFIG = "nyuntam/examples/experimentals/voice_engine/recipe/rpi5.yaml" def set_logger(*args, **kwargs): @@ -158,10 +161,22 @@ def get_options(self): return cmd +@dataclass +class TTSEnvironmentConfig(EnvironmentConfig, EnvironmentConfigMeta): + voice: bool = field(default=False) + model: str = field(default="en_US-lessac-medium") + length_scale: float = field(default=1.5) + + def get_options(self): + SPACE = " " + cmd = f"" + + @dataclass class EngineEnvironmentConfig(EnvironmentConfigMeta): stt: STTEnvironmentConfig = field(default_factory=STTEnvironmentConfig) llm: LLMEnvironmentConfig = field(default_factory=LLMEnvironmentConfig) + tts: TTSEnvironmentConfig = field(default_factory=TTSEnvironmentConfig) log_path: tp.Union[str, Path] = field(default="environment.log") def __post_init__(self): @@ -200,6 +215,11 @@ def parse_args(): ################################################## +class EnvironmentTypes(StrEnum): + STT = "stt" + LLM = "llm" + + @dataclass class STTInput: environment_config: STTEnvironmentConfig @@ -347,7 +367,7 @@ def init_handlers(self) -> None: # NOTE: When using chain of responsibility, initialize handlers here - def call(self, input: STTInput) -> EngineResponse: + def call(self, input: STTInput, ttsConfig) -> EngineResponse: assert isinstance(input, STTInput), "Input must be of type STTInput" tick = time.time() stt_response = STTResponse.from_response(call_stt_environment(input)) @@ -356,14 +376,30 @@ def call(self, input: STTInput) -> EngineResponse: llm_input = LLMInput.from_stt_response(self.config.llm, stt_response) if llm_input.stream: # implement stream response handling + tts_processing_queue = queue.Queue() decoded_streams = queue.Queue() stream_queue = queue.Queue() stop_event = threading.Event() + decode_thread = threading.Thread( target=decode_stream, - args=(stop_event, stream_queue, decoded_streams, True), + args=( + stop_event, + stream_queue, + decoded_streams, + tts_processing_queue, + True, + ), ) decode_thread.start() + + if ttsConfig.voice: + tts_processing_thread = threading.Thread( + target=create_tts_wav, + args=(stop_event, tts_processing_queue, ttsConfig), + ) + tts_processing_thread.start() + llm_input.data["stream"] = True ttfs = None response = call_llm_environment(llm_input) @@ -374,12 +410,16 @@ def call(self, input: STTInput) -> EngineResponse: if line: if ttfs is None: ttfs = time.time() - print("TTFS: ", ttfs - tick) + LOGGER.info(f"TTFS: {ttfs - tick}") stream_queue.put(line) tock = time.time() stop_event.set() + decode_thread.join() + if ttsConfig.voice: + tts_processing_thread.join() + llm_response = LLMResponse( text=decoded_streams_to_text(list(decoded_streams.queue)), streams=list(decoded_streams.queue), @@ -436,6 +476,7 @@ def initialize_environment(config: EnvironmentConfig): + config.get_options().split() + config.get_model_option().split() ) + LOGGER.info(f"Initializing environment with command: {' '.join(cmd)}") return subprocess.Popen(cmd) @@ -473,7 +514,9 @@ def decode_stream( stop_event: threading.Event, stream_queue: queue.Queue, decoded_streams: queue.Queue, + tts_processing_queue: queue.Queue, decode_and_print: bool = False, + 
decode_and_talk: bool = True, ): while not stop_event.is_set() or not stream_queue.empty(): try: @@ -483,21 +526,163 @@ def decode_stream( if line: json_response = json.loads(line.decode("utf-8").replace("data: ", "")) decoded_streams.put(json_response) - # if decode_and_print: - # # print decoded stream continously with flush - # print(json_response["content"], end="", flush=True) + if decode_and_talk: + tts_processing_queue.put(json_response["content"]) + except queue.Empty: pass # No data to process yet, continue +def create_tts_wav( + stop_event: threading.Event, + tts_processing_queue: queue.Queue, + ttsConfig, + # output_dir: str = "/home/piuser/voice/core/test-output", +): + + piper_process = subprocess.Popen( + [ + "piper", + "--model", + f"{ttsConfig.model}", + "--length-scale", + f"{ttsConfig.length_scale}", + "--output_raw", + ], + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + universal_newlines=True, + ) + + piper_proc = psutil.Process(piper_process.pid) + + # Define FFmpeg command to stream the audio over HTTP + ffmpeg_command = [ + "ffmpeg", + "-f", + "s16le", # Input format (16-bit PCM, little-endian) + "-ar", + "22050", # Sample rate + "-ac", + "1", # Number of audio channels (mono) + "-i", + "-", # Input from stdin (output from Piper) + "-acodec", + "aac", # Audio codec (AAC) + "-ab", + "128k", # Audio bitrate + "-f", + "adts", # Output format + "-content_type", + "audio/aac", # Content type for the HTTP stream + "-listen", + "1", # Make FFmpeg act as a server + "http://0.0.0.0:8082/feed.aac", # Output URL + "-acodec", + "pcm_s16le", # Audio codec for WAV + # os.path.join(output_dir, "output.wav") # Output WAV file path + ] + + ffmpeg_process = subprocess.Popen( + ffmpeg_command, + stdin=piper_process.stdout, # Take input from Piper process + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + ) + + # Monitor FFmpeg HTTP stream for first byte + def monitor_stream(): + LOGGER.info("🟢 Starting FFMPEG 🟢") + stream_start_time = time.time() + url = "http://0.0.0.0:8082/feed.aac" + while True: + try: + with requests.get(url, stream=True, timeout=1) as response: + if response.status_code == 200: + # Record the time when the first byte is received + first_byte_time = time.time() - stream_start_time + LOGGER.info( + f"🔴 Time to receive first byte of audio: {first_byte_time:.2f} seconds" + ) + break + except requests.exceptions.RequestException as e: + time.sleep(1) + + threading.Thread(target=monitor_stream, daemon=True).start() + + buffer = "" + + try: + while not stop_event.is_set() or not tts_processing_queue.empty(): + if tts_processing_queue.qsize() > 0: + try: + # Get one item from the queue + text_part = tts_processing_queue.get(timeout=0.001) + buffer += text_part + # LOGGER.debug(f"BUFFER : {buffer}") + + # Check if the buffer contains a full sentence + if any( + delimiter in buffer + for delimiter in [".", "!", "?", ":", ";", ","] + ): + # Split the buffer into sentences + sentences = re.split(r"(?<=[.!?])\s+", buffer) + + # Keep the last partial sentence in the buffer + buffer = ( + sentences.pop() + if not re.search(r"[.!?]$", buffer) and len(sentences) > 1 + else "" + ) + + # Join the complete sentences and send to Piper + text = " ".join(sentences) + + # Measure peak memory usage of Piper process + if text: + try: + LOGGER.debug(f"Sending ----> {text}") + # Write the text to Piper's stdin + piper_process.stdin.write(f"{text}\n") + piper_process.stdin.flush() + + except BrokenPipeError: + LOGGER.error( + "BrokenPipeError: Piper process 
terminated unexpectedly." + ) + break + + except queue.Empty: + if stop_event.is_set(): + break + continue # No data to process yet, continue + + LOGGER.debug("Received Stop Event At TTS") + + except Exception as e: + LOGGER.debug(f"Unable to run TTS engine. Error message : {e}") + + finally: + piper_process.stdin.close() + for process in [piper_process, ffmpeg_process]: + if process and process.poll() is None: + process.terminate() + try: + process.wait(timeout=5) + except subprocess.TimeoutExpired: + process.kill() + + def decoded_streams_to_text(decoded_streams: tp.List[tp.Dict[str, tp.Any]]) -> str: return " ".join([stream["content"] for stream in decoded_streams]) - ################################################## ################################################## + def print_dict(d: dict, indent: int = 0): for k, v in d.items(): if isinstance(v, dict): @@ -528,7 +713,7 @@ def print_dict(d: dict, indent: int = 0): if user_input == "exit": break stt_input = STTInput(environment_config=config.stt, audio_path=user_input) - response = engine.call(stt_input) + response = engine.call(stt_input, config.tts) print_dict( { "latency": response.latency, @@ -539,4 +724,4 @@ def print_dict(d: dict, indent: int = 0): print(f"-" * 50) except Exception as e: engine.terminate() - raise e \ No newline at end of file + raise e diff --git a/examples/experimentals/voice_engine/main_android.py b/examples/experimentals/voice_engine/main_android.py new file mode 100644 index 0000000..db1159a --- /dev/null +++ b/examples/experimentals/voice_engine/main_android.py @@ -0,0 +1,690 @@ +from dataclasses import dataclass, field, asdict, is_dataclass, fields +from argparse import ArgumentParser +from contextlib import contextmanager +from enum import StrEnum +from pathlib import Path +import typing as tp +import subprocess +import queue +import threading +from abc import ABC, abstractmethod +import yaml +import json +import requests +import logging +import time +import os +import re +import psutil + +LOGGER = None +DEFAULT_CONFIG = "nyuntam/examples/experimentals/voice_engine/recipe/rpi5.yaml" + + +def set_logger(*args, **kwargs): + global LOGGER + logging.basicConfig(*args, **kwargs) + LOGGER = logging.getLogger(__name__) + + +################################################## +# Environment Configurations # +################################################## + + +class EnvironmentConfigMeta(ABC): + + @classmethod + def from_dict(cls, data: dict): + kwargs = {} + for field in fields(cls): + if field.name in data: + if is_dataclass(field.type): + kwargs[field.name] = field.type.from_dict(data[field.name]) + else: + kwargs[field.name] = data[field.name] + return cls(**kwargs) + + @classmethod + def from_yaml(cls, path: str): + with open(path, "r") as file: + data = yaml.safe_load(file) + return cls.from_dict(data) + + def to_dict(self): + return asdict(self) + + def to_yaml(self, path: str): + with open(path, "w") as file: + yaml.dump(self.to_dict(), file) + + +@dataclass +class EnvironmentConfig(ABC): + n_threads: int = field(default=4) + n_procs: int = field(default=1) + flash_attn: bool = field(default=True) + gpu: bool = field(default=False) + port: int = field(default=8080) + model: tp.Union[str, Path] = field(default="") + + _executable: tp.Union[str, Path] = field(default="") + _warmup: int = field(default=0) + + @property + def executable_path(self) -> str: + if isinstance(self._executable, Path): + return str(self._executable.absolute().resolve()) + return self._executable + + @property + def 
model_path(self) -> str: + if isinstance(self.model, Path): + return str(self.model.absolute().resolve()) + return self.model + + def get_model_option(self) -> str: + return f"-m {self.model_path}" + + @abstractmethod + def get_options(self) -> str: + pass + + def __post_init__(self): + assert self.n_threads in [4, 2, 1], "Number of threads must be 1, 2, or 4" + assert self.n_procs in [1, 2, 4], "Number of processes must be 1, 2, or 4" + assert self.port > 0, "Port number must be positive" + + +@dataclass +class STTEnvironmentConfig(EnvironmentConfig, EnvironmentConfigMeta): + """Environment configuration for whisper.cpp""" + + port: int = field(default=8080) + _warmup: int = field(default=5) + + def get_options(self): + SPACE = " " + flash_attn = "-fa" + gpu = "-ng" + threads = f"-t {self.n_threads}" + processes = f"-p {self.n_procs}" + port = f"--port {self.port}" + cmd = threads + cmd += SPACE + cmd += processes + cmd += SPACE + if not self.gpu: + cmd += gpu + cmd += SPACE + if self.flash_attn: + cmd += flash_attn + cmd += SPACE + cmd += port + cmd += SPACE + return cmd + + +@dataclass +class LLMEnvironmentConfig(EnvironmentConfig, EnvironmentConfigMeta): + """Environment configuration for llama.cpp""" + + batch_size: int = field(default=8192) + ubatch_size: int = field(default=512) + n_predict: int = field(default=-1) + stream: bool = field(default=True) + port: int = field(default=8081) + _warmup: int = field(default=15) + + def get_options(self): + SPACE = " " + flash_attn = "-fa" + threads = f"-t {self.n_threads}" + batch_size = f"-b {self.batch_size}" + ubatch_size = f"-ub {self.ubatch_size}" + port = f"--port {self.port}" + n_predict = f"-n {self.n_predict}" + context_length = f"-c 2048" + + cmd = threads + cmd += SPACE + cmd += batch_size + cmd += SPACE + cmd += ubatch_size + cmd += SPACE + cmd += n_predict + cmd += SPACE + cmd += context_length + cmd += SPACE + if self.flash_attn: + cmd += flash_attn + cmd += SPACE + cmd += port + cmd += SPACE + return cmd + + +@dataclass +class TTSEnvironmentConfig(EnvironmentConfig, EnvironmentConfigMeta): + voice: bool = field(default=False) + model: str = field(default="en_US-lessac-medium") + length_scale: float = field(default=1.5) + + def get_options(self): + SPACE = " " + cmd = f"" + + +@dataclass +class EngineEnvironmentConfig(EnvironmentConfigMeta): + stt: STTEnvironmentConfig = field(default_factory=STTEnvironmentConfig) + llm: LLMEnvironmentConfig = field(default_factory=LLMEnvironmentConfig) + tts: TTSEnvironmentConfig = field(default_factory=TTSEnvironmentConfig) + log_path: tp.Union[str, Path] = field(default="environment.log") + + def __post_init__(self): + set_logger( + filename=self.log_path, + level=logging.DEBUG, + format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", + ) + + +################################################## +# Argument Parsing Functions # +################################################## + + +def parse_args(): + # TODO: Add more arguments as necessary + parser = ArgumentParser(description="Environment Configuration Parser") + parser.add_argument( + "--config", + type=str, + default=None, + help="Path to the configuration file (.yaml)", + ) + parser.add_argument( + "--log", + type=str, + default="environment.log", + help="Path to the log file", + ) + return parser.parse_args() + + +################################################## +# Engine Class # +################################################## + + +class EnvironmentTypes(StrEnum): + STT = "stt" + LLM = "llm" + + +@dataclass +class 
STTInput: + environment_config: STTEnvironmentConfig + audio_path: tp.Union[str, Path] + data: tp.Optional[tp.Dict[str, tp.Any]] = None + + @property + def config(self) -> STTEnvironmentConfig: + return self.environment_config + + @property + def audio(self) -> str: + if isinstance(self.audio_path, Path): + return str(self.audio_path.absolute().resolve()) + return self.audio_path + + +@dataclass +class STTResponse: + text: str + + @classmethod + def from_response(cls, response: requests.Response): + if response.status_code != 200: + raise_exception_from_response(response) + return cls(response.json()["text"]) + + +def default_llm_input_data_factory(): + return { + "prompt": "", + "n_predict": -1, + "stream": True, + } + + +@dataclass +class LLMInput: + environment_config: LLMEnvironmentConfig + prompt: str + data: tp.Optional[tp.Dict[str, tp.Any]] = field( + default_factory=default_llm_input_data_factory + ) + + @property + def stream(self) -> bool: + if self.data is not None and "stream" in self.data: + return self.data["stream"] + return self.config.stream + + @stream.setter + def stream(self, value: bool): + if self.data is None: + self.data = {} + self.data["stream"] = value + + @property + def config(self) -> LLMEnvironmentConfig: + return self.environment_config + + @classmethod + def from_stt_response( + cls, environment_config: LLMEnvironmentConfig, stt_response: STTResponse + ): + return cls(environment_config, stt_response.text) + + def get_data(self): + return { + **self.data, + "prompt": self.prompt, + } + + +@dataclass +class LLMResponse: + text: str + streams: tp.List[tp.Dict[str, tp.Any]] = field(default_factory=list) + ttfs: float = 0.0 + stream: bool = False + + +@dataclass +class EngineInput: + stt_input: tp.Optional[STTInput] = None + llm_input: tp.Optional[LLMInput] = None + + +@dataclass +class EngineResponse: + stt_response: tp.Optional[STTResponse] = None + llm_response: tp.Optional[LLMResponse] = None + latency: float = 0.0 + stt_latency: float = 0.0 + + +class Engine: + def __init__(self, config: EngineEnvironmentConfig): + self.config = config + self.init_handlers() + + @property + def stt(self) -> tp.Optional[subprocess.Popen]: + if hasattr(self, "_stt_process"): + return self._stt_process + else: + return None + + @stt.setter + def stt(self, value: subprocess.Popen): + self._stt_process = value + + @property + def llm(self) -> tp.Optional[subprocess.Popen]: + if hasattr(self, "_llm_process"): + return self._llm_process + else: + return None + + @llm.setter + def llm(self, value: subprocess.Popen): + self._llm_process = value + + def init_handlers(self) -> None: + # Initialize stt + if self.stt is not None: + raise ValueError("STT process already initialized") + try: + self.stt = initialize_stt_environment(self.config.stt) + LOGGER.info(f"STT process initialized with PID: {self.stt.pid}") + except Exception as e: + LOGGER.error(f"Failed to initialize STT process: {e}") + raise e + + # Initialize llm + if self.llm is not None: + raise ValueError("LLM process already initialized") + try: + self.llm = initialize_llm_environment(self.config.llm) + if self.llm.poll() is not None: + raise Exception(f"LLM process failed to start: {self.llm.stderr}") + LOGGER.info(f"LLM process initialized with PID: {self.llm.pid}") + except Exception as e: + LOGGER.error(f"Failed to initialize LLM process: {e}") + raise e + + # NOTE: When using chain of responsibility, initialize handlers here + + def call(self, input: STTInput, ttsConfig) -> EngineResponse: + assert isinstance(input, 
STTInput), "Input must be of type STTInput" + tick = time.time() + stt_response = STTResponse.from_response(call_stt_environment(input)) + stt_latency = time.time() + LOGGER.debug(f"STT response: {stt_response}") + llm_input = LLMInput.from_stt_response(self.config.llm, stt_response) + if llm_input.stream: + # implement stream response handling + tts_processing_queue = queue.Queue() + decoded_streams = queue.Queue() + stream_queue = queue.Queue() + stop_event = threading.Event() + + decode_thread = threading.Thread( + target=decode_stream, + args=( + stop_event, + stream_queue, + decoded_streams, + tts_processing_queue, + True, + ), + ) + decode_thread.start() + + if ttsConfig.voice: + tts_processing_thread = threading.Thread( + target=create_tts_wav, + args=(stop_event, tts_processing_queue, ttsConfig), + ) + tts_processing_thread.start() + + llm_input.data["stream"] = True + ttfs = None + response = call_llm_environment(llm_input) + LOGGER.debug(f"LLM response: {response}") + if not response.ok or response.status_code != 200: + raise_exception_from_response(response) + for line in response.iter_lines(): + if line: + if ttfs is None: + ttfs = time.time() + LOGGER.info(f"TTFS: {ttfs - tick}") + + stream_queue.put(line) + tock = time.time() + stop_event.set() + + decode_thread.join() + if ttsConfig.voice: + tts_processing_thread.join() + + llm_response = LLMResponse( + text=decoded_streams_to_text(list(decoded_streams.queue)), + streams=list(decoded_streams.queue), + ttfs=ttfs - tick, + stream=True, + ) + return EngineResponse( + stt_response=stt_response, + llm_response=llm_response, + latency=tock - tick, + stt_latency=stt_latency - tick, + ) + else: + raise NotImplementedError("Non-streaming response handling not implemented") + # NOTE: When using chain of responsibility, call handlers here + + def terminate(self): + if self.stt is not None: + kill_process(self.stt) + if self.llm is not None: + kill_process(self.llm) + + +################################################## +# Utility Functions # +################################################## + + +@contextmanager +def warmup_environment(warmup_time: int): + yield + time.sleep(warmup_time) + + +def raise_exception_from_response(response: requests.Response): + LOGGER.error( + f"API call failed with status code: {response.status_code}, response: {response.text}" + ) + raise Exception( + f"API call failed with status code: {response.status_code}, response: {response.text}" + ) + + +def kill_process(process: subprocess.Popen): + process.terminate() + process.wait() + LOGGER.info(f"Process {process.pid} terminated gracefully.") + + +# Set CPU affinity for the entire process +def process_affinity(process_id, affinity_cores): + p = psutil.Process(process_id) + p.cpu_affinity(affinity_cores) + + +def initialize_environment(config: EnvironmentConfig): + with warmup_environment(config._warmup): + cmd: tp.List[str] = ( + [config.executable_path] + + config.get_options().split() + + config.get_model_option().split() + ) + + LOGGER.info(f"Initializing environment with command: {' '.join(cmd)}") + return subprocess.Popen(cmd) + + +def initialize_stt_environment(config: STTEnvironmentConfig): + proc = initialize_environment(config=config) + url = f"http://127.0.0.1:{config.port}/load" + data = {"model": config.model_path} + response = requests.post(url, json=data) + if not response.ok or response.status_code != 200: + LOGGER.error(f"Failed to load STT model.") + raise_exception_from_response(response) + return proc + + +initialize_llm_environment: 
tp.Callable[[LLMEnvironmentConfig], subprocess.Popen] = ( + initialize_environment +) + + +def call_stt_environment(input: STTInput): + url = f"http://127.0.0.1:{input.config.port}/inference" + files = {"file": open(input.audio, "rb")} + data = input.data + return requests.post(url, files=files, data=data) + + +def call_llm_environment(input: LLMInput): + url = f"http://127.0.0.1:{input.config.port}/completion" + data = input.get_data() + return requests.post(url, json=data, stream=input.stream) + + +def decode_stream( + stop_event: threading.Event, + stream_queue: queue.Queue, + decoded_streams: queue.Queue, + tts_processing_queue: queue.Queue, + decode_and_print: bool = False, + decode_and_talk: bool = True, +): + LOGGER.debug("Decode Thread Started.") + while not stop_event.is_set() or not stream_queue.empty(): + try: + line: bytearray = stream_queue.get( + timeout=0.001 + ) # Get stream data from queue + if line: + json_response = json.loads(line.decode("utf-8").replace("data: ", "")) + decoded_streams.put(json_response) + if decode_and_talk: + tts_processing_queue.put(json_response["content"]) + + except queue.Empty: + pass # No data to process yet, continue + LOGGER.debug("Decode Thread Stopped.") + + +def create_tts_wav( + stop_event: threading.Event, + tts_processing_queue: queue.Queue, + ttsConfig, + # output_dir: str = "/home/piuser/voice/core/test-output", +): + LOGGER.debug("TTS Thread Started") + piper_process = subprocess.Popen( + [ + "espeak" + ], + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + universal_newlines=True, + ) + + LOGGER.debug(f"PIPER PID, {piper_process.pid}") + + piper_proc = psutil.Process(piper_process.pid) + + buffer = "" + + try: + while not stop_event.is_set() or not tts_processing_queue.empty(): + if tts_processing_queue.qsize() > 0: + try: + # Get one item from the queue + text_part = tts_processing_queue.get(timeout=0.001) + buffer += text_part + # LOGGER.debug(f"BUFFER : {buffer}") + + # Check if the buffer contains a full sentence + if any( + delimiter in buffer + for delimiter in [".", "!", "?", ":", ";", "," , "and", "but", "or", "nor", "for", "yet", "so", # Coordinating conjunctions + "after", "although", "as", "as if", "as long as", "as much as", "as soon as", "as though", + "because", "before", "by the time", "even if", "even though", "if", "if only", + "in case", "in order that", "lest", "once", "only if", "provided that", + "since", "so that", "than", "that", "though", "till", "unless", + "until", "when", "whenever", "where", "whereas", "wherever", "whether", + "while", # Subordinating conjunctions + "both", "either", "neither", "not only", "whether or not" # Correlative conjunctions + ] + ): + # Split the buffer into sentences + sentences = re.split(r"(?<=[.!?])\s+", buffer) + + # Keep the last partial sentence in the buffer + buffer = ( + sentences.pop() + if not re.search(r"[.!?]$", buffer) and len(sentences) > 1 + else "" + ) + + # Join the complete sentences and send to Piper + text = " ".join(sentences) + + # Measure peak memory usage of Piper process + if text: + try: + LOGGER.debug(f"Sending ----> {text}") + # Write the text to Piper's stdin + piper_process.stdin.write(f"{text}\n") + piper_process.stdin.flush() + + except BrokenPipeError: + LOGGER.error( + "BrokenPipeError: Piper process terminated unexpectedly." 
+ ) + break + + except queue.Empty: + if stop_event.is_set(): + break + continue # No data to process yet, continue + + LOGGER.debug("Received Stop Event At TTS") + + except Exception as e: + LOGGER.debug(f"Unable to run TTS engine. Error message : {e}") + + finally: + piper_process.stdin.close() + for process in [piper_process, ffmpeg_process]: + if process and process.poll() is None: + process.terminate() + try: + process.wait(timeout=5) + LOGGER.debug("TTS thread Stopped.") + except subprocess.TimeoutExpired: + process.kill() + + +def decoded_streams_to_text(decoded_streams: tp.List[tp.Dict[str, tp.Any]]) -> str: + return " ".join([stream["content"] for stream in decoded_streams]) + + +################################################## +################################################## + + +def print_dict(d: dict, indent: int = 0): + for k, v in d.items(): + if isinstance(v, dict): + print(" " * indent, f" - {k}:") + print_dict(v, indent + 4) + else: + print(" " * indent, f" - {k}: {v}") + + +if __name__ == "__main__": + args = parse_args() + if args.config: + config = EngineEnvironmentConfig.from_yaml(args.config) + print_dict(config.to_dict()) + + else: + config = EngineEnvironmentConfig() + config.to_yaml("/home/piuser/edge/recipe/default.yaml") + + engine = Engine(config) + + try: + while True: + # input an audio file path from the user + user_input = input("Enter the path to the audio file: ") + if user_input == "": + user_input = "/home/piuser/shwu/audio_samples/5sec/79833.wav" + if user_input == "exit": + break + stt_input = STTInput(environment_config=config.stt, audio_path=user_input) + response = engine.call(stt_input, config.tts) + print_dict( + { + "latency": response.latency, + "ttfs": response.llm_response.ttfs, + "stt_latency": response.stt_latency, + } + ) + print(f"-" * 50) + except Exception as e: + engine.terminate() + raise e diff --git a/examples/experimentals/voice_engine/main_android_continous.py b/examples/experimentals/voice_engine/main_android_continous.py new file mode 100644 index 0000000..78d5cda --- /dev/null +++ b/examples/experimentals/voice_engine/main_android_continous.py @@ -0,0 +1,721 @@ +from dataclasses import dataclass, field, asdict, is_dataclass, fields +from argparse import ArgumentParser +from contextlib import contextmanager +from enum import StrEnum +from pathlib import Path +import typing as tp +import subprocess +import queue +import threading +from abc import ABC, abstractmethod +import yaml +import json +import requests +import logging +import time +import os +import re +import psutil +from receive_audio import receive_audio + +LOGGER = None +DEFAULT_CONFIG = "nyuntam/examples/experimentals/voice_engine/recipe/rpi5.yaml" + + +def set_logger(*args, **kwargs): + global LOGGER + logging.basicConfig(*args, **kwargs) + LOGGER = logging.getLogger(__name__) + + +################################################## +# Environment Configurations # +################################################## + + +class EnvironmentConfigMeta(ABC): + + @classmethod + def from_dict(cls, data: dict): + kwargs = {} + for field in fields(cls): + if field.name in data: + if is_dataclass(field.type): + kwargs[field.name] = field.type.from_dict(data[field.name]) + else: + kwargs[field.name] = data[field.name] + return cls(**kwargs) + + @classmethod + def from_yaml(cls, path: str): + with open(path, "r") as file: + data = yaml.safe_load(file) + return cls.from_dict(data) + + def to_dict(self): + return asdict(self) + + def to_yaml(self, path: str): + with open(path, 
"w") as file: + yaml.dump(self.to_dict(), file) + + +@dataclass +class EnvironmentConfig(ABC): + n_threads: int = field(default=4) + n_procs: int = field(default=1) + flash_attn: bool = field(default=True) + gpu: bool = field(default=False) + port: int = field(default=8080) + model: tp.Union[str, Path] = field(default="") + + _executable: tp.Union[str, Path] = field(default="") + _warmup: int = field(default=0) + + @property + def executable_path(self) -> str: + if isinstance(self._executable, Path): + return str(self._executable.absolute().resolve()) + return self._executable + + @property + def model_path(self) -> str: + if isinstance(self.model, Path): + return str(self.model.absolute().resolve()) + return self.model + + def get_model_option(self) -> str: + return f"-m {self.model_path}" + + @abstractmethod + def get_options(self) -> str: + pass + + def __post_init__(self): + assert self.n_threads in [4, 2, 1], "Number of threads must be 1, 2, or 4" + assert self.n_procs in [1, 2, 4], "Number of processes must be 1, 2, or 4" + assert self.port > 0, "Port number must be positive" + + +@dataclass +class STTEnvironmentConfig(EnvironmentConfig, EnvironmentConfigMeta): + """Environment configuration for whisper.cpp""" + + port: int = field(default=8080) + _warmup: int = field(default=5) + + def get_options(self): + SPACE = " " + flash_attn = "-fa" + gpu = "-ng" + threads = f"-t {self.n_threads}" + processes = f"-p {self.n_procs}" + port = f"--port {self.port}" + cmd = threads + cmd += SPACE + cmd += processes + cmd += SPACE + if not self.gpu: + cmd += gpu + cmd += SPACE + if self.flash_attn: + cmd += flash_attn + cmd += SPACE + cmd += port + cmd += SPACE + return cmd + + +@dataclass +class LLMEnvironmentConfig(EnvironmentConfig, EnvironmentConfigMeta): + """Environment configuration for llama.cpp""" + + batch_size: int = field(default=8192) + ubatch_size: int = field(default=512) + n_predict: int = field(default=-1) + stream: bool = field(default=True) + port: int = field(default=8081) + _warmup: int = field(default=15) + + def get_options(self): + SPACE = " " + flash_attn = "-fa" + threads = f"-t {self.n_threads}" + batch_size = f"-b {self.batch_size}" + ubatch_size = f"-ub {self.ubatch_size}" + port = f"--port {self.port}" + n_predict = f"-n {self.n_predict}" + context_length = f"-c 2048" + + cmd = threads + cmd += SPACE + cmd += batch_size + cmd += SPACE + cmd += ubatch_size + cmd += SPACE + cmd += n_predict + cmd += SPACE + cmd += context_length + cmd += SPACE + if self.flash_attn: + cmd += flash_attn + cmd += SPACE + cmd += port + cmd += SPACE + return cmd + + +@dataclass +class TTSEnvironmentConfig(EnvironmentConfig, EnvironmentConfigMeta): + voice: bool = field(default=False) + model: str = field(default="en_US-lessac-medium") + length_scale: float = field(default=1.5) + + def get_options(self): + SPACE = " " + cmd = f"" + + +@dataclass +class EngineEnvironmentConfig(EnvironmentConfigMeta): + stt: STTEnvironmentConfig = field(default_factory=STTEnvironmentConfig) + llm: LLMEnvironmentConfig = field(default_factory=LLMEnvironmentConfig) + tts: TTSEnvironmentConfig = field(default_factory=TTSEnvironmentConfig) + log_path: tp.Union[str, Path] = field(default="environment.log") + + def __post_init__(self): + set_logger( + filename=self.log_path, + level=logging.DEBUG, + format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", + ) + + +################################################## +# Argument Parsing Functions # +################################################## + + +def 
parse_args(): + # TODO: Add more arguments as necessary + parser = ArgumentParser(description="Environment Configuration Parser") + parser.add_argument( + "--config", + type=str, + default=None, + help="Path to the configuration file (.yaml)", + ) + parser.add_argument( + "--log", + type=str, + default="environment.log", + help="Path to the log file", + ) + return parser.parse_args() + + +################################################## +# Engine Class # +################################################## + + +class EnvironmentTypes(StrEnum): + STT = "stt" + LLM = "llm" + + +@dataclass +class STTInput: + environment_config: STTEnvironmentConfig + audio_path: tp.Union[str, Path] + data: tp.Optional[tp.Dict[str, tp.Any]] = None + + @property + def config(self) -> STTEnvironmentConfig: + return self.environment_config + + @property + def audio(self) -> str: + if isinstance(self.audio_path, Path): + return str(self.audio_path.absolute().resolve()) + return self.audio_path + + +@dataclass +class STTResponse: + text: str + + @classmethod + def from_response(cls, response: requests.Response): + if response.status_code != 200: + raise_exception_from_response(response) + return cls(response.json()["text"]) + + +def default_llm_input_data_factory(): + return { + "prompt": "", + "n_predict": -1, + "stream": True, + } + + +@dataclass +class LLMInput: + environment_config: LLMEnvironmentConfig + prompt: str + data: tp.Optional[tp.Dict[str, tp.Any]] = field( + default_factory=default_llm_input_data_factory + ) + + @property + def stream(self) -> bool: + if self.data is not None and "stream" in self.data: + return self.data["stream"] + return self.config.stream + + @stream.setter + def stream(self, value: bool): + if self.data is None: + self.data = {} + self.data["stream"] = value + + @property + def config(self) -> LLMEnvironmentConfig: + return self.environment_config + + @classmethod + def from_stt_response( + cls, environment_config: LLMEnvironmentConfig, stt_response: STTResponse + ): + return cls(environment_config, stt_response.text) + + def get_data(self): + self.prompt_qwen = f"<|im_start|>system: You are Qwen, a smart and intelligent smart assistant who can give clear and crisp answer to user. 
You do not hallucinate at all <|im_end|> <|im_start|>user: {self.prompt} <|im_end|> <|im_start|>assistant " + print(self.prompt_qwen) + return { + **self.data, + "prompt": self.prompt_qwen, + } + + +@dataclass +class LLMResponse: + text: str + streams: tp.List[tp.Dict[str, tp.Any]] = field(default_factory=list) + ttfs: float = 0.0 + stream: bool = False + + +@dataclass +class EngineInput: + stt_input: tp.Optional[STTInput] = None + llm_input: tp.Optional[LLMInput] = None + + +@dataclass +class EngineResponse: + stt_response: tp.Optional[STTResponse] = None + llm_response: tp.Optional[LLMResponse] = None + latency: float = 0.0 + stt_latency: float = 0.0 + + +class Engine: + def __init__(self, config: EngineEnvironmentConfig): + self.config = config + self.init_handlers() + + @property + def stt(self) -> tp.Optional[subprocess.Popen]: + if hasattr(self, "_stt_process"): + return self._stt_process + else: + return None + + @stt.setter + def stt(self, value: subprocess.Popen): + self._stt_process = value + + @property + def llm(self) -> tp.Optional[subprocess.Popen]: + if hasattr(self, "_llm_process"): + return self._llm_process + else: + return None + + @llm.setter + def llm(self, value: subprocess.Popen): + self._llm_process = value + + def init_handlers(self) -> None: + # Initialize stt + if self.stt is not None: + raise ValueError("STT process already initialized") + try: + self.stt = initialize_stt_environment(self.config.stt) + LOGGER.info(f"STT process initialized with PID: {self.stt.pid}") + except Exception as e: + LOGGER.error(f"Failed to initialize STT process: {e}") + raise e + + # Initialize llm + if self.llm is not None: + raise ValueError("LLM process already initialized") + try: + self.llm = initialize_llm_environment(self.config.llm) + if self.llm.poll() is not None: + raise Exception(f"LLM process failed to start: {self.llm.stderr}") + LOGGER.info(f"LLM process initialized with PID: {self.llm.pid}") + except Exception as e: + LOGGER.error(f"Failed to initialize LLM process: {e}") + raise e + + # NOTE: When using chain of responsibility, initialize handlers here + + def call(self, input: STTInput, ttsConfig) -> EngineResponse: + assert isinstance(input, STTInput), "Input must be of type STTInput" + tick = time.time() + stt_response = STTResponse.from_response(call_stt_environment(input)) + stt_latency = time.time() + LOGGER.debug(f"STT response: {stt_response}") + if (stt_response.text == '{"text": " "}' ) or (stt_response.text == '{"text": " "}') or (stt_response.text == '{"text": " "}' ) or (stt_response.text is None ) : + LOGGER.debug("Could not find any STT output for LLM") + return EngineResponse( + stt_response=stt_response, + ) + llm_input = LLMInput.from_stt_response(self.config.llm, stt_response) + if llm_input.stream: + # implement stream response handling + tts_processing_queue = queue.Queue() + decoded_streams = queue.Queue() + stream_queue = queue.Queue() + stop_event = threading.Event() + + decode_thread = threading.Thread( + target=decode_stream, + args=( + stop_event, + stream_queue, + decoded_streams, + tts_processing_queue, + True, + ), + ) + decode_thread.start() + + if ttsConfig.voice: + tts_processing_thread = threading.Thread( + target=create_tts_wav, + args=(stop_event, tts_processing_queue, ttsConfig), + ) + tts_processing_thread.start() + + llm_input.data["stream"] = True + ttfs = None + response = call_llm_environment(llm_input) + LOGGER.debug(f"LLM response: {response}") + if not response.ok or response.status_code != 200: + 
raise_exception_from_response(response) + for line in response.iter_lines(): + if line: + if ttfs is None: + ttfs = time.time() + LOGGER.info(f"TTFS: {ttfs - tick}") + + stream_queue.put(line) + tock = time.time() + stop_event.set() + + decode_thread.join() + if ttsConfig.voice: + tts_processing_thread.join() + + llm_response = LLMResponse( + text=decoded_streams_to_text(list(decoded_streams.queue)), + streams=list(decoded_streams.queue), + ttfs=ttfs - tick, + stream=True, + ) + return EngineResponse( + stt_response=stt_response, + llm_response=llm_response, + latency=tock - tick, + stt_latency=stt_latency - tick, + ) + else: + raise NotImplementedError("Non-streaming response handling not implemented") + # NOTE: When using chain of responsibility, call handlers here + + def terminate(self): + if self.stt is not None: + kill_process(self.stt) + if self.llm is not None: + kill_process(self.llm) + + +################################################## +# Utility Functions # +################################################## + + +@contextmanager +def warmup_environment(warmup_time: int): + yield + time.sleep(warmup_time) + + +def raise_exception_from_response(response: requests.Response): + LOGGER.error( + f"API call failed with status code: {response.status_code}, response: {response.text}" + ) + raise Exception( + f"API call failed with status code: {response.status_code}, response: {response.text}" + ) + + +def kill_process(process: subprocess.Popen): + process.terminate() + process.wait() + LOGGER.info(f"Process {process.pid} terminated gracefully.") + + +# Set CPU affinity for the entire process +def process_affinity(process_id, affinity_cores): + p = psutil.Process(process_id) + p.cpu_affinity(affinity_cores) + + +def initialize_environment(config: EnvironmentConfig): + with warmup_environment(config._warmup): + cmd: tp.List[str] = ( + [config.executable_path] + + config.get_options().split() + + config.get_model_option().split() + ) + + LOGGER.info(f"Initializing environment with command: {' '.join(cmd)}") + return subprocess.Popen(cmd) + + +def initialize_stt_environment(config: STTEnvironmentConfig): + proc = initialize_environment(config=config) + url = f"http://127.0.0.1:{config.port}/load" + data = {"model": config.model_path} + response = requests.post(url, json=data) + if not response.ok or response.status_code != 200: + LOGGER.error(f"Failed to load STT model.") + raise_exception_from_response(response) + return proc + + +initialize_llm_environment: tp.Callable[[LLMEnvironmentConfig], subprocess.Popen] = ( + initialize_environment +) + + +def call_stt_environment(input: STTInput): + url = f"http://127.0.0.1:{input.config.port}/inference" + files = {"file": open(input.audio, "rb")} + data = input.data + return requests.post(url, files=files, data=data) + + +def call_llm_environment(input: LLMInput): + url = f"http://127.0.0.1:{input.config.port}/completion" + data = input.get_data() + return requests.post(url, json=data, stream=input.stream) + + +def decode_stream( + stop_event: threading.Event, + stream_queue: queue.Queue, + decoded_streams: queue.Queue, + tts_processing_queue: queue.Queue, + decode_and_print: bool = False, + decode_and_talk: bool = True, +): + LOGGER.debug("Decode Thread Started.") + while not stop_event.is_set() or not stream_queue.empty(): + try: + line: bytearray = stream_queue.get( + timeout=0.001 + ) # Get stream data from queue + if line: + json_response = json.loads(line.decode("utf-8").replace("data: ", "")) + decoded_streams.put(json_response) + if 
decode_and_talk: + tts_processing_queue.put(json_response["content"]) + + except queue.Empty: + pass # No data to process yet, continue + LOGGER.debug("Decode Thread Stopped.") + + +def create_tts_wav( + stop_event: threading.Event, + tts_processing_queue: queue.Queue, + ttsConfig, + # output_dir: str = "/home/piuser/voice/core/test-output", +): + LOGGER.debug("TTS Thread Started") + piper_process = subprocess.Popen( + [ + "espeak" + ], + stdin=subprocess.PIPE, + stdout=subprocess.PIPE, + stderr=subprocess.PIPE, + universal_newlines=True, + ) + + LOGGER.debug(f"PIPER PID, {piper_process.pid}") + + piper_proc = psutil.Process(piper_process.pid) + + buffer = "" + + try: + while not stop_event.is_set() or not tts_processing_queue.empty(): + if tts_processing_queue.qsize() > 0: + try: + # Get one item from the queue + text_part = tts_processing_queue.get(timeout=0.001) + buffer += text_part + # LOGGER.debug(f"BUFFER : {buffer}") + + # Check if the buffer contains a full sentence + if any( + delimiter in buffer + for delimiter in [".", "!", "?", ":", ";", "," , "and", "but", "or", "nor", "for", "yet", "so", # Coordinating conjunctions + ] + ): + # Split the buffer into sentences + sentences = re.split(r"(?<=[.!?])\s+", buffer) + + # Keep the last partial sentence in the buffer + buffer = ( + sentences.pop() + if not re.search(r"[.!?]$", buffer) and len(sentences) > 1 + else "" + ) + + # Join the complete sentences and send to Piper + text = " ".join(sentences) + + # Measure peak memory usage of Piper process + if text: + try: + LOGGER.debug(f"Sending ----> {text}") + # Write the text to Piper's stdin + piper_process.stdin.write(f"{text}\n") + piper_process.stdin.flush() + + except BrokenPipeError: + LOGGER.error( + "BrokenPipeError: Piper process terminated unexpectedly." + ) + break + + except queue.Empty: + if stop_event.is_set(): + break + continue # No data to process yet, continue + + LOGGER.debug("Received Stop Event At TTS") + + except Exception as e: + LOGGER.debug(f"Unable to run TTS engine. Error message : {e}") + + finally: + piper_process.stdin.close() + for process in [piper_process, ffmpeg_process]: + if process and process.poll() is None: + process.terminate() + try: + process.wait(timeout=5) + LOGGER.debug("TTS thread Stopped.") + except subprocess.TimeoutExpired: + process.kill() + + +def decoded_streams_to_text(decoded_streams: tp.List[tp.Dict[str, tp.Any]]) -> str: + return " ".join([stream["content"] for stream in decoded_streams]) + + +################################################## +################################################## + + +def print_dict(d: dict, indent: int = 0): + for k, v in d.items(): + if isinstance(v, dict): + print(" " * indent, f" - {k}:") + print_dict(v, indent + 4) + else: + print(" " * indent, f" - {k}: {v}") + + + +import os +import time +from pathlib import Path + +def wait_for_audio_file(directory): + """ + Continuously watch for an audio file in the specified directory. + Returns the path to the audio file once found. 
+ """ + print(f"Watching directory: {directory}") + while True: + files = [f for f in os.listdir(directory) if f.endswith(".wav")] + if files: + # Assuming you want to process the first found file + file_path = os.path.join(directory, files[0]) + print(f"Found audio file: {file_path}") + return file_path + time.sleep(0.5) # Wait for 1 second before checking again + + +if __name__ == "__main__": + args = parse_args() + if args.config: + config = EngineEnvironmentConfig.from_yaml(args.config) + print_dict(config.to_dict()) + else: + config = EngineEnvironmentConfig() + config.to_yaml("/home/piuser/edge/recipe/default.yaml") + + engine = Engine(config) + + try: + while True: + # Directory to watch for audio files + audio_file_dir = "./received_audio.wav" + receive_audio(audio_file_dir) + # Continuously search for an audio file + #user_input = wait_for_audio_file(audio_file_dir) + user_input = "./received_audio.wav" + + # Execute the processing once the audio file is found + stt_input = STTInput(environment_config=config.stt, audio_path=user_input) + response = engine.call(stt_input, config.tts) + + + # print_dict( + # { + # "latency": response.latency, + # "ttfs": response.llm_response.ttfs, + # "stt_latency": response.stt_latency, + # } + # ) + print(f"-" * 50) + try: + os.remove(user_input) + print(f"Deleted processed file: {user_input}") + except OSError as e: + print(f"Error deleting file: {user_input}, {e}") + + except Exception as e: + engine.terminate() + raise \ No newline at end of file diff --git a/examples/experimentals/voice_engine/receive_audio.py b/examples/experimentals/voice_engine/receive_audio.py new file mode 100644 index 0000000..5086c02 --- /dev/null +++ b/examples/experimentals/voice_engine/receive_audio.py @@ -0,0 +1,134 @@ +import socket +import numpy as np +import pyaudio +import wave +import time + +def receive_audio(path='./received_audio.wav', + HOST='192.168.1.24', # Pico W's IP address + PORT=5000, + SAMPLE_RATE=16000, + CHANNELS=1, + FORMAT=pyaudio.paInt16, + CHUNK_SIZE=1600, + GRACE_PERIOD=5): # Grace period in seconds + """ + Receives audio data from the Pico W over TCP and saves it to a WAV file. + Initially blocks to wait for data, then becomes non-blocking for termination. 
+ """ + # Each sample is 2 bytes (16 bits) + BYTES_PER_SAMPLE = 2 # FIXED: Changed from 1 to 2 for 16-bit audio + TOTAL_SAMPLES = SAMPLE_RATE * CHANNELS * 5 # 5 seconds of audio + TOTAL_BYTES = TOTAL_SAMPLES * BYTES_PER_SAMPLE + + # Initialize PyAudio + p = pyaudio.PyAudio() + + # Create a stream to play audio + stream = p.open(format=FORMAT, + channels=CHANNELS, + rate=SAMPLE_RATE, + output=True, + frames_per_buffer=CHUNK_SIZE) # Added frames_per_buffer + + frames = [] # List to store audio frames + received_bytes = 0 # Counter for total bytes received + first_byte_received = False + last_data_time = time.time() # Tracks time of last received data + + with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: + print(f"Connecting to {HOST}:{PORT}...") + try: + s.connect((HOST, PORT)) + print("Connected to Pico W.") + except Exception as e: + print(f"Failed to connect: {e}") + return + + try: + data_buffer = b'' + s.setblocking(True) # Start with blocking mode + print("Waiting for first byte...") + + while True: + try: + # Receive data + data = s.recv(CHUNK_SIZE * BYTES_PER_SAMPLE) # Adjusted receive size + if data: + if not first_byte_received: + first_byte_received = True + print("First byte received, switching to non-blocking mode.") + s.setblocking(False) # Switch to non-blocking mode + + received_bytes += len(data) + last_data_time = time.time() # Reset the timeout timer + data_buffer += data + + # Process data in CHUNK_SIZE increments + while len(data_buffer) >= CHUNK_SIZE * BYTES_PER_SAMPLE: # Adjusted chunk check + chunk = data_buffer[:CHUNK_SIZE * BYTES_PER_SAMPLE] + data_buffer = data_buffer[CHUNK_SIZE * BYTES_PER_SAMPLE:] + + # Convert bytes to NumPy array + audio_data = np.frombuffer(chunk, dtype=np.int16) + + # Remove DC offset (optional) + # dc_offset = np.mean(audio_data) + # audio_data = audio_data - int(dc_offset) + + # # Apply gain to amplify the audio + # gain_factor = 2.0 + # audio_data = audio_data * gain_factor + + # Ensure we don't exceed the int16 range + audio_data = np.clip(audio_data, -32768, 32767).astype(np.int16) + + # Convert back to bytes + processed_data = audio_data.tobytes() + + # Write data to audio stream + stream.write(processed_data) + + # Append data to frames list + frames.append(processed_data) + + # Check if we have received enough data + if received_bytes >= TOTAL_BYTES: + print("Received enough audio data.") + break + + else: + # Non-blocking termination if no data is received + if time.time() - last_data_time > GRACE_PERIOD: + print("Grace period exceeded, terminating.") + break + + except BlockingIOError: + # Non-blocking mode will raise this if no data is available + if time.time() - last_data_time > GRACE_PERIOD: + print("No more data available during grace period, terminating.") + break + + finally: + print("Saving audio...") + if frames: + save_segment(frames, path, p, CHANNELS, FORMAT, SAMPLE_RATE) + else: + print("No frames captured.") + stream.stop_stream() + stream.close() + p.terminate() + print("Connection closed.") + +def save_segment(frames, path, p, CHANNELS, FORMAT, SAMPLE_RATE): + wf = wave.open(path, 'wb') + wf.setnchannels(CHANNELS) + wf.setsampwidth(p.get_sample_size(FORMAT)) + wf.setframerate(SAMPLE_RATE) + wf.writeframes(b''.join(frames)) + wf.close() + print(f"Audio segment saved to {path}") + +# Example usage +if __name__ == '__main__': + receive_audio() \ No newline at end of file diff --git a/examples/experimentals/voice_engine/recipe/recipe_android.yaml b/examples/experimentals/voice_engine/recipe/recipe_android.yaml 
new file mode 100644 index 0000000..689477c --- /dev/null +++ b/examples/experimentals/voice_engine/recipe/recipe_android.yaml @@ -0,0 +1,27 @@ +llm: + _executable: '/data/data/com.termux/files/home/llama.cpp/llama-server' + _warmup: 10 + batch_size: 8192 + flash_attn: true + gpu: false + model: '/data/data/com.termux/files/home/models/Qwen2.5-3.1B-Q4_0_4_4.gguf' + n_predict: 128 + n_procs: 1 + n_threads: 2 + port: 8081 + stream: true + ubatch_size: 512 +stt: + _executable: '/data/data/com.termux/files/home/whisper.cpp/server' + _warmup: 5 + flash_attn: true + gpu: false + model: '/data/data/com.termux/files/home/models/ggml-tiny-q4_0.bin' + n_procs: 1 + n_threads: 4 + port: 8080 +tts: + voice: true + model: "en_US-lessac-medium" + length_scale: 1.5 +log_path: environment.log diff --git a/examples/experimentals/voice_engine/recipe/rpi5.yaml b/examples/experimentals/voice_engine/recipe/rpi5.yaml index 4a063a1..6f8c473 100644 --- a/examples/experimentals/voice_engine/recipe/rpi5.yaml +++ b/examples/experimentals/voice_engine/recipe/rpi5.yaml @@ -5,10 +5,10 @@ llm: batch_size: 8192 flash_attn: true gpu: false - model: 'llama.cpp/local_models/llama38B-Model-8.0B-Q4_0_4_4.gguff' # find the model here - https://huggingface.co/AbhrantaNYUN/meta-llama3-8B-Q4_0_4_4/tree/main + model: 'llama38B-Model-8.0B-Q4_0_4_4.gguff' # find the model here - https://huggingface.co/AbhrantaNYUN/meta-llama3-8B-Q4_0_4_4/tree/main n_predict: 128 n_procs: 1 - n_threads: 4 + n_threads: 2 port: 8081 stream: true ubatch_size: 512 @@ -17,8 +17,12 @@ stt: _warmup: 5 flash_attn: true gpu: false - model: 'whisper.cpp/models/ggml-tiny-q4_0.en.bin' + model: 'whisper.cpp/models/local_models/ggml-tiny-q4_0.en.bin' n_procs: 1 n_threads: 4 port: 8080 +tts: + voice: true + model: "en_US-lessac-medium" + length_scale: 1.5 log_path: environment.log \ No newline at end of file diff --git a/examples/experimentals/voice_engine/run_on_android.md b/examples/experimentals/voice_engine/run_on_android.md new file mode 100644 index 0000000..11190ad --- /dev/null +++ b/examples/experimentals/voice_engine/run_on_android.md @@ -0,0 +1,67 @@ +## Steps to setup on android device : + +
+1. Install the Termux apk: link
+
+**NOTE:** Termux might not work on the latest versions of Android, so it is advisable to use Android 9 (tested).
+
+2. In Termux, run: `apt update && apt upgrade`
+3. Install Python in Termux:
+   1. `pkg install tur-repo`
+   2. `pkg install python3.11`
+4. Install espeak: `pkg install espeak` (a quick sanity-check snippet follows this list)
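As a quick, optional sanity check (not part of the original steps), you can confirm that espeak accepts text on stdin the same way `create_tts_wav` in `main_android.py` drives it. This is a minimal sketch and assumes `espeak` is on PATH after the install above:

```python
import subprocess

# Minimal sketch: pipe one sentence to espeak, mirroring how create_tts_wav
# writes streamed text to the TTS process. Assumes `espeak` is installed.
proc = subprocess.Popen(
    ["espeak"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
proc.stdin.write("Text to speech is working.\n")
proc.stdin.flush()
proc.stdin.close()
proc.wait(timeout=30)
```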
+
+## Setup llama.cpp:
+Set up llama.cpp on Android using the following commands:
+
+1. `apt install git cmake`
+2. `git clone https://github.com/ggerganov/llama.cpp.git`
+3. `cd llama.cpp`
+4. `make GGML_NO_LLAMAFILE=1` (a minimal request sketch against the resulting llama-server follows this list)
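The `llama-server` binary built here is what `main_android.py` later queries over HTTP. A minimal sketch of that streaming request, mirroring `call_llm_environment` and `decode_stream` from this diff (the port, prompt, and `n_predict` value below are assumptions, not fixed requirements):

```python
import json
import requests

# Sketch of a streaming /completion call, mirroring call_llm_environment and
# decode_stream in main_android.py. Assumes llama-server is running on port 8081.
url = "http://127.0.0.1:8081/completion"
data = {"prompt": "Hello, who are you?", "n_predict": 128, "stream": True}

response = requests.post(url, json=data, stream=True)
response.raise_for_status()

for line in response.iter_lines():
    if not line:
        continue
    # Each streamed line looks like: data: {"content": "...", ...}
    chunk = json.loads(line.decode("utf-8").replace("data: ", ""))
    print(chunk["content"], end="", flush=True)
```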
+
+## Setup whisper.cpp:
+Set up whisper.cpp on Android using the following commands:
+
+1. `git clone https://github.com/ggerganov/whisper.cpp.git`
+2. `cd whisper.cpp`
+3. `make`
+
+## Getting the Llama model:
+
+1. Create a folder to store the Llama model: `mkdir llama-model`
+2. Download the Llama3.2-3B model from: here
+3. Move the model into the `llama-model` folder
+
+## Getting the whisper model:
+
+1. Create a folder to store the whisper model: `mkdir whisper-model`
+2. Download `ggml-tiny-fp16.bin`
+3. Move the model into `whisper-model`
+4. Quantize the model to 4 bit (if necessary) using the following command: `whisper.cpp/quantize whisper-model/ggml-tiny-fp16.bin whisper-model/ggml-tiny-q4_0.bin q4_0`
+5. Delete the fp16 model (if the Q4 model is being used) to save space (a sketch of loading the quantized model through the whisper.cpp server follows this list)
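For reference, a hedged sketch of how this quantized model is loaded and queried through the whisper.cpp server, mirroring `initialize_stt_environment` and `call_stt_environment` from this diff (the port and the model/audio paths below are assumptions):

```python
import requests

# Sketch, mirroring initialize_stt_environment / call_stt_environment in
# main_android.py. Assumes whisper.cpp's server is already running on port 8080
# and that the file paths below exist on the device.
base = "http://127.0.0.1:8080"

# Load the quantized model into the running server.
load = requests.post(f"{base}/load", json={"model": "whisper-model/ggml-tiny-q4_0.bin"})
load.raise_for_status()

# Transcribe a WAV file.
with open("sample.wav", "rb") as audio:
    result = requests.post(f"{base}/inference", files={"file": audio})
result.raise_for_status()
print(result.json()["text"])
```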
+
+## Setup the nyuntam code base:
+The code lives in nyuntam, so we need to get that.
+
+1. `git clone https://github.com/nyunAI/nyuntam.git`
+2. `cd nyuntam`
+3. The code is currently in the "tts" branch: `git checkout origin/tts`
+
+## Running the code:
+
+1. Move into the appropriate folder: `nyuntam/examples/experimentals/voice_engine`
+2. Put the correct executable (llama-server, whisper-server) paths for your system in the yaml file at `recipe/recipe_android.yaml`. **NOTE:** These servers are built inside the llama.cpp and whisper.cpp directories respectively.
+3. Put the correct model file paths in the recipe yaml file.
+4. Run `main_android.py` with the following command: `python3.11 main_android.py --config recipe/recipe_android.yaml` (a programmatic sketch of the same flow follows this list)
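For reference, a minimal sketch of driving the same flow from Python instead of the interactive prompt, using the classes defined in `main_android.py` in this diff (the audio path `question.wav` is a placeholder, and the streamed `llm_response` is only populated on the streaming path):

```python
# Sketch mirroring the __main__ block of main_android.py.
from main_android import Engine, EngineEnvironmentConfig, STTInput

config = EngineEnvironmentConfig.from_yaml("recipe/recipe_android.yaml")
engine = Engine(config)  # spawns the whisper.cpp and llama.cpp servers

try:
    stt_input = STTInput(environment_config=config.stt, audio_path="question.wav")
    response = engine.call(stt_input, config.tts)
    print("latency:", response.latency)
    print("ttfs:", response.llm_response.ttfs)
finally:
    engine.terminate()  # shut the servers down
```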
+
+**NOTE:** If you are running this for the first time, there may be a few packages missing. These can be installed using `pip3.11 install [package-name]`.

diff --git a/examples/experimentals/voice_engine/tts-blog.md b/examples/experimentals/voice_engine/tts-blog.md
new file mode 100644
index 0000000..a6b4735
--- /dev/null
+++ b/examples/experimentals/voice_engine/tts-blog.md
@@ -0,0 +1,11 @@
+# The speed evals for the text-to-speech engine on Raspberry Pi 5
+
+![](../../assets/ttfs_2_vs_4_threads_llm_with_tts.png)
+
+The above diagram shows Time to First Text Stream (TTFS) coming from the LLM.
+
+![](../../assets/ttfa_2_vs_4_threads_llm_with_stt.png)
+
+The above diagram shows Time to First Audio stream (TTFA) coming from the TTS engine.
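For context on how the TTFS numbers above are produced, here is a simplified sketch of the measurement done in `Engine.call` in this diff: a timestamp is taken before the pipeline starts, and TTFS is the delay until the first streamed line arrives from llama-server (the port and prompt below are assumptions):

```python
import time
import requests

# Simplified TTFS measurement, following Engine.call in main_android.py.
# Assumes llama-server is running on port 8081.
tick = time.time()
response = requests.post(
    "http://127.0.0.1:8081/completion",
    json={"prompt": "Hello", "n_predict": 128, "stream": True},
    stream=True,
)

ttfs = None
for line in response.iter_lines():
    if line and ttfs is None:
        ttfs = time.time() - tick  # time to first text stream
        break
print(f"TTFS: {ttfs:.3f} s")
```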