Speech-dispatcher integration? #2583
Replies: 7 comments 3 replies
-
For the dev team this is out of scope, but a PR would not be turned down.
-
Okay, so I followed this post to try to get this to read an ebook for me. I managed to get it running by first making TTS a system-wide package (guessing the issue you are having is the pyenv, maybe?):
$ sudo -H python3.9 -m ensurepip
$ sudo -H python3.9 -m pip install TTS
Next I created
And added to
And that was enough to get
Edit2: I also recommend people give piper a try: a text that takes 5.6 seconds from running the command to getting the output file on coqui takes 300 ms on piper, so for this use case I found it much preferable, tbh.
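For anyone who wants to repeat that kind of comparison on their own machine, here is a minimal timing sketch. The helper names `time_command` and `speedup` are made up for illustration, and the command in the `__main__` section is only a placeholder so the script runs anywhere; replace it with whatever TTS invocation you are measuring.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) once and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def speedup(slow_seconds, fast_seconds):
    """How many times faster the second measurement is than the first."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Placeholder command (hypothetical): substitute your coqui or piper call here.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"placeholder command took {elapsed:.3f} s")
    # Ratio of the two figures quoted in the post (5.6 s vs 300 ms).
    print(f"speedup: {speedup(5.6, 0.3):.1f}x")
```

A single run includes model loading, so it measures the "command run to output file" latency mentioned in the post rather than pure inference speed.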
-
I've managed to cobble together a working solution I'm happy with.
@jukefr I also had the frustration of spd-say working the first time and then inexplicably hanging thereafter. I discovered that the strange behaviour only occurs when something is printed to stdout inside the GenericExecuteSynth command; redirecting everything to stderr avoids it:
GenericExecuteSynth "FILE=\"/tmp/$(date +'%Y.%m.%d_%H.%M.%S.%N').wav\"; { tts --text '$DATA' --out_path $FILE; $PLAY_COMMAND $FILE; rm $FILE; } >&2"
I'm on Ubuntu MATE and found that pulseaudio didn't play nicely with the system-level service (something about pipewire exposing pulse interfaces?!), so I used spd-conf to set up a user-level service.
I'm using tts-server with a custom-trained voice. My laptop doesn't have a fancy GPU, so I keep the CPU version of tts-server running in the background, ready to accept requests and output the inference without loading the model every time.
$ cat /usr/lib/systemd/user/tts-server.service
[Unit]
Description=Text-to-Speech Web Server
After=network.target
[Service]
Type=simple
ExecStart=/home/user/.local/bin/tts-server --model_path /home/user/dev/tts/da_checkpoint_2370000.pth --config_path /home/user/dev/tts/config.json
#ExecReload=/bin/kill -HUP $MAINPID
Restart=always
[Install]
# default.target is the user-manager counterpart of multi-user.target
WantedBy=default.target
$ systemctl --user daemon-reload
$ systemctl --user enable tts-server
$ systemctl --user start tts-server.service
$ systemctl --user status tts-server.service
$ journalctl --user -xeu tts-server.service
In the speech-dispatcher configuration (speechd.conf):
AddModule "coqui-generic" "sd_generic" "coqui-generic.conf"
In the module configuration (coqui-generic.conf):
Debug 1
### IMPORTANT! There mustn't be any output on stdout or it will hang! ###
GenericExecuteSynth "export RATE=$RATE; export PITCH=$PITCH; /home/user/dev/tts/dispatch-chain.sh '$DATA' >&2"
GenericCmdDependency "curl"
GenericPortDependency 5002
GenericLanguage "en" "en" "utf-8"
AddVoice "en" "MALE1" "David"
DefaultVoice "David"
My dispatch-chain.sh:
#!/bin/bash
set -x
export SAVE_DIR="/home/user/dev/tts/saves"
NEW_FILENAME="$(date +'%Y.%m.%d_%H.%M.%S.%N').mp3"
export NEW_FILEPATH="${SAVE_DIR}/${NEW_FILENAME}"
# "$*" joins all arguments into a single string
TEXT="$*"
# Percent-encode every byte of the text for the query string
CURL_TEXT=$(printf '%s' "$TEXT" | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g')
# Escape quotes and colons for ffmpeg's drawtext filter and wrap at 25 columns
export FFMPEG_TEXT=$(echo "$TEXT" | sed 's/'\''/`/g;s/:/\\:/g' | fold -s -w 25)
# Limit the number of items in the queue
CHECKER="/home/user/dev/tts/check-caller.py"
while true; do [ 3 -gt "$(pgrep -f "$CHECKER" | wc -l)" ] && break || sleep 1; done
curl "http://127.0.0.1:5002/api/tts?text=${CURL_TEXT}&speaker_id=&style_wav=" -s --output - |\
ffmpeg -nostats -hide_banner -loglevel error -i - -metadata title="${TEXT}" "${NEW_FILEPATH}"
python "$CHECKER" &
I'm calling a "call checker" script in the background. I'm piping the tts-server output to ffmpeg, adding the text as the "title" metadata, and saving it as an MP3 in a saves directory.
In
The queueing system works nicely, but if you decide you want to cancel with
If you don't want the video output, just uncomment the commented-out "-nodisp" line.
I'm exporting and passing some of the speechd variables into the ffplay pipeline, so we can use the rate and pitch options for spd-say.
#!/usr/bin/env python
import os
import psutil
import time
import subprocess
import sys

def run_ffplay():
    filepath = os.environ.get("NEW_FILEPATH")
    # Ran into issues with this with Orca.. needs some attention
    # RATE and PITCH fall back to "0" so an unset variable doesn't crash float()
    rate = 1.0+(float(os.environ.get("RATE", "0"))/100.0) # I like my audio fast, so set this to "1.80+..."
    pitch = 1.0+(float(os.environ.get("PITCH", "0"))/100.0)
    ffmpeg_text = os.environ.get("FFMPEG_TEXT")
    print(f"Playing '{filepath}'", file=sys.stderr)
    ffplay_cmd = "ffplay -autoexit".split()
    ffplay_cmd += "-hide_banner -nostats -loglevel error -f lavfi -i".split()
    # ffplay_cmd += ["-nodisp",]
    ffplay_cmd += [f"amovie={filepath},asetrate=22050*{pitch},atempo={pitch},aresample=22050,atempo=1/{pitch},atempo={rate}, asplit [a][out1];"
                   "[a] showspectrumpic=s=400x400:legend=0:orientation=horizontal:saturation=-1:color=fire,"
                   "drawtext=fontsize=28:fontcolor=white:fontfile=FreeSans.ttf:expansion=none:text="
                   f"'{ffmpeg_text}':x=10:y=10",]
    subprocess.Popen(ffplay_cmd)

def wait_for_previous_process():
    this = psutil.Process()
    cmdline = this.cmdline()
    # print(cmdline, file=sys.stderr)
    # Check for other processes running with the same filename (excluding ourselves)
    # p.info["cmdline"] can be None when access is denied, hence the truthiness check
    processes = [p for p in psutil.process_iter(["pid", "name", "cmdline"]) if
                 p.info["cmdline"] and len(p.info["cmdline"]) > 1 and
                 p.info["cmdline"] == cmdline and this.pid != p.pid]
    # print(f"Current process PID: {this.pid}\nParent process PID: {this.parent().pid}\n{[p.info for p in processes]}", file=sys.stderr)
    # Wait until the processes in front in the queue finish
    while any(psutil.pid_exists(p.pid) for p in processes):
        time.sleep(1)
    # String that confirms it's an ffplay instance we launched
    ff_chk_str = f"amovie={os.environ.get('SAVE_DIR')}"
    # Check for any running ffplays
    ff_processes = [p for p in psutil.process_iter(["pid", "name", "cmdline"]) if
                    p.info["name"] == "ffplay" and p.info["cmdline"] and
                    p.info["cmdline"][-1].startswith(ff_chk_str)]
    # We're next in the queue, wait for ffplay to finish and then unlock
    while any(psutil.pid_exists(p.pid) for p in ff_processes):
        # print("waiting for ffplay to finish", file=sys.stderr)
        time.sleep(0.2)

if __name__ == "__main__":
    # Wait for the previous process to finish
    wait_for_previous_process()
    run_ffplay()
You can see the output of everything with
Hope that helps others following in the footsteps =) No warranty is provided, your mileage may vary.
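Two pieces of the setup above are easy to check in isolation: the percent-encoding of the text for the tts-server request, and the mapping from spd-say's rate/pitch values to playback factors. The sketch below is not part of the posted scripts. The function names are made up, the URL layout is copied from the curl line above, and the clamp to 0.5..2.0 is my own assumption, added because a raw factor of 0 (rate -100) would not be a usable tempo.

```python
from urllib.parse import quote


def build_tts_url(text, host="127.0.0.1", port=5002):
    """Build the same GET URL as the curl line, percent-encoding the text."""
    # quote(..., safe="") escapes every reserved character. Unlike the xxd/sed
    # pipeline it leaves plain letters and digits readable, which is equivalent
    # as far as the server is concerned.
    return (f"http://{host}:{port}/api/tts?text={quote(text, safe='')}"
            "&speaker_id=&style_wav=")


def speechd_to_factor(value, lo=0.5, hi=2.0):
    """Map a speech-dispatcher setting (-100..100) to a playback factor.

    Same formula as check-caller.py (1.0 + value / 100), then clamped to
    [lo, hi]. The clamp is an assumption, not something the original does.
    """
    factor = 1.0 + float(value) / 100.0
    return max(lo, min(hi, factor))


if __name__ == "__main__":
    print(build_tts_url("Hello, world!"))
    print(speechd_to_factor("50"))
```

The returned URL can be passed to `urllib.request.urlopen` or to curl to fetch the audio while the tts-server service is running.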
-
@PeteHemery I'm running speech-dispatcher by hand to test, since spd-say wouldn't start it for some reason:
My module .conf looks like:
I don't see anything striking in the debug logs that would show me why the execution line is not running (or doesn't have access to write to /tmp or my /home/jag/ dir):
I'm sure you had to work with yours quite a bit -- maybe you'll have some insight here (and I might then move to using your setup -- I was recently working with coqui myself. Linux could really use modern, higher-quality TTS, imo).
-
Thank you :)) I can't get any command in GenericExecuteSynth to do anything -- strace -f doesn't seem to show anything ever gets executed. I've resorted to running sd_generic by hand and feeding it output gathered from the strace. All appears okay, but no child is executed, and simple commands like "touch /tmp/foo" and "echo test >/tmp/blah" (or other locations) produce no files. There's likely just something very simple wrong (assuming there's no weird bug in sd_generic)... I'm probably not at the point where I'll try to build it and go through it myself to debug the issue though. (My debug for pharynx-vpi.conf is being output, but it doesn't seem to show anything striking...)
-
Did anyone figure out how to set up coqui with speech-dispatcher? With pied that setup takes 2 minutes!
-
I did get it working, thanks ;) The write-up above is still working well ;P
-
Has anyone managed to get a pre-trained TTS model working with speech-dispatcher?
Context: I'm looking to replace the default TTS module used by speech-dispatcher with coqui-ai. Speech dispatcher is used by Firefox Reader View and accessibility software for screen-reading.
I cobbled together the speech-dispatcher configuration based on the mimic3 documentation and this thread on termux-tts-speak, but there is no TTS output, nor an error in the log files.
My configuration
A module configuration for coqui-ai at /etc/speech-dispatcher/modules/coqui-ai.conf. Note that I use pyenv and a local pyenv for coqui-ai.
I've tested the command directly and it works when I replace $DATA with hardcoded text, and $PLAY_COMMAND with aplay
I updated the speech-dispatcher configuration (/etc/speech-dispatcher/speechd.conf) to use coqui-ai as the default module/voice.
When I test with spd-say, I don't see any errors in the coqui-ai or speech-dispatcher log files (located at /var/log/speech-dispatcher/coqui-ai.log and /var/log/speech-dispatcher/speech-dispatcher.log respectively). To ensure that the log files are being used, I updated the LogLevel to 5 in the speech dispatcher configuration and added Debug 1 in coqui-ai.conf.
The previous discussion in this repo was inconclusive.
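The manual test described here (swapping $DATA and $PLAY_COMMAND for fixed values) can be scripted, together with a check for the stdout problem reported in the replies, where any output on stdout made the module hang. The following is a rough sketch only: the function names are invented, and the plain string replacement is merely an approximation of what sd_generic does with its placeholders, so text containing quotes will need extra care.

```python
import subprocess


def render_synth_command(template, text, play_command="aplay"):
    """Substitute the $PLAY_COMMAND and $DATA placeholders by hand."""
    # $PLAY_COMMAND is replaced first so the inserted text is never re-scanned.
    return template.replace("$PLAY_COMMAND", play_command).replace("$DATA", text)


def run_and_check_stdout(command):
    """Run a shell command; return (exit_code, stdout_was_empty)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode, result.stdout == ""


if __name__ == "__main__":
    # Stand-in template (hypothetical); paste the string from your own
    # GenericExecuteSynth line instead.
    template = "echo '$DATA' >&2"
    command = render_synth_command(template, "hello from speech-dispatcher")
    code, quiet = run_and_check_stdout(command)
    print(f"exit code {code}, stdout empty: {quiet}")
```

If `stdout empty` comes back `False`, appending `>&2` to the command, as in the working configurations in this thread, is the first thing to try.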