Speech-dispatcher integration? #2583

vinaygopinath · 2023-05-04T06:21:57Z

vinaygopinath
May 4, 2023

Has anyone managed to get a pre-trained TTS model working with speech-dispatcher?

Context: I'm looking to replace the default TTS module used by speech-dispatcher with coqui-ai. Speech dispatcher is used by Firefox Reader View and accessibility software for screen-reading.

I cobbled together the speech-dispatcher configuration based on the mimic3 documentation and this thread on termux-tts-speak, but there is no TTS output, nor an error in the log files.

My configuration

New configuration file for coqui-ai at /etc/speech-dispatcher/modules/coqui-ai.conf. Note that I use pyenv and a local pyenv for coqui-ai.

GenericExecuteSynth "pyenv local coqui-ai && tts --text '"$DATA"' --model_name '"tts_models/en/ljspeech/glow-tts"' --out_path /tmp/output.wav; $PLAY_COMMAND /tmp/output.wav"
GenericCmdDependency "pyenv"
AddVoice "en" "FEMALE1" "slt"

I've tested the command directly and it works when I replace $DATA with hardcoded text, and $PLAY_COMMAND with aplay

Update the speech-dispatcher configuration (/etc/speech-dispatcher/speechd.conf) to use coqui-ai as the default module/voice.

AddModule "coqui-ai" "sd_generic" "coqui-ai.conf"

DefaultVoiceType "FEMALE1"
DefaultModule coqui-ai
DefaultLanguage "en"
AudioOutputMethod "libao" # Need this because I'm on a Debian testing instance that uses pipewire

When I test with spd-say, I don't see any errors in the coqui-ai or speech-dispatcher log files (located at /var/log/speech-dispatcher/coqui-ai.log and /var/log/speech-dispatcher/speech-dispatcher.log respectively). To ensure that the log files are being used, I updated the LogLevel to 5 in the speech dispatcher configuration and added Debug 1 in coqui-ai.conf.

The previous discussion in this repo was inconclusive.

erogol · 2023-05-08T09:06:30Z

erogol
May 8, 2023
Maintainer

For the dev team this is out of scope, but we won't say no to a PR.


jukefr · 2023-06-06T02:48:42Z

jukefr
Jun 6, 2023

Okay, so following this post to try to get this to read an ebook for me, I managed to get it running by first making TTS a system-wide package (I'm guessing the issue you are having is the pyenv, maybe?).

$ sudo -H python3.9 -m ensurepip
$ sudo -H python3.9 -m pip install TTS

Next I created /etc/speech-dispatcher/modules/coqui-generic.conf with

GenericExecuteSynth "tts --text \'$DATA\' --out_path /tmp/speech.wav && $PLAY_COMMAND /tmp/speech.wav"
AddVoice "en" "FEMALE1" "en_UK/apope_low"

And added to /etc/speech-dispatcher/speechd.conf

DefaultVoiceType  "FEMALE1"
DefaultLanguage "en"
DefaultModule coqui-generic

And that was enough to get spd-say to work

However (and I don't know if this is more general than my install of speech-dispatcher, because it does seem buggy), after the first invocation of spd-say any subsequent ones just hang forever.

~~So would love to know if this is an issue for everybody or just my setup being cursed.~~

I tried as best I could to figure out how speech-dispatcher actually works. From what I could tell, it uses some sort of absolutely cursed "auto spawn" mechanism when a client tries to talk to it (no services to be seen anywhere). I could not find out why the hanging happens, or even run my own "daemon" with debug and log flags set, because speech-dispatcher simply stops itself after about 3 seconds with no clients interacting with it. So I'm at a loss.

~~Edit: "fixed" it by adding a && pkill sd_generic after the paplay command so the line looks like~~

GenericExecuteSynth "tts --text \'$DATA\' --out_path /tmp/speech.wav && $PLAY_COMMAND /tmp/speech.wav && pkill sd_generic"

~~But im almost certain thats not how youre supposed to be doing this lol~~
Edit3: after a reboot I had to remove the && pkill for it to work properly.
Speech dispatcher is absolutely cursed; I'm just at a loss.

Edit2: I also recommend people give piper a try: a text that takes 5.6 seconds from command run to output file on coqui takes 300 ms on piper, so for this use case I found it much preferable, to be honest.

GenericExecuteSynth "echo \'$DATA\' | /home/user/Documents/piper/piper --model /home/user/Documents/piper/en-us-amy-low.onnx --output_raw | $PLAY_COMMAND"
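One way to reproduce this kind of comparison on your own machine is to time each command from launch to exit. This is only a minimal sketch: `time_command` is a hypothetical helper name, and the command in the demo is a stand-in rather than a tested tts or piper invocation.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Return (seconds, exit_code) for one run of argv, measured launch to exit."""
    start = time.perf_counter()
    # Discard the child's output so terminal I/O does not skew the timing.
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    # Stand-in command; replace it with the synthesis command you want to measure.
    seconds, code = time_command([sys.executable, "-c", "pass"])
    print(f"exit={code} elapsed={seconds:.3f}s", file=sys.stderr)
```

A single run includes model loading, so repeating the measurement a few times gives a fairer picture.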


PeteHemery · 2024-03-15T14:45:50Z

PeteHemery
Mar 15, 2024

I've managed to cobble together a working solution I'm happy with.

@jukefr I also had the frustration of spd-say working the first time and then inexplicably hanging thereafter. I discovered that the strange behaviour only occurs when something is printed to stdout inside the GenericExecuteSynth subshell. A workaround is redirecting stdout to stderr for whatever command you choose to use:

GenericExecuteSynth "FILE=\"/tmp/$(date +'%Y.%m.%d_%H.%M.%S.%N').wav\"; { tts --text '$DATA' --out_path $FILE; $PLAY_COMMAND $FILE; rm $FILE; } >&2"
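If the synthesis step is driven from a wrapper script instead of an inline shell line, the same `>&2` idea can be applied there. A minimal sketch, assuming a hypothetical helper named `run_with_stdout_on_stderr`; it is not part of speech-dispatcher or coqui, it just starts a child whose stdout and stderr both point at the given descriptor.

```python
import subprocess
import sys


def run_with_stdout_on_stderr(argv, err_fd=None):
    """Run argv so that everything the child prints lands on err_fd.

    err_fd is a file descriptor; it defaults to fd 2 (this process's stderr),
    which mirrors appending ">&2" to the shell command.
    """
    if err_fd is None:
        err_fd = 2
    completed = subprocess.run(argv, stdout=err_fd, stderr=err_fd)
    return completed.returncode
```

A wrapper could call it as `run_with_stdout_on_stderr(["tts", "--text", text, "--out_path", path])` and then hand the file to the player the same way.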

I'm on Ubuntu MATE and found that pulseaudio didn't play nicely with the system-level service (something about pipewire exposing pulse interfaces?!), so I used spd-conf to set up a user-level service.

I'm using tts-server with a custom trained voice. My laptop doesn't have a fancy GPU, so I have the cpu version of tts-server running in the background ready to accept requests and output the inference, without loading the model every time.
I set up a user service for running the tts-server:

$ cat /usr/lib/systemd/user/tts-server.service

[Unit]
Description=Text-to-Speech Web Server
After=network.target

[Service]
Type=simple
ExecStart=/home/user/.local/bin/tts-server --model_path /home/user/dev/tts/da_checkpoint_2370000.pth --config_path /home/user/dev/tts/config.json
#ExecReload=/bin/kill -HUP $MAINPID
Restart=always

[Install]
WantedBy=default.target

$ systemctl --user daemon-reload
$ systemctl --user enable tts-server
$ systemctl --user start tts-server.service
$ systemctl --user status tts-server.service
$ journalctl --user -xeu tts-server.service

In ~/.config/speech-dispatcher/speechd.conf I added

AddModule "coqui-generic" "sd_generic" "coqui-generic.conf"

In ~/.config/speech-dispatcher/modules/coqui-generic.conf I have:

Debug 1
### IMPORTANT!  There mustn't be any output on stdout or it will hang! ###
GenericExecuteSynth "export RATE=$RATE; export PITCH=$PITCH; /home/user/dev/tts/dispatch-chain.sh '$DATA' >&2"

GenericCmdDependency "curl"
GenericPortDependency 5002

GenericLanguage "en" "en" "utf-8"
AddVoice "en" "MALE1" "David"
DefaultVoice "David"

My dispatch-chain.sh script looks like this:

#!/bin/bash
set -x

export SAVE_DIR="/home/user/dev/tts/saves"
NEW_FILENAME="$(date +'%Y.%m.%d_%H.%M.%S.%N').mp3"
export NEW_FILEPATH="${SAVE_DIR}/${NEW_FILENAME}"

TEXT="$*"  # join all arguments into a single string
CURL_TEXT=$(echo "$TEXT" | xxd -plain | tr -d '\n' | sed 's/\(..\)/%\1/g')
export FFMPEG_TEXT=$(echo $TEXT | sed 's/'\''/`/g;s/:/\\:/g' | fold -s -w 25)

# Limit the number of items in the queue
CHECKER="/home/user/dev/tts/check-caller.py"
while true; do [ 3 -gt $(pgrep -f $CHECKER | wc -l) ] && break || sleep 1; done

curl "http://127.0.0.1:5002/api/tts?text=${CURL_TEXT}&speaker_id=&style_wav=" -s --output - |\
 ffmpeg -nostats -hide_banner -loglevel error -i - -metadata title="${TEXT}" ${NEW_FILEPATH}

python $CHECKER &
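For anyone who would rather make the request from Python than from curl, the encoding step can be sketched with the standard library. The host, port and query parameters are copied from the curl call above; `build_tts_url` and `fetch_speech` are hypothetical names, and unlike `echo "$TEXT"` this version does not append a trailing newline to the text.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint taken from the curl call in dispatch-chain.sh.
TTS_ENDPOINT = "http://127.0.0.1:5002/api/tts"


def build_tts_url(text):
    """Percent-encode text and build the same request URL the curl call uses."""
    # safe="" makes quote() escape "/" too. Letters, digits and "-._~" are left
    # as-is, whereas the xxd/sed pipeline escapes every byte; both forms decode
    # to the same text on the server side.
    return f"{TTS_ENDPOINT}?text={quote(text, safe='')}&speaker_id=&style_wav="


def fetch_speech(text, timeout=60):
    """Return the audio bytes the server sends back for text."""
    with urlopen(build_tts_url(text), timeout=timeout) as response:
        return response.read()
```

`fetch_speech` needs the tts-server from the unit above to be running; `build_tts_url` can be checked on its own.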

I'm calling a "call checker" script in the background.
This is because the inference takes a bit of time, and I was getting frustrated with the multiple seconds of silence that were building up when trying to read multiple sentences in quick succession.
After a few days of trying different methods (some involving concatenating to named pipes), I settled on "see if there are any previous instances running, and wait for them to finish before we play out our queued-up audio".
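A different way to get "only one playback at a time" is an advisory file lock. To be clear, this is not what check-caller.py does (it polls the process list with psutil); it is a sketch of an alternative, Linux/Unix only, with a hypothetical helper name and lock path. One caveat: `flock` does not promise that waiters are released in arrival order, so strict sentence ordering is not guaranteed.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def playback_slot(lock_path):
    """Block until no other process holds the lock, then hold it for the body."""
    with open(lock_path, "w") as lock_file:
        # LOCK_EX waits here while another process is inside its own slot.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each queued script would wrap its player call, for example `with playback_slot("/tmp/tts-play.lock"): subprocess.run(player_cmd)`, where both the path and `player_cmd` are placeholders.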

I'm piping the tts-server output to ffmpeg, adding the text as the "title" metadata and saving it as an MP3 in a saves directory. check-caller.py waits for other instances of check-caller.py to finish, then launches ffplay with the MP3 audio, generates a spectrum for the background, and overlays the text it's reading out on top.

The queueing system works nicely, but if you decide you want to cancel with spd-say -C, you'll have to wait through (or quit) the ffplay instances that have already been queued, since the speechd invocation for them returned long ago and they won't be cancelled.

If you don't want the video output, just uncomment the commented-out -nodisp line in check-caller.py. Luckily this setup means you can change the script without reloading the service.

I'm exporting and passing some of the speechd variables into the ffplay pipeline, so we can use the rate and pitch options for spd-say.

#!/usr/bin/env python

import os
import psutil
import time
import subprocess
import sys

def run_ffplay():
  filepath = os.environ.get("NEW_FILEPATH")

  # Ran into issues with this with Orca.. needs some attention
  rate = 1.0+(float(os.environ.get("RATE"))/100.0) # I like my audio fast, so set this to "1.80+..."
  pitch = 1.0+(float(os.environ.get("PITCH"))/100.0)
  ffmpeg_text = os.environ.get("FFMPEG_TEXT")

  print(f"Playing '{filepath}'", file=sys.stderr)
  
  ffplay_cmd = "ffplay -autoexit".split()
  ffplay_cmd += "-hide_banner -nostats -loglevel error -f lavfi -i".split()
  # ffplay_cmd += ["-nodisp",]
  ffplay_cmd += [f"amovie={filepath},asetrate=22050*{pitch},atempo={pitch},aresample=22050,atempo=1/{pitch},atempo={rate}, asplit [a][out1];"\
    "[a] showspectrumpic=s=400x400:legend=0:orientation=horizontal:saturation=-1:color=fire,"\
    "drawtext=fontsize=28:fontcolor=white:fontfile=FreeSans.ttf:expansion=none:text="\
    f"'{ffmpeg_text}':x=10:y=10",]
  subprocess.Popen(ffplay_cmd)

def wait_for_previous_process():
  this = psutil.Process()
  cmdline = this.cmdline()
  # print(cmdline, file=sys.stderr)

  # Check for other processes running with the same filename (excluding ourselves)
  processes = [p for p in psutil.process_iter(["pid", "name", "cmdline"]) if
               p.info["cmdline"] and len(p.info["cmdline"]) > 1 and
               p.info["cmdline"] == cmdline and this.pid != p.pid]

  # print(f"Current process PID: {this.pid}\nParent process PID: {this.parent().pid}\n{[p.info for p in processes]}", file=sys.stderr)

  # Wait until the processes in front in the queue finish
  while any(psutil.pid_exists(p.pid) for p in processes):
    time.sleep(1)

  # String that confirms it's an ffplay instance we launched
  ff_chk_str = f"amovie={os.environ.get('SAVE_DIR')}"

  # Check for any running ffplays
  ff_processes = [p for p in psutil.process_iter(["pid", "name", "cmdline"]) if
                  "ffplay" == p.name() and p.cmdline()[-1].startswith(ff_chk_str)]

  # We're next in the queue, wait for ffplay to finish and then unlock
  while any(psutil.pid_exists(p.pid) for p in ff_processes):
    # print("waiting for ffplay to finish", file=sys.stderr)
    time.sleep(0.2)

if __name__ == "__main__":
  # Wait for the previous process to finish
  wait_for_previous_process()
  run_ffplay()
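The RATE and PITCH conversion in `run_ffplay()` fails if either variable is unset or not a number, which may or may not be related to the Orca issue noted in the comment. A more defensive version could look like the sketch below. It assumes the values arrive in spd-say's -100..100 range, and the 0.5 floor is my own choice, made because ffmpeg's atempo filter does not accept tempo values below 0.5; `speechd_factor` is a hypothetical name.

```python
import os


def speechd_factor(raw, minimum=0.5):
    """Map a speech-dispatcher style value (-100..100) to a multiplier around 1.0."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Unset (None) or non-numeric input: fall back to "no change".
        return 1.0
    # Clamp so that -100 does not turn into a factor of 0.
    return max(minimum, 1.0 + value / 100.0)


rate = speechd_factor(os.environ.get("RATE"))
pitch = speechd_factor(os.environ.get("PITCH"))
```

With this, a missing variable yields a neutral factor of 1.0 instead of a traceback.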

You can see the output of everything with journalctl --user -xeu speech-dispatcher.service. You can fold lines by typing -S and hitting return.

Hope that helps others following in the footsteps =) No warranty is provided, your mileage may vary.


jaggzh · 2024-06-26T12:57:01Z

jaggzh
Jun 26, 2024

@PeteHemery
That looks great. Nice and clean.
I've been struggling with just getting a basic sd_generic module working, and I can't even get it to 'touch' a /tmp/ file. I got to the point where the module seems to load and accept spd-say requests, but it's as if my GenericExecuteSynth command never runs (no matter what I put in it). (Below you'll see a broken pipe error, but that's also gone away now.)

I'm running speech-dispatcher by hand to test, since spd-say wouldn't start it for some reason: speech-dispatcher -s -t 0 -D

==> speechd-debug/speech-dispatcher.log <==
[Tue Jun 25 22:40:11 2024 : 360472] speechd: Got 8 bytes from output module over socket
[Tue Jun 25 22:40:11 2024 : 360483] speechd: finished getting data from output module
[Tue Jun 25 22:40:11 2024 : 360499] speechd: Poll in speak() returned socket activity, main_pfd revents=0, poll_pfd revents=1
[Tue Jun 25 22:40:11 2024 : 360508] speechd: wait_for_poll: activity on output_module: 1
[Tue Jun 25 22:40:11 2024 : 360512] speechd: is_sb_speaking(), SPEAKING=1
[Tue Jun 25 22:40:11 2024 : 360674] speechd: INDEX MARK: __spd_end
[Tue Jun 25 22:40:11 2024 : 360680] speechd: Locking element_free_mutex in speak()
[Tue Jun 25 22:40:11 2024 : 360684] speechd: No message in the queue
[Tue Jun 25 22:40:11 2024 : 360689] speechd: Poll in speak() returned socket activity, main_pfd revents=1, poll_pfd revents=1
[Tue Jun 25 22:40:11 2024 : 360694] speechd: wait_for_poll: activity in Speech Dispatcher
[Tue Jun 25 22:40:11 2024 : 360699] speechd: Locking element_free_mutex in speak()
[Tue Jun 25 22:40:11 2024 : 360704] speechd: No message in the queue
[Wed Jun 26 04:26:52 2024 : 575417] speechd: Terminating...
[Wed Jun 26 04:26:52 2024 : 575440] speechd: Closing open connections...
[Wed Jun 26 04:26:52 2024 : 575450] speechd: Closing speak() thread...
[Wed Jun 26 04:26:52 2024 : 575539] speechd: Closing play() thread...
[Wed Jun 26 04:26:52 2024 : 575556] speechd: speak_queue Joining play thread.
[Wed Jun 26 04:26:52 2024 : 575571] speechd: speak_queue Playback.
[Wed Jun 26 04:26:52 2024 : 575582] speechd: speak_queue Playback thread ended.......
[Wed Jun 26 04:26:52 2024 : 575596] speechd: speak_queue Stop or pause.
[Wed Jun 26 04:26:52 2024 : 575639] speechd: speak_queue Joining stop thread.
[Wed Jun 26 04:26:52 2024 : 575655] speechd: Closing open output modules...
[Wed Jun 26 04:26:52 2024 : 575664] speechd: Unloading module name=pharynx-vpi
[Wed Jun 26 04:26:52 2024 : 575673] speechd: Closing module "pharynx-vpi"...
[Wed Jun 26 04:26:52 2024 : 575866] speechd: **Error: Broken pipe to module.**
[Wed Jun 26 04:26:52 2024 : 575877] speechd: Output module working status: 0 (pid:2150622)
[Wed Jun 26 04:26:52 2024 : 575904] speechd: Output module terminated abnormally, probably crashed.
[Wed Jun 26 04:26:52 2024 : 575922] speechd: Closing server connection...
[Wed Jun 26 04:26:52 2024 : 575933] speechd: Removing pid file
[Wed Jun 26 04:26:52 2024 : 575945] speechd: Speech Dispatcher terminated correctly

My module .conf looks like:

Debug 1
GenericLanguage  "en" "en_US" "utf-8"
AddVoice        "en"    "MALE1"         "Voice"
AddVoice        "en"    "FEMALE1"       "Voicef"
DefaultVoice    "Voice"
#GenericExecuteSynth "touch /tmp/phtest"
GenericCmdDependency "echo"  # Somehow I don't think it requires this evaluation
GenericExecuteSynth " /bin/echo hi > /tmp/vv; exit 1"
# Testing if forcing some exit code resulted in any useful information ^
GenericRateAdd          1
GenericPitchAdd         1
GenericVolumeAdd        1
GenericRateMultiply     1
GenericPitchMultiply 750
GenericVolumeMultiply 2
GenericRateForceInteger     0
GenericPitchForceInteger    1
GenericVolumeForceInteger   0
# ^ Seeing if those helped with earlier issues (they didn't).

I don't see anything striking in the debug logs that would show me why the execution line is not running (or doesn't have access to write to /tmp or my /home/jag/ dir):

[Wed Jun 26 05:38:25 2024 : 370635] speechd: Adding client on fd 14
[Wed Jun 26 05:38:25 2024 : 370685] speechd: Data structures for client on fd 14 created
[Wed Jun 26 05:38:25 2024 : 370746] speechd: Command caught: "set"
[Wed Jun 26 05:38:25 2024 : 370754] speechd: Updating client specific settings "jag:spd-say:main" against emacs:*
[Wed Jun 26 05:38:25 2024 : 370877] speechd: Command caught: "set"
[Wed Jun 26 05:38:25 2024 : 370976] speechd: Command caught: "set"
[Wed Jun 26 05:38:25 2024 : 371037] speechd: Command caught: "speak"
[Wed Jun 26 05:38:25 2024 : 371044] speechd: Switching to data mode...
[Wed Jun 26 05:38:25 2024 : 371105] speechd: Buffer: |herro
| 7 bytes:
[Wed Jun 26 05:38:25 2024 : 371120] speechd: Buffer: |.
| 3 bytes:
[Wed Jun 26 05:38:25 2024 : 371125] speechd: Finishing data
[Wed Jun 26 05:38:25 2024 : 371129] speechd: Switching back to command mode...
[Wed Jun 26 05:38:25 2024 : 371133] speechd: New buf is now: |herro|
[Wed Jun 26 05:38:25 2024 : 371138] speechd: In queue_message desired output module is pharynx-vpi
[Wed Jun 26 05:38:25 2024 : 371146] speechd: Queueing message |herro| with priority 3
[Wed Jun 26 05:38:25 2024 : 371155] speechd: Message inserted into queue.
[Wed Jun 26 05:38:25 2024 : 371169] speechd: Poll in speak() returned socket activity, main_pfd revents=1, poll_pfd revents=1
[Wed Jun 26 05:38:25 2024 : 371185] speechd: wait_for_poll: activity in Speech Dispatcher
[Wed Jun 26 05:38:25 2024 : 371196] speechd: Locking element_free_mutex in speak()
[Wed Jun 26 05:38:25 2024 : 371213] speechd: Desired output module is pharynx-vpi
[Wed Jun 26 05:38:25 2024 : 371237] speechd: Module set parameters.
[Wed Jun 26 05:38:25 2024 : 371306] speechd: Got 26 bytes from output module over socket
[Wed Jun 26 05:38:25 2024 : 371353] speechd: Got 25 bytes from output module over socket
[Wed Jun 26 05:38:25 2024 : 371358] speechd: Module speak!
[Wed Jun 26 05:38:25 2024 : 371398] speechd: Got 25 bytes from output module over socket

==> speechd-debug/pharynx-vpi.log <==
 Wed Jun 26 05:38:25 2024 [371419]: speak()

 Wed Jun 26 05:38:25 2024 [371438]: Recoding from UTF-8 to utf-8...

==> speechd-debug/speech-dispatcher.log <==
[Wed Jun 26 05:38:25 2024 : 371464] speechd: Removing client on fd 14
[Wed Jun 26 05:38:25 2024 : 371474] speechd: Tagging client as inactive in settings
[Wed Jun 26 05:38:25 2024 : 371482] speechd: Removing client from the fd->uid table.

==> speechd-debug/pharynx-vpi.log <==
 Wed Jun 26 05:38:25 2024 [371464]: In stripping ssml: |herro|
 Wed Jun 26 05:38:25 2024 [371481]: Requested data (0): |<speak>herro</speak>|


==> speechd-debug/speech-dispatcher.log <==
[Wed Jun 26 05:38:25 2024 : 371490] speechd: Closing clients file descriptor 14
[Wed Jun 26 05:38:25 2024 : 371506] speechd: Connection closed

==> speechd-debug/pharynx-vpi.log <==
 Wed Jun 26 05:38:25 2024 [371503]: Generic: leaving write() normally

 Wed Jun 26 05:38:25 2024 [371518]: Semaphore on


==> speechd-debug/speech-dispatcher.log <==
[Wed Jun 26 05:38:25 2024 : 371556] speechd: Got 16 bytes from output module over socket
[Wed Jun 26 05:38:25 2024 : 371605] speechd: Message sent to output module
[Wed Jun 26 05:38:25 2024 : 371619] speechd: Removing client settings for uid 1
[Wed Jun 26 05:38:25 2024 : 371637] speechd: output_module_is_speaking()
[Wed Jun 26 05:38:25 2024 : 371660] speechd: Got 10 bytes from output module over socket
[Wed Jun 26 05:38:25 2024 : 371668] speechd: output_module_is_speaking()
[Wed Jun 26 05:38:25 2024 : 371681] speechd: Poll in speak() returned socket activity, main_pfd revents=0, poll_pfd revents=1
[Wed Jun 26 05:38:25 2024 : 371690] speechd: wait_for_poll: activity on output_module: 1
[Wed Jun 26 05:38:25 2024 : 371697] speechd: is_sb_speaking(), SPEAKING=1
[Wed Jun 26 05:38:25 2024 : 371706] speechd: INDEX MARK: __spd_begin
[Wed Jun 26 05:38:25 2024 : 371712] speechd: Continuing because already speaking in speak()

==> speechd-debug/pharynx-vpi.log <==
 Wed Jun 26 05:38:25 2024 [371924]: Entering parent process, closing pipes
 Wed Jun 26 05:38:25 2024 [371953]:   Looping...

 Wed Jun 26 05:38:25 2024 [371970]: Returned 5 bytes from get_part

 Wed Jun 26 05:38:25 2024 [371985]: Sending buf to child:|herro| 5

 Wed Jun 26 05:38:25 2024 [371999]: going to write 5 bytes
 Wed Jun 26 05:38:25 2024 [372016]: written 5 bytes
 Wed Jun 26 05:38:25 2024 [372031]: Waiting for response from child...

 Wed Jun 26 05:38:25 2024 [372524]: parent: Read bytes 0, child stopped

 Wed Jun 26 05:38:25 2024 [372540]: End of data in parent, closing pipes
 Wed Jun 26 05:38:25 2024 [372562]: Waiting for child...
 Wed Jun 26 05:38:25 2024 [372587]: child terminated -: status:1 signal?:0 signal number:0.


==> speechd-debug/speech-dispatcher.log <==
[Wed Jun 26 05:38:25 2024 : 372608] speechd: Got 8 bytes from output module over socket
[Wed Jun 26 05:38:25 2024 : 372624] speechd: finished getting data from output module
[Wed Jun 26 05:38:25 2024 : 372645] speechd: Poll in speak() returned socket activity, main_pfd revents=0, poll_pfd revents=1
[Wed Jun 26 05:38:25 2024 : 372658] speechd: wait_for_poll: activity on output_module: 1
[Wed Jun 26 05:38:25 2024 : 372664] speechd: is_sb_speaking(), SPEAKING=1
[Wed Jun 26 05:38:25 2024 : 372673] speechd: INDEX MARK: __spd_end
[Wed Jun 26 05:38:25 2024 : 372681] speechd: Locking element_free_mutex in speak()
[Wed Jun 26 05:38:25 2024 : 372689] speechd: No message in the queue
[Wed Jun 26 05:38:25 2024 : 372696] speechd: Poll in speak() returned socket activity, main_pfd revents=1, poll_pfd revents=1
[Wed Jun 26 05:38:25 2024 : 372702] speechd: wait_for_poll: activity in Speech Dispatcher
[Wed Jun 26 05:38:25 2024 : 372709] speechd: Locking element_free_mutex in speak()
[Wed Jun 26 05:38:25 2024 : 372715] speechd: No message in the queue

I'm sure you had to work with yours quite a bit -- maybe you'll have some insight here (and I might then move to using your setup -- I recently was working with coqui myself. Linux could really use the modern higher quality TTS, imo).

1 reply

PeteHemery Jun 29, 2024

hey @jaggzh
I managed to catch your message not too long after you sent it.

Just off the bat, I'm not sure you need echo in the GenericCmdDependency.
Secondly, did you see my note about anything printed on stdout causing the thing to hang after the first invocation?
What happens when you add a ">&2" at the end of your GenericExecuteSynth command? Maybe take out the forced exit and give it a whirl with an echo redirected to stderr.

jaggzh · 2024-06-29T04:41:12Z

jaggzh
Jun 29, 2024

Thank you :))

I can't get any command in GenericExecuteSynth to do anything -- strace -f doesn't seem to show anything ever gets executed. I've resorted to running sd_generic by hand and feeding it output gathered from the strace. All appears okay, but no child is executed, and simple commands like "touch /tmp/foo" and "echo test >/tmp/blah" (or other locations) produce no files.

There's likely just something very simple wrong (assuming there's no weird bug in sd_generic)... I'm probably not at the point where I'll try to build it and go through it myself to debug the issue though.

(My debug for pharynx-vpi.conf is being output, but it doesn't seem to show anything striking...)

 Fri Jun 28 21:07:07 2024 [676439]: Added voice Voice

 Fri Jun 28 21:07:07 2024 [676511]: Added voice Voicef

 Fri Jun 28 21:07:07 2024 [676622]: Configuration (pre) has been read from "/etc/speech-dispatcher/modules/pharynx-vpi.conf"

 Fri Jun 28 21:07:07 2024 [676749]: GenericMaxChunkLength = 300

 Fri Jun 28 21:07:07 2024 [676806]: GenericDelimiters = .

 Fri Jun 28 21:07:07 2024 [676863]: GenericExecuteSynth = echo world >$TMPDIR/speak.txt && say wutup

 Fri Jun 28 21:07:07 2024 [676928]: GenericCmdDependency = echo

 Fri Jun 28 21:07:07 2024 [676990]: GenericPortDependency = 0

 Fri Jun 28 21:07:07 2024 [677048]: Generic: creating new thread for generic_speak

 Fri Jun 28 21:07:07 2024 [677364]: generic: speaking thread starting.......

 Fri Jun 28 21:07:07 2024 [677724]: Additional logging into specific path /tmp/speechd-debug/pharynx-vpi.log requested
 Fri Jun 28 21:07:07 2024 [677804]: Additional logging initialized
 Fri Jun 28 21:07:07 2024 [678738]: Opening audio output system
 Fri Jun 28 21:07:07 2024 [680470]: Can't use server: Cannot open plugin server. error: file not found
 Fri Jun 28 21:07:07 2024 [682457]: Opening audio output system
 Fri Jun 28 21:07:07 2024 [706768]: Using pulse audio output method
 Fri Jun 28 21:07:16 2024 [80240]: speak()

 Fri Jun 28 21:07:16 2024 [80333]: Setting language en-us
 Fri Jun 28 21:07:16 2024 [80420]: Requested option by key en-us not found.

 Fri Jun 28 21:07:16 2024 [80506]: Setting voice type 1
 Fri Jun 28 21:07:16 2024 [80611]: Setting voice type 1
 Fri Jun 28 21:07:16 2024 [80709]: Volume: 100
 Fri Jun 28 21:07:16 2024 [80804]: HVolume: 3.000000
 Fri Jun 28 21:07:16 2024 [80900]: Recoding from UTF-8 to utf-8...
 Fri Jun 28 21:07:16 2024 [81014]: In stripping ssml: |squeaky|
 Fri Jun 28 21:07:16 2024 [81113]: Requested data (0): |<speak>squeaky</speak>|

 Fri Jun 28 21:07:16 2024 [81233]: Generic: leaving write() normally

 Fri Jun 28 21:07:16 2024 [81263]: Semaphore on


 Fri Jun 28 21:07:16 2024 [82143]: Entering parent process, closing pipes
 Fri Jun 28 21:07:16 2024 [82238]:   Looping...

 Fri Jun 28 21:07:16 2024 [82296]: Returned 7 bytes from get_part

 Fri Jun 28 21:07:16 2024 [82383]: Sending buf to child:|squeaky| 7

 Fri Jun 28 21:07:16 2024 [82474]: going to write 7 bytes
 Fri Jun 28 21:07:16 2024 [82591]: written 7 bytes
 Fri Jun 28 21:07:16 2024 [82673]: Waiting for response from child...

 Fri Jun 28 21:07:16 2024 [82759]: parent: Read bytes 0, child stopped

 Fri Jun 28 21:07:16 2024 [82834]: End of data in parent, closing pipes
 Fri Jun 28 21:07:16 2024 [82927]: Waiting for child...
 Fri Jun 28 21:07:16 2024 [83041]: child terminated -: status:1 signal?:0 signal number:0.
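When comparing runs, it can help to pull just the child exit lines out of the module log. A minimal sketch, assuming the "child terminated" line format shown in the logs above; `child_exits` is a hypothetical helper, not part of speech-dispatcher.

```python
import re

# Matches lines such as:
#   child terminated -: status:1 signal?:0 signal number:0.
CHILD_EXIT = re.compile(
    r"child terminated -: status:(?P<status>\d+) "
    r"signal\?:(?P<signaled>\d+) signal number:(?P<signum>\d+)"
)


def child_exits(log_text):
    """Return (status, signaled, signal_number) for each child exit in the log."""
    return [
        (int(m["status"]), int(m["signaled"]), int(m["signum"]))
        for m in CHILD_EXIT.finditer(log_text)
    ]
```

Both logs in this thread report status 1 with no signal. That is the value an `exit 1` at the end of the command would give, although the log line alone does not show which part of the command line produced it.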

1 reply

PeteHemery Jun 29, 2024

I see you have
GenericCmdDependency = echo
GenericPortDependency = 0
I don't think those need to be set, maybe comment them out? Looks like it's modified/left over from my curl additions?

For GenericExecuteSynth
"echo world >$TMPDIR/speak.txt && say wutup"

Is your speech-dispatcher service user level or system level? Where is that $TMPDIR being populated? Or was it just an example?
But you said you've tried other things in the line and it's not getting executed..
Maybe you can try with a different config, one of the generic ones?

Does speechd without coqui work for you in your current setup?

Maybe try running spd-conf and setting up a user-level service, like I did? I couldn't get the system-level one to work properly.

securegh · 2024-10-04T20:24:18Z

securegh
Oct 4, 2024

Did anyone figure out how to set up coqui with speech-dispatcher? With pied that setup takes 2 minutes!


PeteHemery · 2024-10-23T18:50:42Z

PeteHemery
Oct 23, 2024

I did get it working, thanks ;) The write-up above is still working well ;P

1 reply

securegh Nov 11, 2024

No offense, I haven't tried your solution, it just looks a bit complex!
