A fast and local speech-to-text system that is personalized with your Home Assistant device and area names.
Speech-to-phrase is not a general-purpose speech recognition system. Instead of answering the question "what did the user say?", it answers "which of the phrases I know did the user say?". This is accomplished by combining pre-defined sentence templates with the names of your Home Assistant entities, areas, and floors that have been exposed to Assist.
You can add your own sentences and list values with `--custom-sentences-dir <DIR>`, where `<DIR>` contains directories of YAML files per language. For example:

```sh
python3 -m speech_to_phrase ... --custom-sentences-dir /path/to/custom_sentences
```

For an English model, you could have `/path/to/custom_sentences/en/sentences.yaml` with:

```yaml
language: "en"
lists:
  todo_item:
    values:
      - "apples"  # make sure to use quotes!
      - "bananas"
```
This would allow you to say "add apples to my shopping list" if you have a todo entity in Home Assistant exposed with the name "shopping list".
You can also create lists with the same names as your sentence trigger wildcards to make them usable in speech-to-phrase.
A Docker container is available that can be connected to Home Assistant via the Wyoming integration:

```sh
docker run -it -p 10300:10300 \
    -v /path/to/download/models:/models \
    -v /path/to/train:/train \
    rhasspy/wyoming-speech-to-phrase \
    --hass-websocket-uri 'ws://homeassistant.local:8123/api/websocket' \
    --hass-token '<LONG_LIVED_ACCESS_TOKEN>' \
    --retrain-on-start
```
Speech models and tools are downloaded automatically from HuggingFace.
Speech-to-phrase combines pre-defined sentence templates with the names of things from your Home Assistant to produce a hassil template file. This file compactly represents all of the possible sentences that can be recognized, which may be hundreds, thousands, or even millions.
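To see why the template file stays compact, consider counting the sentences a template can produce without ever expanding it. The sketch below is not hassil's actual parser; it handles only flat `(a|b)` alternatives, `[optional]` parts, and `{list}` references, which is enough to show the multiplicative growth:

```python
import re

def count_expansions(template: str, lists: dict[str, list[str]]) -> int:
    """Count the sentences a flat template can produce, without expanding it."""
    total = 1
    for alt, opt, lst in re.findall(r"\(([^)]*)\)|\[([^\]]*)\]|\{(\w+)\}", template):
        if alt:
            total *= len(alt.split("|"))  # (on|off) -> 2 choices
        elif opt:
            total *= 2                    # [the] -> present or absent
        elif lst:
            total *= len(lists[lst])      # {name} -> one sentence per list value
    return total

names = {"name": [f"light {i}" for i in range(100)]}
print(count_expansions("turn (on|off) [the] {name}", names))  # 2 * 2 * 100 = 400
```

A handful of templates multiplied over a few hundred exposed names grows quickly, which is how the expanded sentence count can reach the millions while the template file itself stays small.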
Using techniques developed in the Rhasspy project, speech-to-phrase converts the compact sentence templates into a finite state transducer (FST) which is then used to train a language model for Kaldi. The opengrm tooling is crucial for efficiency during this step, as it avoids unpacking the sentence templates into every possible combination.
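One way to picture why enumeration is unnecessary: an FST is a graph whose arcs are labeled with words, so alternatives become parallel arcs and the sentences it accepts correspond to start-to-final paths, which can be counted and processed without listing them. A toy illustration (the real pipeline uses opengrm and Kaldi tooling, not code like this):

```python
from collections import defaultdict

# Toy word-labeled graph for "turn (on|off) [the] light".
arcs = defaultdict(list)              # state -> [(word, next_state)]
arcs[0] = [("turn", 1)]
arcs[1] = [("on", 2), ("off", 2)]     # alternatives are parallel arcs
arcs[2] = [("the", 3), ("light", 4)]  # the arc that skips "the" models the optional word
arcs[3] = [("light", 4)]

def count_paths(state: int, final: int = 4) -> int:
    """Count accepted sentences by counting paths, never enumerating them."""
    if state == final:
        return 1
    return sum(count_paths(nxt, final) for _, nxt in arcs[state])

print(count_paths(0))  # 4: (on|off) x (with/without "the")
```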
Each speech-to-phrase model contains a pre-built dictionary of word pronunciations as well as a phonetisaurus model that will guess pronunciations for unknown words.
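In code form, the lookup amounts to a dictionary check with a guessing fallback. This is a hypothetical sketch: `g2p_guess` stands in for the bundled phonetisaurus model, whose real invocation differs.

```python
def pronounce(word: str, lexicon: dict[str, str], g2p_guess) -> str:
    """Return a phoneme string: dictionary lookup first, G2P guess for unknowns."""
    known = lexicon.get(word.lower())
    return known if known is not None else g2p_guess(word)

lexicon = {"light": "L AY T", "kitchen": "K IH CH AH N"}
print(pronounce("kitchen", lexicon, g2p_guess=lambda w: "<guessed>"))  # from dictionary
print(pronounce("zigbee", lexicon, g2p_guess=lambda w: "<guessed>"))   # guessed
```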
During training, a lot of "magic" happens to ensure that your entity, area, and floor names can be recognized automatically (a rough code sketch of these steps follows the list):
- Words with numbers are split apart ("PM2.5" becomes "PM 2.5")
- Initialisms are further split ("PM" or "P.M." becomes "P M")
- Digits are replaced with their spoken word forms ("123" becomes "one hundred twenty three")
- Unknown words have their pronunciations guessed
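The list above maps to straightforward text processing. Here is a minimal sketch of the first three steps; the regexes and number words are simplified, and the real implementation handles more languages and edge cases:

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spoken form for 0-999; the real system covers a much wider range."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + ONES[rest] if rest else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")

def normalize(name: str) -> str:
    # Split letters from digits: "PM2.5" -> "PM 2.5"
    name = re.sub(r"(?<=[A-Za-z])(?=\d)", " ", name)
    name = re.sub(r"(?<=\d)(?=[A-Za-z])", " ", name)
    words = []
    for token in name.split():
        if re.fullmatch(r"(?:[A-Z]\.?){2,}", token):
            # Split initialisms: "PM" or "P.M." -> "P M"
            words.extend(token.replace(".", ""))
        elif token.isdigit():
            # Replace digits with spoken words: "123" -> "one hundred twenty three"
            words.append(number_to_words(int(token)))
        else:
            words.append(token)
    return " ".join(words)

print(normalize("PM2.5"))     # -> "P M 2.5"
print(normalize("Room 123"))  # -> "Room one hundred twenty three"
```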
To make phrase recognition more robust, a "fuzzy" layer is added on top of Kaldi's transcription output. This layer can correct small errors, such as duplicate or missing words, and also ensures that output names are exactly what you have in Home Assistant.
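As a rough analogy, snapping a noisy transcript to the nearest known phrase can be done with standard fuzzy matching. The sketch below uses Python's difflib for character-level similarity; the actual fuzzy layer works differently, operating over words against the trained phrase set:

```python
import difflib

def correct_transcript(transcript: str, known_phrases: list[str]) -> str | None:
    """Snap a transcript to the closest known phrase, or None if nothing is close."""
    matches = difflib.get_close_matches(transcript, known_phrases, n=1, cutoff=0.8)
    return matches[0] if matches else None

phrases = ["turn on the kitchen light", "turn off the kitchen light"]
# A duplicated word is absorbed by the fuzzy match:
print(correct_transcript("turn on the the kitchen light", phrases))
# -> "turn on the kitchen light"
```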