Speechviz is a tool to
- Automatically process audio and video data—performing speaker diarization, voice-activity detection, speech recognition, and face detection
- Visualize the generated annotations in a user-friendly interface that allows you to play the audio segments and refine the generated annotations to correct any errors
Before you can get started, you'll have to get an access token to use pyannote. You can do so by following these steps:
- Log in to or sign up for https://huggingface.co/
- Visit each of the pyannote model pages that Speechviz uses and accept their user conditions
- Go to https://huggingface.co/settings/tokens and create an access token
- Set your `PYANNOTE_AUTH_TOKEN` environment variable to your access token
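For example, the variable can be set in your shell before building (the token value below is a placeholder — substitute the token you created):

```shell
# Placeholder value — replace with the access token created above
export PYANNOTE_AUTH_TOKEN="hf_your_token_here"

# Confirm the variable is set before building the image
echo "PYANNOTE_AUTH_TOKEN is ${#PYANNOTE_AUTH_TOKEN} characters long"
```

Add the `export` line to your `~/.bashrc` (or equivalent) if you want it to persist across shells.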
The Speechviz image is built from the Aria Data Tools image, so you'll need to build that image first.
git clone https://github.com/facebookresearch/Aria_data_tools.git --recursive
cd Aria_data_tools
docker build -t aria_data_tools .
After that's finished, build the Speechviz container.
git clone https://research-git.uiowa.edu/uiowa-audiology-reu-2022/speechviz.git
cd speechviz
docker build --build-arg \
PYANNOTE_AUTH_TOKEN="${PYANNOTE_AUTH_TOKEN}" \
-t speechviz .
Note that the above commands build the image with PyTorch CPU support only. If you'd like to include support for CUDA, follow the instructions for using the NVIDIA Container Toolkit and add `--build-arg cuda=true` to the `docker build` command above:
docker build --build-arg \
PYANNOTE_AUTH_TOKEN="${PYANNOTE_AUTH_TOKEN}" \
--build-arg cuda=true -t speechviz .
You'll want to mount your data into the container. To create the data folder, repository, and database, run these commands:
npm run mkdir
python3 scripts/init_fossil.py
You can then start the container by running
docker run -it \
-v "$(pwd)/data":/speechviz/data \
-v "$(pwd)/speechviz.sqlite3":/speechviz/speechviz.sqlite3 \
speechviz
If you're going to use the interface in the container, add the `-p PORT:PORT` option. By default, the interface uses port 3000, so the command for that port is
docker run -it -p 3000:3000 \
-v "$(pwd)/data":/speechviz/data \
-v "$(pwd)/speechviz.sqlite3":/speechviz/speechviz.sqlite3 \
speechviz
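If you change ports often, the run command can be wrapped in a small shell function. This is a hypothetical convenience helper, not part of the repository; it only prints the command so you can review it before running (pipe the output to `sh`, or copy-paste it, to actually start the container):

```shell
# Hypothetical helper: print the docker run command for a given host port
# (defaults to 3000, the interface's default port).
speechviz_run_cmd() {
  port="${1:-3000}"
  echo "docker run -it -p ${port}:${port}" \
    "-v \"$(pwd)/data\":/speechviz/data" \
    "-v \"$(pwd)/speechviz.sqlite3\":/speechviz/speechviz.sqlite3" \
    "speechviz"
}

speechviz_run_cmd 3000
```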
git clone https://research-git.uiowa.edu/uiowa-audiology-reu-2022/speechviz.git
cd speechviz
npm install
npm run mkdir
python3 scripts/init_fossil.py
To use `process_audio.py`, you will need to install `audiowaveform` and `ffmpeg`. The remaining dependencies for `process_audio.py` can be installed using `pip` or `conda`.

For `encode_faces.py` and `cluster_faces.py`, you will need to install dlib. If you'll be using `extract-vrs-data.py`, you will need to install VRS. Lastly, for `create_poses.py`, you will need to install Aria Data Tools.
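You can check up front which of the command-line tools are already installed. A minimal sketch (dlib, VRS, and Aria Data Tools are libraries rather than command-line tools, so this check doesn't cover them):

```shell
# Report which command-line dependencies are already on PATH
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

check_tools ffmpeg audiowaveform
```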
To install with PyTorch CPU support only:
pip3 install --extra-index-url \
"https://download.pytorch.org/whl/cpu" \
-r requirements.txt
To install with PyTorch CUDA support (Linux and Windows only):
pip3 install --extra-index-url \
"https://download.pytorch.org/whl/cu116" \
-r requirements.txt cuda-python nvidia-cudnn
To install with conda:
conda env create -f environment.yml
Audio can be processed by moving the audio file to `data/audio` (or `data/video` for video files) and running
python3 scripts/process_audio.py data/audio/FILE
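If you have several recordings, the command above can be applied in a loop. The sketch below uses a hypothetical helper that only prints one `process_audio.py` command per file, so you can review the batch before piping the output to `sh`:

```shell
# Hypothetical batch helper: print a process_audio.py command for each
# regular file in the given directory instead of running them directly.
batch_process_cmds() {
  dir="$1"
  for f in "$dir"/*; do
    [ -f "$f" ] && echo "python3 scripts/process_audio.py $f"
  done
}

batch_process_cmds data/audio
```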
Then, to view the results on the interface, run
npm start
and open http://localhost:3000.
For a more in-depth usage guide, see USAGE.md.
If installing on Bigcore, you are likely to run into an error relating to a proxy URL. To resolve this, prefix the proxy environment variables with a scheme and export them:
export http_proxy="http://${http_proxy}"
export https_proxy="http://${https_proxy}"
If you receive a `subprocess.CalledProcessError` relating to `ffmpeg`, running the following should resolve the issue:
conda update ffmpeg
If you're installing for the first time on a fresh WSL setup and you get the error `/usr/bin/env: 'bash\r': No such file or directory`, the problem is likely that you don't have Node.js installed. This should fix it:
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sudo apt install nodejs npm