Speechviz is a tool to
- Automatically process audio and video data—performing speaker diarization, voice-activity detection, speech recognition, and face detection
- Visualize the generated annotations in a user-friendly interface that allows you to play the audio segments and refine the generated annotations to correct any errors
Before you can get started, you'll have to get an access token to use pyannote. You can do so by following these steps:
- Log in to or sign up for https://huggingface.co/
- Visit each of the pyannote model pages that Speechviz uses and accept their user conditions
- Go to https://huggingface.co/settings/tokens and create an access token
- Set your `PYANNOTE_AUTH_TOKEN` environment variable to your access token
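For example, the variable can be set in your shell before building (the token value below is a placeholder — substitute the token you created):

```shell
# Placeholder value — replace with the access token created above
export PYANNOTE_AUTH_TOKEN="hf_your_token_here"

# Confirm the variable is set before building the image
echo "PYANNOTE_AUTH_TOKEN is ${#PYANNOTE_AUTH_TOKEN} characters long"
```

Add the `export` line to your `~/.bashrc` (or equivalent) if you want it to persist across shells.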
The Speechviz image is built from the Aria Data Tools image, so you'll need to build that image first.
git clone https://github.com/facebookresearch/Aria_data_tools.git --recursive
cd Aria_data_tools
docker build -t aria_data_tools .
After that's finished, build the Speechviz container.
git clone https://research-git.uiowa.edu/uiowa-audiology-reu-2022/speechviz.git
cd speechviz
docker build --build-arg \
PYANNOTE_AUTH_TOKEN="${PYANNOTE_AUTH_TOKEN}" \
-t speechviz .
Note that the above commands build the image with PyTorch CPU support only. If you'd like to include support for CUDA, follow the instructions for using the NVIDIA Container Toolkit and add `--build-arg cuda=true` to the `docker build` command above:
docker build --build-arg \
PYANNOTE_AUTH_TOKEN="${PYANNOTE_AUTH_TOKEN}" \
--build-arg cuda=true -t speechviz .
You'll want to mount your data into the container. To create the data folder, repository, and database, run these commands:
npm run mkdir
python3 scripts/init_fossil.py
You can then start the container by running
docker run -it \
-v "$(pwd)/data":/speechviz/data \
-v "$(pwd)/speechviz.sqlite3":/speechviz/speechviz.sqlite3 \
speechviz
If you're going to use the interface in the container, add the `-p PORT:PORT` option. By default, the interface uses port 3000, so the command for that port is
docker run -it -p 3000:3000 \
-v "$(pwd)/data":/speechviz/data \
-v "$(pwd)/speechviz.sqlite3":/speechviz/speechviz.sqlite3 \
speechviz
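If you change ports often, the run command can be wrapped in a small shell function. This is a hypothetical convenience helper, not part of the repository; it only prints the command so you can review it before running (pipe the output to `sh`, or copy-paste it, to actually start the container):

```shell
# Hypothetical helper: print the docker run command for a given host port
# (defaults to 3000, the interface's default port).
speechviz_run_cmd() {
  port="${1:-3000}"
  echo "docker run -it -p ${port}:${port}" \
    "-v \"$(pwd)/data\":/speechviz/data" \
    "-v \"$(pwd)/speechviz.sqlite3\":/speechviz/speechviz.sqlite3" \
    "speechviz"
}

speechviz_run_cmd 3000
```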
git clone https://research-git.uiowa.edu/uiowa-audiology-reu-2022/speechviz.git
cd speechviz
npm install
npm run mkdir
python3 scripts/init_fossil.py
To use `process_audio.py`, you will need to install `audiowaveform` and `ffmpeg`. The remaining dependencies for `process_audio.py` can be installed using `pip` or `conda`.

For `encode_faces.py` and `cluster_faces.py`, you will need to install dlib. If you'll be using `extract-vrs-data.py`, you will need to install VRS. Lastly, for `create_poses.py`, you will need to install Aria Data Tools.
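You can check up front which of the command-line tools are already installed. A minimal sketch (dlib, VRS, and Aria Data Tools are libraries rather than command-line tools, so this check doesn't cover them):

```shell
# Report which command-line dependencies are already on PATH
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

check_tools ffmpeg audiowaveform
```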
To install with PyTorch CPU support only:
pip3 install --extra-index-url \
"https://download.pytorch.org/whl/cpu" \
-r requirements.txt
To install with PyTorch CUDA support (Linux and Windows only):
pip3 install --extra-index-url \
"https://download.pytorch.org/whl/cu116" \
-r requirements.txt cuda-python nvidia-cudnn
To install with conda:
conda env create -f environment.yml
Audio can be processed by moving the audio file to `data/audio` (or `data/video` for video files) and running
python3 scripts/process_audio.py data/audio/FILE
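If you have several recordings, the command above can be applied in a loop. The sketch below uses a hypothetical helper that only prints one `process_audio.py` command per file, so you can review the batch before piping the output to `sh`:

```shell
# Hypothetical batch helper: print a process_audio.py command for each
# regular file in the given directory instead of running them directly.
batch_process_cmds() {
  dir="$1"
  for f in "$dir"/*; do
    [ -f "$f" ] && echo "python3 scripts/process_audio.py $f"
  done
}

batch_process_cmds data/audio
```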
Then, to view the results on the interface, run
npm start
and open http://localhost:3000.
For a more in-depth usage guide, see USAGE.md.
If installing on Bigcore, you are likely to run into an error relating to a proxy URL. To resolve this, prefix the proxy environment variables with a scheme and export them:
export http_proxy="http://${http_proxy}"
export https_proxy="http://${https_proxy}"
If you receive a `subprocess.CalledProcessError` relating to `ffmpeg`, running the following should resolve the issue:
conda update ffmpeg
If you're installing for the first time on a fresh WSL setup and you get the error `/usr/bin/env: 'bash\r': No such file or directory`, the problem is likely that you don't have Node.js installed. This should fix it:
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sudo apt install nodejs npm