# audio_manifold

This is the code repository of our paper [arXiv].
## Requirements

We suggest using miniconda to set up an environment for this project. The following packages are required:
- pytorch
- torchvision
- torchaudio
- opencv
- h5py
- tensorboard
```
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install opencv h5py tensorboard -c conda-forge
```
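If you want to double-check the setup, a small snippet like the following (just imports and version prints, nothing project-specific) can be run inside the environment:

```python
# Optional sanity check: the main dependencies import and CUDA is visible.
import cv2
import h5py
import torch
import torchaudio
import torchvision

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchvision", torchvision.__version__, "| torchaudio", torchaudio.__version__)
print("opencv", cv2.__version__, "| h5py", h5py.__version__)
```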
## Data preparation

The OAP and MAVD datasets must be converted into our hdf5 format. For audio and for each visual modality (RGB, depth, and semantic segmentation) we compile a separate h5 file. Inside these files, data samples are simply indexed by their original names.

In our experiments we use images at a lower resolution than the original datasets. To ease data loading, we save images resized to 256x256, encoded as JPEG for RGB, PNG for semantic segmentation, and raw float arrays for depth. For audio, we downsample the original recordings to 16 kHz and store them as float arrays normalized to [-1, +1] in the h5 file.
The OAP dataset can be downloaded by following the instructions here. The MAVD dataset can be downloaded by following the instructions here.
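After downloading, the raw data has to be packed into the per-modality h5 files described above. Below is a minimal sketch of how such a conversion could look; the helper names (`pack_rgb`, `pack_seg`, `pack_depth`, `pack_audio`), file paths, and h5 keys are assumptions for illustration and not the exact script used for the paper:

```python
# Minimal sketch of a possible conversion script; adapt paths/keys to OAP/MAVD.
import cv2
import h5py
import numpy as np
import torchaudio

IMG_SIZE = (256, 256)   # target resolution for all visual modalities
AUDIO_SR = 16000        # target audio sample rate (Hz)


def pack_rgb(h5_path, samples):
    """samples: iterable of (original_name, image_path). RGB is stored JPEG-encoded."""
    with h5py.File(h5_path, "w") as f:
        for name, path in samples:
            img = cv2.resize(cv2.imread(path, cv2.IMREAD_COLOR), IMG_SIZE)
            _, buf = cv2.imencode(".jpg", img)
            f.create_dataset(name, data=buf)            # indexed by original name


def pack_seg(h5_path, samples):
    """Semantic segmentation is stored PNG-encoded (lossless, label-preserving)."""
    with h5py.File(h5_path, "w") as f:
        for name, path in samples:
            seg = cv2.imread(path, cv2.IMREAD_UNCHANGED)
            seg = cv2.resize(seg, IMG_SIZE, interpolation=cv2.INTER_NEAREST)
            _, buf = cv2.imencode(".png", seg)
            f.create_dataset(name, data=buf)


def pack_depth(h5_path, samples):
    """Depth is stored as a raw float32 array."""
    with h5py.File(h5_path, "w") as f:
        for name, path in samples:
            depth = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32)
            depth = cv2.resize(depth, IMG_SIZE, interpolation=cv2.INTER_NEAREST)
            f.create_dataset(name, data=depth)


def pack_audio(h5_path, samples):
    """Audio is resampled to 16 kHz and stored as float32 normalized to [-1, +1]."""
    with h5py.File(h5_path, "w") as f:
        for name, path in samples:
            wav, sr = torchaudio.load(path)             # (channels, frames), float
            if sr != AUDIO_SR:
                wav = torchaudio.transforms.Resample(sr, AUDIO_SR)(wav)
            wav = wav / wav.abs().max().clamp(min=1e-8)  # normalize to [-1, +1]
            f.create_dataset(name, data=wav.numpy().astype(np.float32))
```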
## Training

Our method works in two stages. The first stage consists in learning the quantized manifold of the visual modality of interest:
```
python train.py --mode manifold --manifold (vqvae|vae) --dataset (eth|mavd) --data (depth|seg) (--depth_f <h5 file> | --seg_f <h5 file>) --vq_emb_num <num> --vq_emb_dim <num> --in_size <input size> --out_size <manifold size> --batch_size <num> --ifr <num> --cpus <num> --log_dir <dir>
```
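For example, a first-stage run on the depth modality might look like this (here `eth` is assumed to select the OAP dataset, and every path and numeric value is only a placeholder, not a setting used in the paper):

```
python train.py --mode manifold --manifold vqvae --dataset eth --data depth --depth_f data/oap_depth.h5 --vq_emb_num 512 --vq_emb_dim 64 --in_size 256 --out_size 64 --batch_size 32 --ifr 1 --cpus 8 --log_dir logs/manifold_depth
```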
The second stage learns to translate audio into the learned visual manifold, using the checkpoint from the first stage:

```
python train.py --mode transform --manifold (vqvae|vae) --dataset (eth|mavd) --data (depth|seg) --audio_f <h5 file> (--depth_f <h5 file> | --seg_f <h5 file>) --frm audio --to depth --vq_to <ckpt file> --vq_emb_num <num> --vq_emb_dim <num> --batch_size <num> --ifr <num> --in_size <input size> --out_size <manifold size> --cpus <num> --log_dir <dir>
```
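A matching second-stage run could then reuse the first-stage checkpoint (again, the checkpoint path and all values below are placeholders):

```
python train.py --mode transform --manifold vqvae --dataset eth --data depth --audio_f data/oap_audio.h5 --depth_f data/oap_depth.h5 --frm audio --to depth --vq_to logs/manifold_depth/checkpoint.pt --vq_emb_num 512 --vq_emb_dim 64 --batch_size 32 --ifr 1 --in_size 256 --out_size 64 --cpus 8 --log_dir logs/audio2depth
```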