Data preparation can be done using the script provided in data_prep/, which accepts
- a manifest with the following structure

      files,labels
      "path to wav/flac file","label_string"

  Wav/flac files should already have the desired sample rate and number of channels (for speechcommands, 16 kHz, mono). The label string is a list of one or more labels delimited by a comma (","). For example, for a multilabel dataset it could be "up,down".
- a label map, which maps individual labels to integer values

      { "backward": 0, "bed": 1, "bird": 2, ..... }

- The following command can then be used to create the tfrecords

      python --manifest_path train.csv --labels_map lbl_map.json --output_dir $BASE_OUTPUT_DIR --split_name train --multiproc_threads 6 --files_per_record 2048 --desired_duration 1. --clip_larger_files --compression "ZLIB"

  More options can be found in the script. The above command was used to generate the speechcommands v2 tfrecords.
- Once this is done, the final step is to make files with the following structure

      files
      $BASE_DIR/$SPLIT_NAME/file_00000-00020_bytes_compressed.tfrec

  which are used in model configs such as recipes/speechcommands/configs/resnet18.py
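
The input and output files described above can be produced with a few lines of Python. The following is a minimal sketch, not part of the repository: the helper names `write_manifest`, `build_label_map` and `write_tfrecord_list` are hypothetical, assigning label indices in alphabetical order is an assumption, and writing the final file list as a CSV with a single `files` column is inferred from the structure shown above.

```python
import csv
from pathlib import Path


def write_manifest(rows, manifest_path):
    """Write a manifest with a `files,labels` header.

    `rows` is an iterable of (audio_path, [label, ...]) pairs. Labels are
    joined with "," and the csv module quotes any field containing a comma.
    """
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["files", "labels"])
        for audio_path, labels in rows:
            writer.writerow([audio_path, ",".join(labels)])


def build_label_map(rows):
    """Map each distinct label to an integer (alphabetical order, an assumption)."""
    labels = sorted({label for _, row_labels in rows for label in row_labels})
    return {label: index for index, label in enumerate(labels)}


def write_tfrecord_list(split_dir, out_path):
    """List every *.tfrec file in `split_dir` under a single `files` column."""
    tfrec_paths = sorted(str(p) for p in Path(split_dir).glob("*.tfrec"))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["files"])
        for tfrec_path in tfrec_paths:
            writer.writerow([tfrec_path])
    return tfrec_paths
```

For instance, `write_manifest([("clips/a.wav", ["up", "down"])], "train.csv")` would produce the manifest passed to `--manifest_path`, and dumping the result of `build_label_map` with `json.dump` would give a file suitable for `--labels_map`.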