add timit w2vu recipe (facebookresearch#1991)
Summary:
## What does this PR do?
Add TIMIT data preparation scripts for wav2vec-U

Pull Request resolved: fairinternal/fairseq-py#1991

Reviewed By: alexeib

Differential Revision: D29284481

Pulled By: wnhsu

fbshipit-source-id: dccd75159a9de4f3cd95f9e4a90ce4bdf9264f2b
Wei-Ning Hsu authored and facebook-github-bot committed Jun 22, 2021
1 parent e47a4c8 commit 900a607
Showing 10 changed files with 14,373 additions and 0 deletions.
10 changes: 10 additions & 0 deletions examples/wav2vec/unsupervised/README.md
@@ -48,6 +48,16 @@ The fifth argument is which phonemizer to use. Supported values are [espeak](htt

Pre-trained fasttext LID models can be downloaded [here](https://fasttext.cc/docs/en/language-identification.html).

### Prepare TIMIT data
TIMIT transcripts already include silence. Therefore, VAD is not used for audio preprocessing, and we neither wrap transcripts with silences nor insert random silences between words.

To prepare TIMIT data for both the matched and unmatched setups:
```shell
bash scripts/prepare_timit.sh /dir/to/timit/raw/data /output/dir /path/to/wav2vec2/model.pt
```

Note that we assume the TIMIT distribution with capitalized directories and filenames is used (e.g., `TRAIN/DR1/FCJF0/SA1.PHN`).
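
Because the script expects the capitalized layout, it may help to check the raw data directory before running it. The following is a minimal sketch, not part of `prepare_timit.sh`: the function name `check_timit_layout` is hypothetical, and it assumes the standard TIMIT release with `TRAIN` and `TEST` splits, each organized as `<SPLIT>/<DIALECT>/<SPEAKER>/<FILE>.PHN`.

```shell
# Hypothetical pre-flight check (an illustration, not part of the recipe).
# Succeeds only if the upper-case TRAIN and TEST directories exist under
# the given root and each contains at least one upper-case .PHN file at
# the depth <SPLIT>/<DIALECT>/<SPEAKER>/<FILE>.PHN.
check_timit_layout() {
  local root="$1" split
  for split in TRAIN TEST; do
    # find -name compares the on-disk entry name, so "train" does not match.
    [ -n "$(find "$root" -mindepth 1 -maxdepth 1 -type d -name "$split")" ] || return 1
    # Look for phone label files three levels below the split directory.
    [ -n "$(find "$root/$split" -mindepth 3 -maxdepth 3 -type f -name '*.PHN')" ] || return 1
  done
}
```

For example, `check_timit_layout /dir/to/timit/raw/data || echo "unexpected TIMIT layout"` reports a lower-case or otherwise different distribution before any features are extracted.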

## Generative adversarial training (GAN)

We then use a GAN model to build a first unsupervised ASR model. The data preparation above of both speech features and text data is a necessary procedure that enables the generator to match speech to text in an unsupervised way.
192 changes: 192 additions & 0 deletions examples/wav2vec/unsupervised/config/timit_matched/test.uid
@@ -0,0 +1,192 @@
FDHC0_SI1559
FDHC0_SI2189
FDHC0_SI929
FDHC0_SX119
FDHC0_SX209
FDHC0_SX29
FDHC0_SX299
FDHC0_SX389
FELC0_SI1386
FELC0_SI2016
FELC0_SI756
FELC0_SX126
FELC0_SX216
FELC0_SX306
FELC0_SX36
FELC0_SX396
FJLM0_SI1043
FJLM0_SI1673
FJLM0_SI2303
FJLM0_SX143
FJLM0_SX233
FJLM0_SX323
FJLM0_SX413
FJLM0_SX53
FMGD0_SI1564
FMGD0_SI2194
FMGD0_SI934
FMGD0_SX124
FMGD0_SX214
FMGD0_SX304
FMGD0_SX34
FMGD0_SX394
FMLD0_SI2185
FMLD0_SI822
FMLD0_SI925
FMLD0_SX115
FMLD0_SX205
FMLD0_SX25
FMLD0_SX295
FMLD0_SX385
FNLP0_SI1308
FNLP0_SI1938
FNLP0_SI678
FNLP0_SX138
FNLP0_SX228
FNLP0_SX318
FNLP0_SX408
FNLP0_SX48
FPAS0_SI1272
FPAS0_SI2204
FPAS0_SI944
FPAS0_SX134
FPAS0_SX224
FPAS0_SX314
FPAS0_SX404
FPAS0_SX44
FPKT0_SI1538
FPKT0_SI2168
FPKT0_SI908
FPKT0_SX188
FPKT0_SX278
FPKT0_SX368
FPKT0_SX8
FPKT0_SX98
MBPM0_SI1577
MBPM0_SI1584
MBPM0_SI947
MBPM0_SX137
MBPM0_SX227
MBPM0_SX317
MBPM0_SX407
MBPM0_SX47
MCMJ0_SI1094
MCMJ0_SI464
MCMJ0_SI602
MCMJ0_SX104
MCMJ0_SX14
MCMJ0_SX194
MCMJ0_SX284
MCMJ0_SX374
MDAB0_SI1039
MDAB0_SI1669
MDAB0_SI2299
MDAB0_SX139
MDAB0_SX229
MDAB0_SX319
MDAB0_SX409
MDAB0_SX49
MGRT0_SI1450
MGRT0_SI2080
MGRT0_SI820
MGRT0_SX10
MGRT0_SX100
MGRT0_SX190
MGRT0_SX280
MGRT0_SX370
MJDH0_SI1354
MJDH0_SI1984
MJDH0_SI724
MJDH0_SX184
MJDH0_SX274
MJDH0_SX364
MJDH0_SX4
MJDH0_SX94
MJLN0_SI1449
MJLN0_SI2079
MJLN0_SI819
MJLN0_SX189
MJLN0_SX279
MJLN0_SX369
MJLN0_SX9
MJLN0_SX99
MJMP0_SI1535
MJMP0_SI1791
MJMP0_SI905
MJMP0_SX185
MJMP0_SX275
MJMP0_SX365
MJMP0_SX5
MJMP0_SX95
MKLT0_SI1213
MKLT0_SI1843
MKLT0_SI583
MKLT0_SX133
MKLT0_SX223
MKLT0_SX313
MKLT0_SX403
MKLT0_SX43
MLLL0_SI1363
MLLL0_SI1993
MLLL0_SI733
MLLL0_SX103
MLLL0_SX13
MLLL0_SX193
MLLL0_SX283
MLLL0_SX373
MLNT0_SI1574
MLNT0_SI1902
MLNT0_SI642
MLNT0_SX102
MLNT0_SX12
MLNT0_SX192
MLNT0_SX282
MLNT0_SX372
MNJM0_SI1580
MNJM0_SI2210
MNJM0_SI950
MNJM0_SX140
MNJM0_SX230
MNJM0_SX320
MNJM0_SX410
MNJM0_SX50
MPAM0_SI1189
MPAM0_SI1819
MPAM0_SI1961
MPAM0_SX109
MPAM0_SX19
MPAM0_SX199
MPAM0_SX289
MPAM0_SX379
MTAS1_SI1473
MTAS1_SI2098
MTAS1_SI838
MTAS1_SX118
MTAS1_SX208
MTAS1_SX28
MTAS1_SX298
MTAS1_SX388
MTLS0_SI1370
MTLS0_SI2000
MTLS0_SI740
MTLS0_SX110
MTLS0_SX20
MTLS0_SX200
MTLS0_SX290
MTLS0_SX380
MWBT0_SI1553
MWBT0_SI2183
MWBT0_SI923
MWBT0_SX113
MWBT0_SX203
MWBT0_SX23
MWBT0_SX293
MWBT0_SX383
MWEW0_SI1361
MWEW0_SI1991
MWEW0_SI731
MWEW0_SX101
MWEW0_SX11
MWEW0_SX191
MWEW0_SX281
MWEW0_SX371