Keep word separators in transcripts with '_' (#16)
kaiidams authored Feb 23, 2023
1 parent db85825 commit f226d5c
Showing 3 changed files with 11 additions and 10 deletions.
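With this change the transcripts keep word separators, written as `_`. A minimal sketch of how downstream code might consume them, assuming a transcript is a plain string in which `_` appears only as a word separator (the metadata layout is not shown in this diff, and the helper names and sample string below are hypothetical):

```python
from typing import List


def split_words(transcript: str) -> List[str]:
    """Split a transcript into words on the '_' separator.

    Assumes '_' is used only as a word separator; empty pieces produced
    by doubled or trailing separators are dropped.
    """
    return [word for word in transcript.split("_") if word]


def strip_separators(transcript: str) -> str:
    """Remove every '_' for tools that expect transcripts without separators."""
    return transcript.replace("_", "")


if __name__ == "__main__":
    # Hypothetical romanized transcript, for illustration only.
    sample = "watakushi_wa_sono_hito_o"
    print(split_words(sample))       # ['watakushi', 'wa', 'sono', 'hito', 'o']
    print(strip_separators(sample))  # watakushiwasonohitoo
```

Whether removing the separators reproduces the v1.2 transcripts exactly is not stated in this commit, so `strip_separators` should be treated as an approximation.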
15 changes: 8 additions & 7 deletions README.md
@@ -12,7 +12,7 @@ which is in the public domain. The audio clips
 are from
 [LibriVox project](https://librivox.org/),
 which is also in the public domain.
-Readings are estimated by
+Readings are estimated by
 [MeCab](https://taku910.github.io/mecab/)
 and
 [UniDic Lite](https://pypi.org/project/unidic-lite/)
@@ -28,7 +28,7 @@ The audio clips were split and transcripts were aligned automatically by
 
 [Listen](https://kaiidams.github.io/Kokoro-Speech-Dataset/samples.html)
 from your browser or download
-[randomly sampled 100 clips](https://github.com/kaiidams/Kokoro-Speech-Dataset/releases/download/1.2/kokoro-speech-v1_2-sample-flac.zip).
+[randomly sampled 100 clips](https://github.com/kaiidams/Kokoro-Speech-Dataset/releases/download/1.3/kokoro-speech-v1_3-sample-flac.zip).
 
 ## File Format
 
@@ -82,7 +82,7 @@ Total duration: 00:24:05
 Because of its large data size of the dataset, audio files are not
 included in this repository, but the metadata is included.
 
-To make .wav files of the dataset, run
+To make .wav files of the dataset, run
 
 ```
 $ bash download.sh
@@ -131,7 +131,7 @@ which is not included in `small`.
 The dataset contains recordings from these books read by
 [ekzemplaro](https://librivox.org/reader/7044)
 
-- [明暗 (Meian)](https://librivox.org/meian-by-soseki-natsume/) 16:39:29
+- [明暗 (Meian)](https://librivox.org/meian-by-soseki-natsume/) 16:39:29
 [Online text](http://www.aozora.gr.jp/cards/000148/files/782_14969.html)
 - [こころ (Kokoro)](https://librivox.org/kokoro-by-soseki-natsume/) 08:46:41
 [Online text](http://www.aozora.gr.jp/cards/000148/files/773_14560.html)
@@ -167,10 +167,11 @@ contains audio clips of various languages from LibriVox.
 
 ## Changelog
 
-- v1.2 new metadata generated with a new align model
+- v1.3 Keep word separators in transcripts with '_'
+- v1.2 New metadata generated with a new align model
 - v1.1.1 Added FLAC, MP3, OGG support
 - v1.1 Added more books
-- v1.0 Current release
+- v1.0 Initial release
 
 ## Credits
 
@@ -181,5 +182,5 @@ Alignment and annotation by [Katsuya Iida](mailto:[email protected]).
 ## License
 
 This dataset is in the public domain in the USA (and most likely other countries as well).
-There are no restrictions on its use. For more information, please see:
+There are no restrictions on its use. For more information, please see:
 [librivox.org/pages/public-domain](https://librivox.org/pages/public-domain).
4 changes: 2 additions & 2 deletions download.sh
@@ -2,5 +2,5 @@
 
 mkdir ./data
 cd ./data
-curl -LO https://github.com/kaiidams/Kokoro-Speech-Dataset/releases/download/1.2/kokoro-speech-v1_2.zip
-unzip kokoro-speech-v1_2.zip
+curl -LO https://github.com/kaiidams/Kokoro-Speech-Dataset/releases/download/1.3/kokoro-speech-v1_3.zip
+unzip kokoro-speech-v1_3.zip
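The updated `download.sh` now fetches and unpacks the 1.3 release archive. A rough Python equivalent is sketched below, assuming the naming seen in the 1.2 and 1.3 URLs above (release tag as the directory, `kokoro-speech-v` plus the version with `.` replaced by `_` as the file name) also holds for other versions; `release_url` and `download_and_unzip` are hypothetical helpers, not part of the repository:

```python
import os
import urllib.request
import zipfile

BASE_URL = "https://github.com/kaiidams/Kokoro-Speech-Dataset/releases/download"


def release_url(version: str) -> str:
    """Build the archive URL, e.g. '1.3' -> .../1.3/kokoro-speech-v1_3.zip.

    Assumes the 1.2/1.3 naming pattern holds for the requested version.
    """
    return f"{BASE_URL}/{version}/kokoro-speech-v{version.replace('.', '_')}.zip"


def download_and_unzip(version: str = "1.3", data_dir: str = "./data") -> str:
    """Mirror download.sh: create data_dir, download the archive there, unzip it."""
    os.makedirs(data_dir, exist_ok=True)
    url = release_url(version)
    archive = os.path.join(data_dir, url.rsplit("/", 1)[1])
    # urlretrieve follows HTTP redirects, similar to `curl -L`.
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)
    return archive


if __name__ == "__main__":
    print(release_url("1.3"))
```

Unlike `mkdir ./data`, `os.makedirs(..., exist_ok=True)` does not fail when `./data` already exists, so the sketch can be re-run after a partial download.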
2 changes: 1 addition & 1 deletion extract.py
@@ -75,7 +75,7 @@ def extract_wav_files(data_dir, params_list, clip_format, sample_rate, output_di
 assert len(y.shape) == 2 and y.shape[0] == 1
 assert y.dtype == torch.float32
 assert sr == sample_rate
-y = (y * max_int16 / torch.max(torch.abs(y))).to(torch.int16)
+y = (y * max_int16 / torch.max(torch.abs(y))).to(torch.int16)
 current_file = audio_file
 current_audio = y
 output_file = os.path.join(output_dir, clip_dir, f'{id_}.{clip_ext}')
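The `extract_wav_files` hunk peak-normalizes each decoded clip, scaling it so its largest absolute sample becomes `max_int16`, before casting to 16-bit integers. A dependency-free sketch of that step plus writing the result with the standard-library `wave` module, assuming `max_int16` is 32767 (its definition lies outside this hunk) and a mono clip given as a flat sequence of floats; the function names are hypothetical and this is not the repository's implementation:

```python
import array
import sys
import wave
from typing import Sequence

MAX_INT16 = 32767  # assumed value of max_int16; its definition is not shown in the hunk


def peak_normalize_int16(samples: Sequence[float]) -> array.array:
    """Scale samples so the largest magnitude maps to MAX_INT16, then truncate to int16.

    int() truncates toward zero. Unlike the one-line expression in the hunk,
    an all-zero clip returns zeros instead of dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return array.array("h", [0] * len(samples))
    return array.array("h", (int(s * MAX_INT16 / peak) for s in samples))


def write_wav_int16(path: str, pcm: array.array, sample_rate: int) -> None:
    """Write mono 16-bit PCM samples to a .wav file."""
    data = array.array("h", pcm)  # copy so the caller's array is untouched
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(data.tobytes())
```

Scaling every clip to full range discards each clip's original loudness, which is the same trade-off the one-line expression in the hunk makes.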