Commit abe7659

committed
update
1 parent 932afb7 commit abe7659

595 files changed, +82251 -47 lines changed

.gitignore

+2
@@ -0,0 +1,2 @@
+__pycache__/*
+pengi/configs/base.pth

DATASETS.md

+34-34
@@ -46,18 +46,18 @@ Each CSV file includes the following columns:
 <br>
 
 | Dataset | Type | Classes | Split | Size |
-|:-- |:-- |:--: |:--: | :--: |
-| [Beijing-Opera](#beijing-opera) | Instrument Classification | 4 | Five-Fold | 68 MB
-| [CREMA-D](#crema-d) | Emotion Recognition | 6 | Train-Test | 653M
-| [ESC50](#esc50) | Sound Event Classification | 50 | Five-Fold | 777M
-| [ESC50-Actions](#esc50-actions) | Sound Event Classification | 10 | Five-Fold | 772M
-| [GT-Music-Genre](#gt-music-genre) | Music Analysis | 10 | Train-Test | 1.4G
-| [NS-Instruments](#ns-instruments) | Instrument Classification | 10 | Train-Test | 14G
-| [RAVDESS](#ravdess) | Emotion Recognition | 8 | Train-Test | 683M
-| [SESA](#sesa) | Surveillance Sound Classification | 4 | Train-Test | 51M
-| [TUT2017](#tut2017) | Acoustic Scene Classification | 15 | Four-Fold | 12G
-| [UrbanSound8K](#urbansound8k) | Sound Event Classification | 10 | Ten-Fold | 6.8G
-| [VocalSound](#vocalsound) | Vocal Sound Classification | 6 | Train-Test | 6.9G
+|:-- |:-- |:--: |:--: | --: |
+| [Beijing-Opera](#beijing-opera) | Instrument Classification | 4 | Five-Fold | 69 MB |
+| [CREMA-D](#crema-d) | Emotion Recognition | 6 | Train-Test | 606 MB |
+| [ESC50](#esc50) | Sound Event Classification | 50 | Five-Fold | 881 MB |
+| [ESC50-Actions](#esc50-actions) | Sound Event Classification | 10 | Five-Fold | 881 MB |
+| [GT-Music-Genre](#gt-music-genre) | Music Analysis | 10 | Train-Test | 1.3 GB |
+| [NS-Instruments](#ns-instruments) | Instrument Classification | 10 | Train-Test | 18.5 GB
+| [RAVDESS](#ravdess) | Emotion Recognition | 8 | Train-Test | 1.1 GB |
+| [SESA](#sesa) | Surveillance Sound Classification | 4 | Train-Test | 70 MB |
+| [TUT2017](#tut2017) | Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |
+| [UrbanSound8K](#urbansound8k) | Sound Event Classification | 10 | Ten-Fold | 6.8 GB |
+| [VocalSound](#vocalsound) | Vocal Sound Classification | 6 | Train-Test | 8.2 GB |
 
 <br><br>
 <hr><hr>
@@ -78,8 +78,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/Beijing-Opera", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "Beijing-Opera"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Instrument Classification | 4 | Five-Fold | 68 MB |
+|:-- |:--: |:--: | --: |
+| Instrument Classification | 4 | Five-Fold | 69 MB |
 
 <br>
 <hr>
@@ -94,8 +94,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/CREMA-D", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "CREMA-D"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Emotion Recognition | 6 | Train-Test | |
+|:-- |:--: |:--: | --: |
+| Emotion Recognition | 6 | Train-Test | 606 MB |
 
 <br>
 <hr>
@@ -110,8 +110,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/ESC50", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Sound Event Classification | 50 | Five-Fold | |
+|:-- |:--: |:--: | --: |
+| Sound Event Classification | 50 | Five-Fold | 881 MB |
 
 <br>
 <hr>
@@ -126,8 +126,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/ESC50-Actions", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "ESC50-Actions"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Sound Event Classification | 10 | Five-Fold | |
+|:-- |:--: |:--: | --: |
+| Sound Event Classification | 10 | Five-Fold | 881 MB |
 
 <br>
 <hr>
@@ -142,8 +142,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/GT-Music-Genre", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "GT-Music-Genre"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Music Analysis | 10 | Train-Test | |
+|:-- |:--: |:--: | --: |
+| Music Analysis | 10 | Train-Test | 1.3 GB |
 
 <br>
 <hr>
@@ -158,8 +158,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/NS-Instruments", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "NS-Instruments"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Instrument Classification | 10 | Train-Test | |
+|:-- |:--: |:--: | --: |
+| Instrument Classification | 10 | Train-Test | 18.5 GB |
 
 <br>
 <hr>
@@ -174,8 +174,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/RAVDESS", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "RAVDESS"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Emotion Recognition | 8 | Train-Test | |
+|:-- |:--: |:--: | --: |
+| Emotion Recognition | 8 | Train-Test | 1.1 GB |
 
 <br>
 <hr>
@@ -190,8 +190,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/SESA", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "SESA"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Surveillance Sound Classification | 4 | Train-Test | |
+|:-- |:--: |:--: | --: |
+| Surveillance Sound Classification | 4 | Train-Test | 70 MB |
 
 <br>
 <hr>
@@ -206,8 +206,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/TUT2017", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "TUT2017"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Acoustic Scene Classification | 15 | Four-Fold | |
+|:-- |:--: |:--: | --: |
+| Acoustic Scene Classification | 15 | Four-Fold | 12.3 GB |
 
 
 <br>
@@ -223,8 +223,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/UrbanSound8K", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "UrbanSound8K"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Sound Event Classification | 10 | Ten-Fold | |
+|:-- |:--: |:--: | --: |
+| Sound Event Classification | 10 | Ten-Fold | 6.8 GB |
 
 <br>
 <hr>
@@ -239,8 +239,8 @@ if not os.path.exists(audio_datasets_path): print(f"Given {audio_datasets_path=}
 huggingface_hub.snapshot_download(repo_id="MahiA/VocalSound", repo_type="dataset", local_dir=os.path.join(audio_datasets_path, "VocalSound"))
 ```
 |Type | Classes | Split | Size |
-|:-- |:--: |:--: | :--: |
-| Vocal Sound Classification | 6 | Train-Test | 6.9G |
+|:-- |:--: |:--: | --: |
+| Vocal Sound Classification | 6 | Train-Test | 8.2 GB |
 
 <br>
 <hr>
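
The per-dataset `snapshot_download` calls in the DATASETS.md hunks all follow one pattern: the repo ID is `MahiA/<name>` and the target folder is `<audio_datasets_path>/<name>`. The following is a minimal sketch of looping over all eleven names, not part of this commit. The helper names `plan_downloads` and `download_all` are hypothetical, and the sketch assumes `huggingface_hub` is installed when `download_all` is called.

```python
import os

# Dataset names as they appear in the snapshot_download calls in DATASETS.md.
DATASET_NAMES = [
    "Beijing-Opera", "CREMA-D", "ESC50", "ESC50-Actions", "GT-Music-Genre",
    "NS-Instruments", "RAVDESS", "SESA", "TUT2017", "UrbanSound8K", "VocalSound",
]


def plan_downloads(audio_datasets_path, names=DATASET_NAMES):
    """Return (repo_id, local_dir) pairs without touching the network."""
    return [(f"MahiA/{name}", os.path.join(audio_datasets_path, name)) for name in names]


def download_all(audio_datasets_path, names=DATASET_NAMES):
    """Download every dataset in `names` into `audio_datasets_path` (hypothetical helper)."""
    # Imported lazily so plan_downloads works even without huggingface_hub installed.
    import huggingface_hub

    for repo_id, local_dir in plan_downloads(audio_datasets_path, names):
        huggingface_hub.snapshot_download(
            repo_id=repo_id, repo_type="dataset", local_dir=local_dir
        )


if __name__ == "__main__":
    # Only prints the plan; call download_all(...) explicitly to fetch the data.
    for repo_id, local_dir in plan_downloads("Audio-Datasets"):
        print(repo_id, "->", local_dir)
```

Calling `download_all` with the full list fetches roughly 50 GB according to the size column, so passing a shorter `names` list first may be a sensible way to try it out.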

README.md

+13-13
@@ -97,23 +97,23 @@ wget https://zenodo.org/records/8387083/files/base.pth
 We have performed experiments on 11 audio classification datasets. Instructions for downloading/processing datasets used by our method have been provided in the [DATASETS.md](DATASETS.md).
 
 | Dataset | Type | Classes | Size | Link |
-|:-- |:-- |:--: |:--: |:-- |
-| [Beijing-Opera](https://compmusic.upf.edu/bo-perc-dataset) | Instrument Classification | 4 | | [Instructions](DATASETS.md#beijing-opera) |
-| [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D) | Emotion Recognition | 6 | | [Instructions](DATASETS.md#crema-d) |
-| [ESC50](https://github.com/karolpiczak/ESC-50) | Sound Event Classification | 50 | | [Instructions](DATASETS.md#esc50) |
-| [ESC50-Actions](https://github.com/karolpiczak/ESC-50) | Sound Event Classification | 10 | | [Instructions](DATASETS.md#esc50-actions) |
-| [GT-Music-Genre](https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification) | Music Analysis | 10 | | [Instructions](DATASETS.md#gt-music-genre) |
-| [NS-Instruments](https://magenta.tensorflow.org/datasets/nsynth) | Instrument Classification | 10 | | [Instructions](DATASETS.md#ns-instruments) |
-| [RAVDESS](https://zenodo.org/records/1188976#.YFZuJ0j7SL8) | Emotion Recognition | 8 | | [Instructions](DATASETS.md#ravdess) |
-| [SESA](https://zenodo.org/records/3519845) | Surveillance Sound Classification | 4 | | [Instructions](DATASETS.md#sesa) |
-| [TUT2017](https://zenodo.org/records/400515) | Acoustic Scene Classification | 15 | | [Instructions](DATASETS.md#tut2017) |
-| [UrbanSound8K](https://urbansounddataset.weebly.com/urbansound8k.html) | Sound Event Classification | 10 | | [Instructions](DATASETS.md#urbansound8k) |
-| [VocalSound](https://github.com/YuanGongND/vocalsound) | Vocal Sound Classification | 6 | | [Instructions](DATASETS.md#vocalsound) |
+|:-- |:-- |:--: |--: |:-- |
+| [Beijing-Opera](https://compmusic.upf.edu/bo-perc-dataset) | Instrument Classification | 4 | 69 MB | [Instructions](DATASETS.md#beijing-opera) |
+| [CREMA-D](https://github.com/CheyneyComputerScience/CREMA-D) | Emotion Recognition | 6 | 606 MB | [Instructions](DATASETS.md#crema-d) |
+| [ESC50](https://github.com/karolpiczak/ESC-50) | Sound Event Classification | 50 | 881 MB | [Instructions](DATASETS.md#esc50) |
+| [ESC50-Actions](https://github.com/karolpiczak/ESC-50) | Sound Event Classification | 10 | 881 MB | [Instructions](DATASETS.md#esc50-actions) |
+| [GT-Music-Genre](https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification) | Music Analysis | 10 | 1.3 GB | [Instructions](DATASETS.md#gt-music-genre) |
+| [NS-Instruments](https://magenta.tensorflow.org/datasets/nsynth) | Instrument Classification | 10 | 18.5 GB | [Instructions](DATASETS.md#ns-instruments) |
+| [RAVDESS](https://zenodo.org/records/1188976#.YFZuJ0j7SL8) | Emotion Recognition | 8 | 1.1 GB | [Instructions](DATASETS.md#ravdess) |
+| [SESA](https://zenodo.org/records/3519845) | Surveillance Sound Classification | 4 | 70 MB | [Instructions](DATASETS.md#sesa) |
+| [TUT2017](https://zenodo.org/records/400515) | Acoustic Scene Classification | 15 | 12.3 GB | [Instructions](DATASETS.md#tut2017) |
+| [UrbanSound8K](https://urbansounddataset.weebly.com/urbansound8k.html) | Sound Event Classification | 10 | 6.8 GB | [Instructions](DATASETS.md#urbansound8k) |
+| [VocalSound](https://github.com/YuanGongND/vocalsound) | Vocal Sound Classification | 6 | 8.2 GB | [Instructions](DATASETS.md#vocalsound) |
 
 </br>
 </br>
 
-All datasets should be placed in a directory named `Audio-Datasets,` and the path of this directory should be specified in the variable `DATASET_ROOT` in the shell [`scripts`](/scripts/). The directory structure should be as follows:
+All datasets should be placed in a directory named `Audio-Datasets` and the path of this directory should be specified in the variable `DATASET_ROOT` in the shell [`scripts`](/scripts/). The directory structure should be as follows:
 ```
 Audio-Datasets/
 ├── Beijing-Opera/
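
The README hunk is cut off after the first folder, so the full tree is not visible here. As a rough sketch of checking the layout before running the shell scripts, the snippet below lists which expected folders are missing under a root directory. It is not part of this commit. It assumes the remaining folder names match the `local_dir` names used in DATASETS.md, the helper `missing_datasets` is hypothetical, and reading `DATASET_ROOT` from the environment is also an assumption (the README sets it inside the shell scripts).

```python
import os

# Assumed folder names, taken from the local_dir arguments in DATASETS.md.
EXPECTED_DATASETS = [
    "Beijing-Opera", "CREMA-D", "ESC50", "ESC50-Actions", "GT-Music-Genre",
    "NS-Instruments", "RAVDESS", "SESA", "TUT2017", "UrbanSound8K", "VocalSound",
]


def missing_datasets(dataset_root, expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that are absent under dataset_root."""
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(dataset_root, name))
    ]


if __name__ == "__main__":
    # Assumption: DATASET_ROOT is exported; otherwise fall back to ./Audio-Datasets.
    root = os.environ.get("DATASET_ROOT", "Audio-Datasets")
    missing = missing_datasets(root)
    if missing:
        print(f"Missing under {root}: {', '.join(missing)}")
    else:
        print(f"All {len(EXPECTED_DATASETS)} dataset folders found under {root}")
```

This only checks that the top-level folders exist; it does not validate the CSV files or audio contents inside them.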
