<!--
Phantom by HTML5 UP
html5up.net | @ajlkn
Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<!DOCTYPE html>
<html>
<head>
<title>F0-Consistent Many-to-Many Non-Parallel Voice Conversion via Conditional Autoencoder - Audio Demo</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<!--[if lte IE 8]><script src="assets/js/ie/html5shiv.js"></script><![endif]-->
<link rel="stylesheet" href="assets/css/main.css" />
<!--[if lte IE 9]><link rel="stylesheet" href="assets/css/ie9.css" /><![endif]-->
<!--[if lte IE 8]><link rel="stylesheet" href="assets/css/ie8.css" /><![endif]-->
</head>
<body>
<!-- Wrapper -->
<div id="wrapper">
<!-- Header -->
<header id="header">
</header>
<!-- Menu -->
<!-- Main -->
<div id="main">
<div class="inner">
<h1>F0-Consistent Many-to-Many Non-Parallel Voice Conversion via Conditional Autoencoder - <font color = #58c3c2>Audio Demo</font></h1>
<p><font size = 5><i>Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham Mysore</i></font></p>
<p></p>
<br /><br />
<section>
<h2><font size = 5 color= #58c3c2>Paper</font></h2>
<p>Our paper is available <a href="https://arxiv.org/abs/2004.07370">on arXiv</a>.</p>
</section>
<!-- Text -->
<section>
<a name="traditional"></a>
<h2><font size = 5 color= #58c3c2>Qualitative Evaluation</font></h2>
<p>(Section 3.2 in the paper)</p>
<p>Our main comparison is between <font style="font-variant: small-caps">F0-AutoVC</font> and the original <font style="font-variant: small-caps">AutoVC</font>; we also include two additional baselines for reference.</p>
<ul class="12u 12u$(medium)">
<li><b><font style="font-variant: small-caps">F0-AutoVC</font></b> - the proposed F0-conditioned autoencoder-based conversion algorithm</li>
<li><b><font style="font-variant: small-caps">AutoVC</font></b> - the original autoencoder-based conversion algorithm</li>
<li><b>StarGAN-VC</b> - a voice conversion system that adopts the StarGAN paradigm</li>
<li><b>Chou et al.</b> - a voice conversion system combining an autoencoder with a GAN and a speaker classifier</li>
</ul>
<p>Below are a few audio samples.</p>
<div class="table-wrapper">
<table>
<thead>
<tr>
<th><font color= #58c3c2>Source Speaker / Speech</font></th>
<th><font color= #58c3c2>Target Speaker / Speech</font></th>
<th><font color= #58c3c2>Method</font></th>
<th><font color= #58c3c2>Converted Speech</font></th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="4">p225 (Female) <audio controls=""><source src="audios/ground_1/p225_003_001.wav" /><embed height="50" src="audios/ground_1/p225_003_001.wav" width="100"></embed></audio></td>
<td rowspan="4">p226 (Male) <audio controls=""><source src="audios/ground_2/p226_002.wav" /><embed height="50" src="audios/ground_2/p226_002.wav" width="100"></embed></audio></td>
<td><font style="font-variant: small-caps">F0-AutoVC</font></td>
<td><audio controls=""><source src="audios/ours/p225xp226_ours_u003001.mp3" /><embed height="50" src="audios/ours/p225xp226_ours_u003001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td><font style="font-variant: small-caps">AutoVC</font></td>
<td><audio controls=""><source src="audios/cae/p225xp226_cae_u003001.mp3" /><embed height="50" src="audios/cae/p225xp226_cae_u003001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>StarGAN-VC</td>
<td><audio controls=""><source src="audios/star/p225xp226_star_u003001.mp3" /><embed height="50" src="audios/star/p225xp226_star_u003001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>Chou et al.</td>
<td><audio controls=""><source src="audios/chou/p225xp226_chou_u003001.mp3" /><embed height="50" src="audios/chou/p225xp226_chou_u003001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td rowspan="4">p225 (Female) <audio controls=""><source src="audios/ground_1/p225_006_001.wav" /><embed height="50" src="audios/ground_1/p225_006_001.wav" width="100"></embed></audio></td>
<td rowspan="4">p270 (Male) <audio controls=""><source src="audios/ground_2/p270_002.wav" /><embed height="50" src="audios/ground_2/p270_002.wav" width="100"></embed></audio></td>
<td><font style="font-variant: small-caps">F0-AutoVC</font></td>
<td><audio controls=""><source src="audios/ours/p225xp270_ours_u006001.mp3" /><embed height="50" src="audios/ours/p225xp270_ours_u006001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td><font style="font-variant: small-caps">AutoVC</font></td>
<td><audio controls=""><source src="audios/cae/p225xp270_cae_u006001.mp3" /><embed height="50" src="audios/cae/p225xp270_cae_u006001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>StarGAN-VC</td>
<td><audio controls=""><source src="audios/star/p225xp270_star_u006001.mp3" /><embed height="50" src="audios/star/p225xp270_star_u006001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>Chou et al.</td>
<td><audio controls=""><source src="audios/chou/p225xp270_chou_u006001.mp3" /><embed height="50" src="audios/chou/p225xp270_chou_u006001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td rowspan="4">p226 (Male) <audio controls=""><source src="audios/ground_1/p226_008001.wav" /><embed height="50" src="audios/ground_1/p226_008001.wav" width="100"></embed></audio></td>
<td rowspan="4">p225 (Female) <audio controls=""><source src="audios/ground_2/p225_002.wav" /><embed height="50" src="audios/ground_2/p225_002.wav" width="100"></embed></audio></td>
<td><font style="font-variant: small-caps">F0-AutoVC</font></td>
<td><audio controls=""><source src="audios/ours/p226xp225_ours_u008001.mp3" /><embed height="50" src="audios/ours/p226xp225_ours_u008001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td><font style="font-variant: small-caps">AutoVC</font></td>
<td><audio controls=""><source src="audios/cae/p226xp225_cae_u008001.mp3" /><embed height="50" src="audios/cae/p226xp225_cae_u008001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>StarGAN-VC</td>
<td><audio controls=""><source src="audios/star/p226xp225_star_u008001.mp3" /><embed height="50" src="audios/star/p226xp225_star_u008001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>Chou et al.</td>
<td><audio controls=""><source src="audios/chou/p226xp225_chou_u008001.mp3" /><embed height="50" src="audios/chou/p226xp225_chou_u008001.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td rowspan="4">p227 (Male) <audio controls=""><source src="audios/ground_1/p227_010.wav" /><embed height="50" src="audios/ground_1/p227_010.wav" width="100"></embed></audio></td>
<td rowspan="4">p233 (Female) <audio controls=""><source src="audios/ground_2/p233_002.wav" /><embed height="50" src="audios/ground_2/p233_002.wav" width="100"></embed></audio></td>
<td><font style="font-variant: small-caps">F0-AutoVC</font></td>
<td><audio controls=""><source src="audios/ours/p227xp233_ours_u010.mp3" /><embed height="50" src="audios/ours/p227xp233_ours_u010.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td><font style="font-variant: small-caps">AutoVC</font></td>
<td><audio controls=""><source src="audios/cae/p227xp233_cae_u010.mp3" /><embed height="50" src="audios/cae/p227xp233_cae_u010.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>StarGAN-VC</td>
<td><audio controls=""><source src="audios/star/p227xp233_star_u010.mp3" /><embed height="50" src="audios/star/p227xp233_star_u010.mp3" width="100"></embed></audio>
</td>
</tr>
<tr>
<td>Chou et al.</td>
<td><audio controls=""><source src="audios/chou/p227xp233_chou_u010.mp3" /><embed height="50" src="audios/chou/p227xp233_chou_u010.mp3" width="100"></embed></audio>
</td>
</tr>
</tbody>
</table>
</div>
<a href="#" class="button special">Back to Top</a>
<a href="#traditional" class="button special">Back to Section Start</a>
<br /><br /><br /><br />
</section>
<section>
<a name="f0-control"></a>
<h2><font size = 5 color= #58c3c2>F0-Control</font></h2>
<p>(Section 3.1.3 in the paper)</p>
<p>Since the conversion is conditioned on F0, we can control the F0 of the converted speech by modifying the conditioning F0. <br> For demonstration purposes, we simply set the conditioning F0 to a constant value.</p>
<p>Below are a few audio samples.</p>
<div class="table-wrapper">
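<table>
<thead>
<tr>
<th><font color= #58c3c2>p225 → p226</font></th>
<th><font color= #58c3c2>p225 → p227</font></th>
<th><font color= #58c3c2>p225 → p228</font></th>
</tr>
</thead>
<tbody>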
<tr>
<td><audio controls=""><source src="audios/flat/p225xp226_cae_u001.wav" /><embed height="50" src="audios/flat/p225xp226_cae_u001.wav" width="100"></embed></audio></td>
<td><audio controls=""><source src="audios/flat/p225xp227_cae_u003001.wav" /><embed height="50" src="audios/flat/p225xp227_cae_u003001.wav" width="100"></embed></audio></td>
<td><audio controls=""><source src="audios/flat/p225xp228_cae_u006001.wav" /><embed height="50" src="audios/flat/p225xp228_cae_u006001.wav" width="100"></embed></audio>
</td>
</tr>
</tbody>
</table>
</div>
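<p>The constant-F0 manipulation described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the code used for the paper. It assumes the F0 contour is given per frame in Hz with unvoiced frames marked as 0, and that the model consumes a one-hot encoding of quantized log-F0 with a separate bin for unvoiced frames. The helper names (<code>flatten_f0</code>, <code>one_hot_log_f0</code>), the 50 to 500 Hz range and the 256 bins are illustrative assumptions, as is choosing the constant in Hz rather than in a speaker-normalized domain; see the paper for the exact F0 representation.</p>
<pre><code>
```python
import numpy as np


def flatten_f0(f0_hz, target_hz=None):
    """Hypothetical helper: set every voiced frame of an F0 contour to one value.

    Unvoiced frames (assumed to be marked as 0 Hz) are kept at 0 so that the
    voicing pattern of the source utterance is preserved.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = f0_hz != 0
    if not voiced.any():
        return f0_hz.copy()
    if target_hz is None:
        # Default constant: geometric mean of the voiced frames (mean in log domain).
        target_hz = float(np.exp(np.log(f0_hz[voiced]).mean()))
    return np.where(voiced, target_hz, 0.0)


def one_hot_log_f0(f0_hz, lo_hz=50.0, hi_hz=500.0, n_bins=256):
    """Hypothetical conditioning encoder: one-hot quantized log-F0 per frame.

    Index 0 is reserved for unvoiced frames; voiced frames map to 1..n_bins
    on a log-frequency scale clipped to [lo_hz, hi_hz].
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = f0_hz != 0
    # Use lo_hz as a placeholder for unvoiced frames so np.log never sees 0.
    safe = np.where(voiced, f0_hz, lo_hz)
    pos = (np.log(safe) - np.log(lo_hz)) / (np.log(hi_hz) - np.log(lo_hz))
    bins = np.clip(np.round(pos * (n_bins - 1)), 0, n_bins - 1).astype(int) + 1
    bins = np.where(voiced, bins, 0)
    return np.eye(n_bins + 1)[bins]


# Usage: a toy 5-frame contour with unvoiced frames at both ends.
f0 = np.array([0.0, 180.0, 200.0, 220.0, 0.0])
flat = flatten_f0(f0)
cond = one_hot_log_f0(flat)
print(flat)        # voiced frames now share one value (about 199.3 Hz)
print(cond.shape)  # (5, 257)
```
</code></pre>
<p>Passing an explicit <code>target_hz</code> instead of the default geometric mean would pin the pitch to a chosen value; because unvoiced frames stay in bin 0, the voicing decisions of the source utterance are left unchanged in this sketch.</p>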
<a href="#" class="button special">Back to Top</a>
<a href="#f0-control" class="button special">Back to Section Start</a>
<br /><br /><br /><br />
</section>
</div>
</div>
</div>
<!-- Scripts -->
<script src="assets/js/jquery.min.js"></script>
<script src="assets/js/skel.min.js"></script>
<script src="assets/js/util.js"></script>
<!--[if lte IE 8]><script src="assets/js/ie/respond.min.js"></script><![endif]-->
<script src="assets/js/main.js"></script>
</body>
</html>