I read the paper and noticed that when training the AR model, the speaker condition is another clip of the same person speaking, while when training the diffusion model, the speaker condition appears to be a clip taken from the target speech itself. Why the different design? What would happen if we used the target speech itself as the speaker condition when training the AR model, or another sample from the same speaker as the speaker condition when training the diffusion model? What is the reasoning behind this? Thanks.
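To make sure I understand the distinction, here is a minimal sketch of the two conditioning strategies as I read them from the paper. All names (`dataset`, `utterances`, `waveform`, etc.) are hypothetical, not from the actual training code:

```python
import random

# Hypothetical illustration of the two speaker-conditioning strategies
# described in the paper; dataset/field names are made up for clarity.

def ar_speaker_cond(dataset, speaker_id, target_utt_id):
    """AR model training: the speaker condition is a DIFFERENT utterance
    by the same speaker (cross-utterance reference)."""
    others = [u for u in dataset.utterances(speaker_id) if u.id != target_utt_id]
    return random.choice(others).waveform

def diffusion_speaker_cond(target_waveform, cond_len):
    """Diffusion model training: the speaker condition is a segment cut
    from the target utterance itself (same-utterance reference)."""
    start = random.randint(0, max(0, len(target_waveform) - cond_len))
    return target_waveform[start:start + cond_len]
```

My question is essentially: why not swap these two strategies, or use the same one for both stages?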