AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data

ArXiv: arXiv:2104.09715

Accepted by ICASSP 2021

Author

Audio Samples

All audio samples use MelGAN as the vocoder.

Audio Quality

When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow.

VCTK Female

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

VCTK Male

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

Some have accepted it as a miracle without physical explanation.

VCTK Female

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

VCTK Male

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

The invention of movable metal letters in the middle of the fifteenth century may justly be considered as the invention of the art of printing.

LJSpeech

GT | GT Mel+Vocoder | Joint-training
PPG-based | AdaSpeech | Ours (AdaSpeech 2)

Analyses on adaptation strategy

When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow.

Origin | Without L2 loss constraint | Fine-tune mel encoder & decoder

Varying Adaptation Data

Please call Stella.

1 sample | 2 samples | 5 samples | 10 samples
20 samples | 50 samples | 100 samples
