Speech Samples
(a) parallel-VC : A sequence-to-sequence voice conversion system trained on parallel speech data between the source and target accents. This system cannot preserve the speaker identity.
(b) BNF-AC : This is an accent conversion system that takes bottleneck features as input and generates acoustic features [1].
(c) BNF-PC-AC: This system is an extension of system (b) with a pronunciation correction model [2].
(d) TTS : A multi-speaker TTS system trained on target-accented speech data.
(e) TTS-AC : Our proposed accent conversion system without parallel data.
source: Natural speech from a speaker with the source accent. We aim to change the accent of this speech to the target accent while preserving the speech content and the speaker's identity.
target: The natural speech from a different speaker with the target accent. It is used as a reference. In this work, we define American, British, and Canadian English as the target accents.

Convert Chinese Accent to the Target Accent

System | Chinese female speaker | Chinese male speaker | ||
---|---|---|---|---|
Source | ||||
(a) parallel-VC | ||||
(b) BNF-AC | ||||
(c) BNF-PC-AC | ||||
(d) TTS | ||||
(e) TTS-AC | ||||
Target (Reference) | ||||

Convert Indian Accent to the Target Accent

System | Indian female speaker | Indian male speaker | ||
---|---|---|---|---|
Source | ||||
(a) parallel-VC | ||||
(b) BNF-AC | ||||
(c) BNF-PC-AC | ||||
(d) TTS | ||||
(e) TTS-AC | ||||
Target (Reference) | ||||

References
[1] Guanlong Zhao, Sinem Sonsaat, John Levis, Evgeny Chukharev-Hudilainen, and Ricardo Gutierrez-Osuna, “Accent conversion using phonetic posteriorgrams,” in IEEE ICASSP, 2018, pp. 5314–5318.
[2] Guanlong Zhao, Shaojin Ding, and Ricardo Gutierrez-Osuna, “Converting foreign accent speech without a reference,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2367–2381, 2021.