Cross-lingual Voice Conversion

Speech Samples

The model is evaluated with speech from two databases:
English: VCC 2018 [1]
Mandarin: the Speech synthesis-library of average model provided by Databaker (http://www.data-baker.com).

Two systems are implemented for comparison:
iSE: the baseline cross-lingual VC system, which uses an i-vector based Speaker Embedding (iSE).
SEN: the proposed jointly trained Speaker Embedding Network (SEN).

Enable a Monolingual English Speaker to Speak Mandarin

	Source	Target	Baseline iSE	Proposed SEN
Sample 1
Sample 2

Enable a Monolingual English Speaker to Speak Mandarin

	Source	Target	Baseline iSE	Proposed SEN
Sample 1
Sample 2

References

[1] Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, and Zhenhua Ling, “The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods,” in Speaker Odyssey, 2018.