Speech Samples
The model is evaluated with bilingual speakers from the EMIME Mandarin Bilingual Database [1].
Four systems are implemented for comparison:
(1) bPPG-LI (baseline):
The cross-lingual voice conversion system using bilingual PPG (bPPG) with a single language-independent (LI) output layer.
(2) mPPG-LI:
The cross-lingual voice conversion system using mixed-lingual PPG (mPPG) with a single language-independent (LI) output layer.
(3) bPPG-LS:
The cross-lingual voice conversion system using bilingual PPG (bPPG) with language-specific (LS) output layers.
(4) mPPG-LS:
The cross-lingual voice conversion system using mixed-lingual PPG (mPPG) with language-specific (LS) output layers.
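The four systems above are the combinations of two PPG types and two output-layer designs. As a minimal sketch of that naming scheme only (the dictionary and function names are hypothetical and are not taken from the authors' code), the grid can be enumerated as follows:

```python
from itertools import product

# Abbreviations and descriptions as given in the system list above.
PPG_TYPES = {
    "bPPG": "bilingual PPG",
    "mPPG": "mixed-lingual PPG",
}
OUTPUT_LAYERS = {
    "LI": "a single language-independent output layer",
    "LS": "language-specific output layers",
}


def list_systems():
    """Return (name, description) pairs for the 2 x 2 grid of systems.

    Hypothetical helper: it only reproduces the naming scheme,
    not any part of the voice conversion models themselves.
    """
    systems = []
    # Iterate output layers in the outer loop so the order matches the list above.
    for layer, ppg in product(OUTPUT_LAYERS, PPG_TYPES):
        name = f"{ppg}-{layer}"
        description = f"{PPG_TYPES[ppg]} with {OUTPUT_LAYERS[layer]}"
        systems.append((name, description))
    return systems


for name, description in list_systems():
    print(f"{name}: {description}")
```

The first entry, bPPG-LI, is the baseline; the other three each change the PPG type, the output-layer design, or both.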
en → cn:
An English source utterance is converted to the voice of a Mandarin target speaker.
cn → en:
A Mandarin source utterance is converted to the voice of an English target speaker.
F : Female Speaker
M : Male Speaker
| | cn (F) → en (M) | cn (M) → en (F) | en (F) → cn (M) | en (M) → cn (F) |
|---|---|---|---|---|
| Source | | | | |
| Target | | | | |
| bPPG-LI | | | | |
| mPPG-LI | | | | |
| bPPG-LS | | | | |
| mPPG-LS | | | | |
References
[1] Mirjam Wester and Hui Liang, “The EMIME Mandarin Bilingual Database,” Tech. Rep. EDI-INF-RR-1396, The University of Edinburgh, 2011.