Speech Samples


The model is evaluated with bilingual speakers from the EMIME Mandarin Bilingual Database [1].


Four systems are implemented for comparison:
(1) bPPG-LI (baseline):
The cross-lingual voice conversion system using bilingual PPG (bPPG) with a single language-independent (LI) output layer.
(2) mPPG-LI:
The cross-lingual voice conversion system using mixed-lingual PPG (mPPG) with a single language-independent (LI) output layer.
(3) bPPG-LS:
The cross-lingual voice conversion system using bilingual PPG (bPPG) with language-specific (LS) output layers.
(4) mPPG-LS:
The cross-lingual voice conversion system using mixed-lingual PPG (mPPG) with language-specific (LS) output layers.
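The four systems above form a 2 × 2 grid: PPG type (bPPG or mPPG) crossed with output-layer type (LI or LS). The following is a minimal sketch of that grid only, not the authors' implementation. The names `SystemConfig`, `output_layers`, and the layer label "shared" are hypothetical and chosen for illustration; the language codes are the ones used on this page.

```python
from dataclasses import dataclass
from itertools import product

# Abbreviations as defined in the system list above.
PPG_TYPES = {"bPPG": "bilingual PPG", "mPPG": "mixed-lingual PPG"}
OUTPUT_TYPES = {"LI": "language-independent", "LS": "language-specific"}
LANGUAGES = ("en", "cn")  # language codes used on this page


@dataclass(frozen=True)
class SystemConfig:
    """Hypothetical container for one of the four compared systems."""
    ppg: str     # "bPPG" or "mPPG"
    output: str  # "LI" or "LS"

    @property
    def name(self) -> str:
        return f"{self.ppg}-{self.output}"

    def output_layers(self, languages=LANGUAGES):
        # LI: a single output layer shared by all languages.
        # LS: one output layer per language.
        # The label "shared" is an illustrative assumption.
        if self.output == "LI":
            return ["shared"]
        return list(languages)


# Outer loop over output type, inner loop over PPG type,
# which reproduces the order (1) to (4) of the list above.
SYSTEMS = [SystemConfig(ppg, out) for out, ppg in product(OUTPUT_TYPES, PPG_TYPES)]

for system in SYSTEMS:
    print(system.name, system.output_layers())
```

The sketch captures only the naming scheme and the number of output layers per system; how the layers are built, trained, or selected at conversion time is not described on this page and is left out.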


en → cn:
An English source utterance is converted to a Mandarin target speaker's voice.
cn → en:
A Mandarin source utterance is converted to an English target speaker's voice.


F : Female Speaker
M : Male Speaker

Conversion pairs (columns): cn (F) → en (M) | cn (M) → en (F) | en (F) → cn (M) | en (M) → cn (F)
Samples (rows): Source | Target | bPPG-LI | mPPG-LI | bPPG-LS | mPPG-LS

References

[1] Mirjam Wester and Hui Liang, “The EMIME Mandarin Bilingual Database,” Tech. Rep. EDI-INF-RR-1396, The University of Edinburgh, 2011.