The model is evaluated on speech from two databases:
English: VCC2018[1]
Mandarin: the average-model speech synthesis library provided by Databaker (http://www.data-baker.com).
Two systems are implemented for comparison:
iSE: the cross-lingual VC system using the i-vector based Speaker Embedding (iSE), which serves as our baseline.
SEN: the proposed jointly trained Speaker Embedding Network (SEN).
Enable a Monolingual English Speaker to Speak Mandarin
Audio samples (Sample 1, Sample 2), one clip per column: Source, Target, Baseline iSE, Proposed SEN
Sample 1: Source, Target, Monolingual Baseline, Bilingual Baseline
Sample 2: Source, Target, Monolingual Baseline, Bilingual Baseline
Enable a Monolingual English Speaker to Speak Mandarin