Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers

Research output: Contribution to conference, Paper, Academic, peer-reviewed

Abstract

We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings’ applicability. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve the output quality and the latter performs better, but these effects also depend on the specific language combination. We also compare the recently-proposed Angular Similarity of Phone Frequencies (ASPF) with a family tree-based distance measure as a criterion to select source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have expected effects.
Original language: English
Pages: 21-26
Number of pages: 6
Status: Published - 6 Aug 2023
Event: 12th ISCA Speech Synthesis Workshop (SSW2023) - Grenoble, France
Duration: 26 Aug 2023 to 28 Aug 2023
http://ssw2023.org
