Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection

Phat Do, Matt Coler, Jelske Dijkstra, Esther Klabbers

Research output: Contribution to conference › Paper › Scientific › peer-review

Abstract

We compare using a PHOIBLE-based phone mapping method and using phonological features input in transfer learning for TTS in low-resource languages. We use diverse source languages (English, Finnish, Hindi, Japanese, and Russian) and target languages (Bulgarian, Georgian, Kazakh, Swahili, Urdu, and Uzbek) to test the language-independence of the methods and enhance the findings’ applicability. We use Character Error Rates from automatic speech recognition and predicted Mean Opinion Scores for evaluation. Results show that both phone mapping and features input improve the output quality and that the latter performs better, but these effects also depend on the specific language combination. We also compare the recently proposed Angular Similarity of Phone Frequencies (ASPF) with a family tree-based distance measure as a criterion for selecting source languages in transfer learning. ASPF proves effective if label-based phone input is used, while the language distance does not have the expected effects.
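To make the ASPF source-language criterion concrete, here is a minimal sketch that treats each language as a vector of phone counts over the union of both inventories and scores a pair by angular similarity. It assumes the common definition 1 - arccos(cos_sim) / pi; the paper's exact normalization may differ. The helper names (`phone_frequencies`, `angular_similarity`) and the toy phone sequences are hypothetical and do not come from the paper's data.

```python
import math
from collections import Counter
from typing import Iterable


def phone_frequencies(phones: Iterable[str]) -> Counter:
    """Count how often each phone label occurs in a (hypothetical) phonemized corpus."""
    return Counter(phones)


def angular_similarity(freq_a: Counter, freq_b: Counter) -> float:
    """Angular similarity between two phone-frequency vectors.

    Assumption: uses 1 - arccos(cosine similarity) / pi, a standard
    definition of angular similarity, not necessarily the paper's exact formula.
    """
    inventory = sorted(set(freq_a) | set(freq_b))  # union of both phone inventories
    a = [freq_a.get(p, 0) for p in inventory]
    b = [freq_b.get(p, 0) for p in inventory]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("empty phone-frequency vector")
    cos_sim = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # clamp floating-point overshoot
    return 1.0 - math.acos(cos_sim) / math.pi


# Toy phone sequences (hypothetical, not real corpus data).
target = phone_frequencies("a b a k a t a".split())
sources = {
    "src_1": phone_frequencies("a b a k i".split()),
    "src_2": phone_frequencies("o u p o u".split()),
}

# Rank candidate source languages from most to least similar to the target.
ranking = sorted(sources, key=lambda name: angular_similarity(target, sources[name]), reverse=True)
print(ranking)
```

Because phone counts are non-negative, two languages with no shared phones get a cosine of 0 and therefore an angular similarity of 0.5 under this definition, while identical distributions score 1.0.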
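One plausible way to realize a feature-based phone mapping is to keep phones shared by both languages and map each target phone missing from the source inventory to the source phone with the fewest differing feature values (a Hamming distance over ternary features). This is a sketch under stated assumptions. The six-feature table below is a tiny, hypothetical stand-in for PHOIBLE's segment feature data, and the nearest-phone rule with alphabetical tie-breaking is an illustrative assumption, not necessarily the paper's exact procedure.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, heavily simplified feature table in the spirit of PHOIBLE's
# ternary ('+', '-', '0') values. Assumed feature order:
# (consonantal, voice, labial, coronal, dorsal, continuant)
FEATURES: Dict[str, Tuple[str, ...]] = {
    "p": ("+", "-", "+", "-", "-", "-"),
    "b": ("+", "+", "+", "-", "-", "-"),
    "t": ("+", "-", "-", "+", "-", "-"),
    "d": ("+", "+", "-", "+", "-", "-"),
    "k": ("+", "-", "-", "-", "+", "-"),
    "g": ("+", "+", "-", "-", "+", "-"),
    "s": ("+", "-", "-", "+", "-", "+"),
    "x": ("+", "-", "-", "-", "+", "+"),
}


def feature_distance(a: Tuple[str, ...], b: Tuple[str, ...]) -> int:
    """Number of feature positions where two phones disagree (Hamming distance)."""
    return sum(fa != fb for fa, fb in zip(a, b))


def map_phones(
    target_inventory: Iterable[str],
    source_inventory: Set[str],
    features: Dict[str, Tuple[str, ...]],
) -> Dict[str, str]:
    """Map every target phone to a phone the source-language model already knows."""
    mapping: Dict[str, str] = {}
    candidates = sorted(source_inventory)  # sorted so ties resolve alphabetically
    for phone in target_inventory:
        if phone in source_inventory:
            mapping[phone] = phone  # shared phone: keep it unchanged
        else:
            # Nearest source phone by feature distance; min() keeps the first minimum.
            mapping[phone] = min(
                candidates, key=lambda s: feature_distance(features[phone], features[s])
            )
    return mapping


mapping = map_phones(["t", "x", "d"], {"p", "b", "t", "k", "g", "s"}, FEATURES)
print(mapping)
```

With this toy table, the voiceless velar fricative /x/ maps to /k/ (only continuancy differs) and /d/ maps to /t/ (only voicing differs). A features-input approach would instead feed such feature vectors to the model directly rather than collapsing each phone onto a single source label.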
Original language: English
Pages: 21-26
Number of pages: 6
Publication status: Published - 06 Aug 2023
Event: 12th ISCA Speech Synthesis Workshop (SSW2023) - Grenoble, France
Duration: 26 Aug 2023 - 28 Aug 2023
http://ssw2023.org

Conference

Conference: 12th ISCA Speech Synthesis Workshop (SSW2023)
Country/Territory: France
City: Grenoble
Period: 26/08/2023 - 28/08/2023