TY - CONF
T1 - A Speech Recognizer for Frisian/Dutch Council Meetings
AU - Bentum, Martijn
AU - ten Bosch, Louis
AU - van den Heuvel, Henk
AU - Wills, Simone
AU - van der Niet, Domenique
AU - Dijkstra, Jelske
AU - Van de Velde, Hans
N1 - Funding Information:
The development of the speech recognizer was financed and executed by Provinsje Fryslân, Wetterskip Fryslân and the municipalities Achtkarspelen, Dantumadiel, Fryske Marren, Heerenveen, Leeuwarden, Noardeast-Fryslân, Opsterland, Smallingerland, Súdwest-Fryslân, Tytsjerksteradiel and Waadhoeke. The development of the multilingual and multidialect transcription software was financed and executed by Fryske Akademy and Humain'r. The development of the training program for Frisian/Dutch transcriptions for training ASR systems was financed and executed by Fryske Akademy and Humain'r.
Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC-4.0.
PY - 2022/6
Y1 - 2022/6
N2 - We developed a bilingual Frisian/Dutch speech recognizer for council meetings in Fryslân (the Netherlands). During these meetings both Frisian and Dutch are spoken, and code-switching between the two languages occurs frequently. The new speech recognizer is based on an existing speech recognizer for Frisian and Dutch named FAME!, which was trained and tested on historical radio broadcasts. Adapting a speech recognizer to the council meeting domain is challenging because of acoustic background noise, speaker overlap and the jargon typically used in council meetings. To train the new recognizer, we used the radio broadcast materials used to develop the FAME! recognizer and added newly created, manually transcribed audio recordings of council meetings from eleven Frisian municipalities, the Frisian provincial council and the Frisian water board. The council meeting recordings consist of 49 hours of speech, with 26 hours of Frisian speech and 23 hours of Dutch speech. Furthermore, from the same sources, we obtained texts in the domain of council meetings containing 11 million words: 1.1 million Frisian words and 9.9 million Dutch words. We describe the methods used to train the new recognizer, report the observed word error rates, and analyze the remaining errors.
KW - ASR
KW - Code Switching
KW - Domain Adaptation
KW - Dutch
KW - Frisian
UR - http://www.scopus.com/inward/record.url?scp=85144363069&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85144363069
SP - 1009
EP - 1015
ER -