PoS Tagging, Lemmatization and Dependency Parsing of West Frisian

W.J. Heeringa*, Gosse Bouma, Martha Hofman, Jelle Brouwer, Eduard Drenth, Jan Wijffels, H. Van de Velde

*Corresponding author for this work

Research output: Contribution to conference > Abstract > Scientific

Abstract

We present a lemmatizer, PoS tagger, and dependency parser for West Frisian, built using a corpus of 44,714 words in 3,126 sentences annotated according to the Universal Dependencies version 2 guidelines. PoS tags were assigned by applying a Dutch PoS tagger either to a Dutch word-by-word translation or to the sentences of a Dutch parallel text. The best results were obtained with word-by-word translations produced by the previous version of the Frisian translation program Oersetter. Morphological and syntactic annotations were likewise generated from a Dutch word-by-word translation. We compared the performance of the lemmatizer/tagger/annotator trained with default parameters against its performance when trained with the parameter values used for training on the LassySmall UD 2.5 corpus, and we study the effects of different hyperparameter settings on the accuracy of the annotation pipeline. The Frisian lemmatizer/PoS tagger/dependency parser is released as a web app and as a web service.
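
The tag-projection idea in the abstract can be illustrated with a minimal sketch. Because a word-by-word translation keeps one Dutch token per Frisian token, tags predicted for the Dutch tokens can be copied back by position. Everything named here is an assumption made for illustration: the toy lexicon stands in for Oersetter, the lookup table stands in for a real Dutch PoS tagger, and the function names are hypothetical. It is not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy Frisian-to-Dutch lexicon (a stand-in, NOT Oersetter).
TOY_LEXICON: Dict[str, str] = {"de": "de", "hûn": "hond", "rint": "loopt"}

# Hypothetical stand-in for a Dutch PoS tagger: a lookup table of UPOS tags.
TOY_DUTCH_TAGS: Dict[str, str] = {"de": "DET", "hond": "NOUN", "loopt": "VERB"}


def translate_word_by_word(tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Replace each Frisian token by one Dutch token; unknown tokens pass through."""
    return [lexicon.get(token.lower(), token) for token in tokens]


def tag_dutch(tokens: List[str]) -> List[str]:
    """Toy tagger: look up each Dutch token, falling back to the UPOS tag 'X'."""
    return [TOY_DUTCH_TAGS.get(token.lower(), "X") for token in tokens]


def project_tags(
    frisian_tokens: List[str],
    lexicon: Dict[str, str],
    tagger: Callable[[List[str]], List[str]],
) -> List[Tuple[str, str]]:
    """Tag the Dutch word-by-word translation and copy the tags back by position."""
    dutch_tokens = translate_word_by_word(frisian_tokens, lexicon)
    tags = tagger(dutch_tokens)
    if len(tags) != len(frisian_tokens):
        raise ValueError("tagger must return exactly one tag per token")
    return list(zip(frisian_tokens, tags))


result = project_tags(["De", "hûn", "rint"], TOY_LEXICON, tag_dutch)
print(result)
```

The positional copy works only because the translation is strictly one token to one token. Projecting from a freely translated parallel text would additionally require word alignment, which this sketch does not attempt.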
Original language: English
Number of pages: 8
Publication status: Published - 01 Jul 2022
Event: 13th Language Resources and Evaluation Conference - Marseille, France
Duration: 21 Jun 2022 to 23 Jun 2022
https://lrec2022.lrec-conf.org/en/

Conference

Conference: 13th Language Resources and Evaluation Conference
Abbreviated title: LREC 2022
Country/Territory: France
City: Marseille
Period: 21/06/2022 to 23/06/2022

Keywords

  • Frisian, lemmatization, PoS tagging, dependency parsing, Universal Dependencies
