The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation

E. Tjong Kim Sang, Marcel Bollmann, Remko Boschker, Francisco Casacuberta, Feike Dietz, Stefanie Dipper, Miguel Domingo, Rob van der Goot, Marjo van Koppen, Nikola Ljubešić, Robert Östling, Florian Petran, Eva Pettersson, Yves Scherrer, Marijn Schraagen, Leen Sevens, Jörg Tiedemann, Tom Vanallemeersch, Kalliopi Zervanou

Research output: Contribution to journal/periodical › Article › Scientific › Peer-review

Abstract

The CLIN27 shared task evaluates whether translating historical text into contemporary language improves the quality of the output of modern natural language processing tools applied to that text. We focus on part-of-speech tagging of seventeenth-century Dutch. Eight teams took part in the shared task, and the best results were obtained by teams employing character-based machine translation. The best system reduced the tagging error by 51% relative to the baseline of tagging the unmodified text. This is close to the 57% error reduction obtained with human translation.
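The 51% and 57% figures are relative error reductions. Assuming they follow the usual definition (the share of the baseline tagger's errors that a system eliminates), the arithmetic can be sketched as below. The function name and the accuracy values are hypothetical and are not the shared task's reported scores.

```python
def relative_error_reduction(baseline_accuracy: float, system_accuracy: float) -> float:
    """Return the fraction of the baseline's errors that the system removes.

    Hypothetical helper: assumes error = 1 - accuracy and that the reduction
    is measured relative to the baseline error, as is conventional.
    """
    baseline_error = 1.0 - baseline_accuracy
    system_error = 1.0 - system_accuracy
    if baseline_error <= 0.0:
        # A perfect baseline leaves no errors to reduce.
        raise ValueError("baseline makes no errors; reduction is undefined")
    return (baseline_error - system_error) / baseline_error


# Hypothetical accuracies, chosen only to illustrate the arithmetic:
# a 20% baseline error rate falling to 9.8% is a 51% relative reduction.
print(round(relative_error_reduction(0.80, 0.902), 2))  # 0.51
```

Under this definition, a 51% reduction means the best system made roughly half as many tagging errors as the tagger run on the original seventeenth-century text.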
Original language: English
Article number: 88
Pages (from-to): 53-64
Number of pages: 12
Journal: Computational Linguistics in the Netherlands Journal
Volume: 7
Publication status: Published, 2017

Keywords

  • historical text
  • Dutch
  • machine translation
  • part-of-speech tagging
