TY - CHAP
T1 - A multilingual wikified data set of educational material
AU - Hendrickx, Iris
AU - Takoulidou, Eirini
AU - Naskos, Thanasis
AU - Kermanidis, Katia Lida
AU - Sosoni, Vilelmini
AU - De Vos, Hugo
AU - Stasimioti, Maria
AU - Van Zaanen, Menno
AU - Georgakopoulou, Panayota
AU - Egg, Markus
AU - Kordoni, Valia
AU - Popovic, Maja
AU - Van Den Bosch, Antal
N1 - no DOI
PY - 2019
Y1 - 2019
N2 - We present a wikified data set of parallel texts in eleven language pairs from the educational domain. English sentences are aligned with sentences in eleven other languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH), and names and noun phrases (entities) are manually annotated and linked to their respective Wikipedia pages. For every linked entity in English, the corresponding term or phrase in the target language is also marked and linked to its Wikipedia page in that language. The annotation was performed via crowdsourcing. In this paper we present the task, the annotation process, the difficulties encountered with crowdsourcing for complex annotation, and the data set in more detail. We demonstrate the use of the data set for Wikification evaluation. The data set is a rich resource of annotated English text linked to translations in eleven languages, including several, such as Bulgarian and Greek, for which few language technology (LT) resources are available.
KW - Crowdsourcing
KW - MOOCs
KW - Wikification
UR - https://www.mendeley.com/catalogue/9fa5175c-e639-3819-b88c-125d74e8d22e/
M3 - Chapter
SN - 9791095546009
T3 - LREC 2018 - 11th International Conference on Language Resources and Evaluation
SP - 467
EP - 473
BT - LREC 2018 - 11th International Conference on Language Resources and Evaluation
PB - European Language Resources Association (ELRA)
ER -