Capturing Contentiousness: Constructing the Contentious Terms in Context Corpus

Ryan Brate, Andrei Nesterov, Valentin Vogelmann, Jacco van Ossenbruggen, Laura Hollink, Marieke van Erp

Research output: Chapter in book/book section › Contribution to conference proceedings › Academic › Peer-reviewed

4 Citations (Scopus)


Recent initiatives by cultural heritage institutions to address outdated and offensive language in their collections demonstrate the need for a better understanding of when terms are problematic or contentious. This paper presents an annotated dataset of 2,715 unique samples of terms in context, drawn from a historical newspaper archive, collating 21,800 annotations of contentiousness from experts and crowd workers. We describe the contents of the corpus by analysing inter-rater agreement and differences between experts and crowd workers. In addition, we demonstrate the potential of the corpus for automated detection of contentiousness. We show that a simple classifier applied to the embedding representation of a target word achieves better-than-baseline performance in predicting contentiousness. We find that both the term itself and its context play a role in whether a term is considered contentious.
Original language: English
Title: K-CAP '21
Subtitle: Proceedings of the 11th on Knowledge Capture Conference
Publisher: Association for Computing Machinery (ACM)
Status: Published - Dec 2021

Publication series

Name: ACM Digital Library

