Automatic summarization of domain-specific forum threads: Collecting reference data

Suzan Verberne, Antal van den Bosch, Sander Wubben, Emiel Krahmer

Research output: Chapter in book/volume › Contribution to conference proceedings › Scientific › peer-reviewed



We create and analyze two sets of reference summaries for
discussion threads on a patient support forum: expert summaries
and crowdsourced, non-expert summaries. Ideally,
reference summaries for discussion forum threads are created
by expert members of the forum community. When
there are few or no expert members available, crowdsourcing
the reference summaries is an alternative. In this paper
we investigate whether domain-specific forum data requires
the hiring of domain experts for creating reference
summaries. We analyze the inter-rater agreement for both
datasets and we train summarization models using the two
types of reference summaries. The inter-rater agreement in
crowdsourced reference summaries is low, close to random,
while domain experts achieve a considerably higher, fair,
agreement. The trained models, however, are similar to each
other. We conclude that it is possible to train an extractive
summarization model on crowdsourced data that is similar
to an expert model, even if the inter-rater agreement for the
crowdsourced data is low.
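The abstract contrasts near-random agreement among crowd workers with "fair" agreement among domain experts. The paper's exact agreement statistic is not named here; a common chance-corrected measure for multiple raters assigning categorical labels (e.g., include/exclude per sentence in extractive summarization) is Fleiss' kappa, sketched below as one plausible way to compute such scores:

```python
# Minimal sketch of Fleiss' kappa for chance-corrected inter-rater
# agreement. Hypothetical illustration: the paper does not specify
# which agreement coefficient it uses.

def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning item i to category j.
    Every item must be rated by the same number of raters."""
    N = len(counts)                      # number of items (e.g., sentences)
    n = sum(counts[0])                   # raters per item

    # Proportion of all assignments falling in each category.
    p = [sum(row[j] for row in counts) / (N * n)
         for j in range(len(counts[0]))]

    # Per-item agreement: fraction of agreeing rater pairs.
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N                   # mean observed agreement
    P_e = sum(pj * pj for pj in p)       # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)

# Two raters label three sentences as selected (col 0) or not (col 1):
# they agree on the first two sentences and disagree on the third.
print(round(fleiss_kappa([[2, 0], [0, 2], [1, 1]]), 4))  # → 0.3333
```

On the usual rule-of-thumb scale, values near 0 indicate chance-level agreement (as reported for the crowdsourced summaries) and 0.21-0.40 counts as "fair" agreement (as reported for the experts).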
Original language: English
Title: Proceedings of the 2017 Conference on Human Information Interaction and Retrieval (CHIIR-2017)
Publisher: Association for Computing Machinery (ACM)
Status: Published - 7 Mar 2017
