Literary authorship attribution with phrase-structure fragments

Andreas van Cranenburgh

Research output: Chapter in book/volume › Contribution to conference proceedings › Academic › Peer-reviewed

13 Citations (Scopus)

Abstract

We present a method of authorship attribution and stylometry that exploits hierarchical information in phrase-structures. Contrary to much previous work in stylometry, we focus on content words rather than function words. Texts are parsed to obtain phrase-structures, and compared with texts to be analyzed. An efficient tree kernel method identifies common tree fragments among data of known authors and unknown texts. These fragments are then used to identify authors and characterize their styles. Our experiments show that the structural information from fragments provides complementary information to the baseline trigram model.
Original language: English
Title: Proceedings of the Workshop on Computational Linguistics for Literature
Place of publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 59-63
ISBN (print): 978-1-937284-20-6
Status: Published - 2012
