Literary authorship attribution with phrase-structure fragments

Andreas van Cranenburgh

Research output: Chapter in book/volume › Contribution to conference proceedings › Scientific › peer-review

13 Citations (Scopus)

Abstract

We present a method of authorship attribution and stylometry that exploits hierarchical information in phrase-structures. Contrary to much previous work in stylometry, we focus on content words rather than function words. Texts are parsed to obtain phrase-structures, and compared with texts to be analyzed. An efficient tree kernel method identifies common tree fragments among data of known authors and unknown texts. These fragments are then used to identify authors and characterize their styles. Our experiments show that the structural information from fragments provides complementary information to the baseline trigram model.
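The idea of comparing tree fragments between known authors and an unknown text can be illustrated with a minimal sketch. This is not the tree kernel method described in the abstract: as a simplifying assumption, it counts only depth-one fragments (grammar productions) and attributes the text to the author with the largest multiset overlap. The nested-tuple tree encoding, the function names (`productions`, `fragment_counts`, `attribute`), and the toy parse trees are all hypothetical and chosen for illustration only.

```python
from collections import Counter

# Assumed encoding: a tree is (label, child, child, ...); a leaf is a plain string.

def productions(tree):
    """Yield the depth-one fragments (productions) of a nested-tuple tree."""
    label, *children = tree
    # Right-hand side: the word for a leaf, the node label for a subtree.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)

def fragment_counts(trees):
    """Count how often each production occurs in a list of trees."""
    counts = Counter()
    for tree in trees:
        counts.update(productions(tree))
    return counts

def attribute(unknown_trees, known):
    """Score each author by multiset overlap with the unknown text's fragments."""
    unknown = fragment_counts(unknown_trees)
    scores = {
        author: sum((unknown & fragment_counts(trees)).values())
        for author, trees in known.items()
    }
    return max(scores, key=scores.get), scores

# Toy data (invented): one parsed sentence per author and one unknown sentence.
known = {
    "author_a": [("S", ("NP", ("DT", "the"), ("NN", "cat")),
                       ("VP", ("VBD", "slept")))],
    "author_b": [("S", ("NP", ("PRP", "she")),
                       ("VP", ("VBD", "slept"), ("ADVP", ("RB", "soundly"))))],
}
unknown = [("S", ("NP", ("DT", "the"), ("NN", "dog")),
                 ("VP", ("VBD", "barked")))]

best, scores = attribute(unknown, known)
```

On this toy input the unknown sentence shares four productions with `author_a` and one with `author_b`. A fragment-based approach as described in the abstract would additionally consider larger fragments spanning several levels of the tree, which this sketch deliberately leaves out.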
Original language: English
Title of host publication: Proceedings of the Workshop on Computational Linguistics for Literature
Place of publication: Stroudsburg, PA
Publisher: Association for Computational Linguistics (ACL)
Pages: 59-63
ISBN (Print): 978-1-937284-20-6
Publication status: Published - 2012
