Abstract
We present a method of authorship attribution and stylometry that exploits hierarchical information in phrase-structures. Contrary to much previous work in stylometry, we focus on content words rather than function words. Texts are parsed to obtain phrase-structures, and compared with texts to be analyzed. An efficient tree kernel method identifies common tree fragments among data of known authors and unknown texts. These fragments are then used to identify authors and characterize their styles. Our experiments show that the structural information from fragments provides complementary information to the baseline trigram model.
Original language | English |
---|---|
Title | Proceedings of the Workshop on Computational Linguistics for Literature |
Place of publication | Stroudsburg, PA |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 59-63 |
Print ISBN | 978-1-937284-20-6 |
Status | Published - 2012 |