TY - CHAP
T1 - Computational methods for the analysis of fiction genres
AU - van Cranenburgh, Andreas
AU - Allen, Laura
AU - Sharoff, Serge
AU - van Dalen-Oskam, K.H.
PY - 2024/10/1
Y1 - 2024/10/1
AB - This chapter presents a multimethod, multidisciplinary analysis of genre in a large dataset of 9,800 English novels, in order to deepen our understanding of fiction genres and subgenres. We focus on well-established, interpretable methods, so that scholars from a variety of disciplines can benefit. The objects of our analysis are written texts and their linguistic features. We approach the analysis from two directions: data-driven, with topic modeling of content words, and theory-driven, with the features Douglas Biber selected for his research on register (for example, in his 1988 book Variation across Speech and Writing), as well as simple readability metrics. We illustrate these methods by applying them to a corpus of novels. The texts in our corpora are English, but the methods are intended to be applicable to corpora in other languages as well. Our research questions are whether different kinds of novels (“subgenres”) can be distinguished from each other by their use of linguistic features, and what the results of the computational methods can reveal to researchers to support a renewed qualitative analysis of the texts or the formulation of new hypotheses for further research. Code and data are available at https://github.com/andreasvc/fictiongenres/
KW - digital humanities
KW - genre
KW - fiction
KW - computational literary studies
UR - http://doi.org/10.4324/9781003335603-6
M3 - Chapter
SN - 9781032371610
T3 - Routledge Research in Language and Communication
SP - 135
EP - 167
BT - Multidisciplinary Views on Discourse Genre
A2 - Stukker, Ninke
A2 - Bateman, John A.
A2 - McNamara, Danielle
A2 - Spooren, Wilbert
PB - Routledge
ER -