The use of computational methods in humanities research is gaining popularity and leading to new insights. But as we move from distant reading methods to deeper language understanding, we find that many state-of-the-art language technology tools don't behave quite as advertised in publications. The corpora that humanities scholars investigate display a wide range of language phenomena, and humanities scholars do not necessarily have the same goals in applying these tools as the computational linguists who developed them. Variation in time span, genre and digitisation quality, together with corpus heterogeneity, exposes the gap between the two research domains.
In this talk, I will discuss several projects in which we needed to address the mismatch between language technology tools and humanities research objectives, and how we can move forward in adapting our computational methods to the diversity of humanities research questions.