Talking about odors and flavors is difficult for most people, yet experts appear to be able to convey critical information about wines in their reviews. This seems to be a contradiction, and wine expert descriptions are frequently received with criticism. Here, we propose a method for probing the language of wine reviews, and thus offer a means to enhance current vocabularies, and as a by-product question the general assumption that wine reviews are gibberish. By means of two different quantitative analyses—support vector machines for classification and Termhood analysis—on a corpus of online wine reviews, we tested whether wine reviews are written in a consistent manner, and thus may be considered informative; and whether reviews feature domain-specific language. First, a classification paradigm was trained on wine reviews from one set of authors for which the color, grape variety, and origin of a wine were known, and subsequently tested on data from a new author. This analysis revealed that, regardless of individual differences in vocabulary preferences, color and grape variety were predicted with high accuracy. Second, using Termhood as a measure of how words are used in wine reviews in a domain-specific manner compared to other genres in English, a list of 146 wine-specific terms was uncovered. These words were compared to existing lists of wine vocabulary that are currently used to train experts. Some overlap was observed, but there were also gaps revealed in the extant lists, suggesting these lists could be improved by our automatic analysis.
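
The classification setup described above, training on reviews from one set of authors and testing on an author not seen in training, can be sketched as a leave-one-author-out split. This is a minimal illustration and not the study's code: the record keys `author`, `text`, and `label`, the helper name `leave_one_author_out`, and the toy reviews are all hypothetical, and rotating the held-out role through every author is an assumption that the abstract does not state. The classifier itself (a support vector machine in the study) is left out; the sketch covers only the data split.

```python
from collections import defaultdict


def leave_one_author_out(reviews):
    """Yield (held_out_author, train, test) splits, one per author.

    `reviews` is a list of dicts with the hypothetical keys
    'author', 'text', and 'label' (for example, wine color).
    """
    by_author = defaultdict(list)
    for review in reviews:
        by_author[review["author"]].append(review)

    for held_out in sorted(by_author):
        # All reviews by the held-out author form the test set.
        test = by_author[held_out]
        # Reviews by every other author form the training set.
        train = [
            r
            for author, items in by_author.items()
            if author != held_out
            for r in items
        ]
        yield held_out, train, test


# Toy data, invented purely for illustration.
reviews = [
    {"author": "a1", "text": "dark cherry and firm tannins", "label": "red"},
    {"author": "a1", "text": "crisp citrus and green apple", "label": "white"},
    {"author": "a2", "text": "blackberry, oak, long finish", "label": "red"},
    {"author": "a3", "text": "lemon zest and minerality", "label": "white"},
]

for held_out, train, test in leave_one_author_out(reviews):
    # The held-out author never contributes training data.
    assert all(r["author"] != held_out for r in train)
    print(held_out, len(train), len(test))
```

Splitting by author rather than by individual review keeps one writer's vocabulary habits from appearing in both training and test data, which is what allows accuracy on the held-out author to say something about consistency across writers.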

Original language: English
Pages (from-to): 1-20
Journal: Natural Language Engineering
Publication status: Published - 23 Sep 2019

Research areas: corpus linguistics, information extraction, semantics, statistical methods, wine expertise

ID: 11696457