Word2Vec Models Dutch Newspapers

  • Melvin Wevers (Creator)

Dataset

Description

Word embedding models trained on six national Dutch newspapers.

We use the Gensim implementation of Word2Vec to train four embedding models per newspaper, each representing one decade between 1950 and 1990. The models were trained using CBOW with hierarchical softmax, a dimensionality of 300, a minimum word count of 5, a context window of 5, and a downsampling threshold of 10^-5.
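As a rough illustration, the settings described above could be passed to Gensim along the following lines. This is a minimal sketch, not the original training script: the parameter names follow Gensim 4.x (Gensim 3.x used `size` instead of `vector_size`), the helper `train_model` and the `workers` default are hypothetical, and turning off negative sampling (`negative=0`) is an assumption, since the description only states that hierarchical softmax was used.

```python
# Hyperparameters taken from the dataset description, using Gensim 4.x names.
W2V_PARAMS = {
    "vector_size": 300,  # dimensionality of the embeddings
    "window": 5,         # context window of 5 words
    "min_count": 5,      # ignore words occurring fewer than 5 times
    "sg": 0,             # 0 selects CBOW (1 would select skip-gram)
    "hs": 1,             # use hierarchical softmax
    "negative": 0,       # assumption: negative sampling switched off
    "sample": 1e-5,      # downsampling threshold for frequent words
}


def train_model(sentences, workers=4):
    """Train one Word2Vec model on an iterable of tokenized sentences.

    `sentences` is assumed to hold the text of one newspaper for one decade,
    with each sentence given as a list of string tokens.
    """
    # Imported here so the settings above can be inspected without Gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(sentences=sentences, workers=workers, **W2V_PARAMS)
```

Calling `train_model` once per newspaper and decade would give four models per newspaper. The corpus needs to be large enough for words to reach the minimum count of 5, otherwise the vocabulary ends up empty.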

These models belong to the article: Using Word Embeddings to Examine Gender Bias in Dutch Newspapers, 1950-1990
Date made available: 03 Jun 2019
Publisher: Zenodo

Keywords

  • Word embeddings
  • newspapers

Dataset type

  • Processed data
