Data Archiving and Networked Services (DANS)

    Research areas

  • BC Logic
  • BL Religion
  • DE The Mediterranean Region. The Greco-Roman World
  • D051 Ancient History
  • P Philology. Linguistics
  • PI Oriental languages and literatures
  • PJ Semitic
  • ZA4050 Electronic information resources
  • ZA4450 Databases
  • Z665 Library Science. Information Science

About

Long-term interest

My long-term interest is the long term in information technology. How can we preserve our digital work for decades and more, while the digital world reinvents itself every decade? As a researcher at DANS, I am primarily interested in solutions that work for humanities scholars. It is far from easy to develop solutions that actually help scholars to focus on their data. All too often, IT solutions decrease the control that they have over their own data.

Current work

My specialty is in ancient text corpora. How can scholars conduct digital research on such corpora in a reproducible way? The challenge is to find functional solutions for data preprocessing, text analysis, data sharing and providing provenance metadata. My stance is that it helps if scholars learn programming to a moderate degree, supported by libraries that connect with their way of thinking and working.

The database of the Hebrew Bible, as compiled by the Eep Talstra Centre for Bible and Computer, was my starting point. I found that they worked with a solid model of text in general. Based on that, I have built the website SHEBANQ, and out of the tooling I developed to construct it, a new tool arose, Text-Fabric, with wider applications. It is used by scholars to accommodate their workflow of textual research: browsing, searching, analysing, producing results and publishing them, and redistributing the data they contribute.

Currently, Text-Fabric is used for corpora in Hebrew, Syriac, Akkadian, Proto-Cuneiform, and Arabic.

The development of Text-Fabric is done in the open, driven by requirements of domain experts. It is an art to select those IT techniques that give the most return on investment to the researchers. Today, it means that we use Python, not Java; standoff annotation, not inline markup; graph models, not XML; computer memory, not databases. And we make liberal use of GitHub, fortified by the long-term preservation solutions of Zenodo and the Software Heritage Archive.
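
To make these choices concrete, here is a minimal sketch in plain Python of standoff annotation over a small graph held in memory. It is an illustration only, not the Text-Fabric API or its data format: the names `text`, `contains`, `features`, `text_of` and `nodes_with`, as well as the sample words and labels, are hypothetical.

```python
# Illustrative sketch only: NOT the Text-Fabric API or file format.
# All names and sample data below are hypothetical.

# The base text is a plain sequence of words; it carries no markup at all.
words = ["In", "the", "beginning", "God", "created"]

# Graph nodes are integers. Nodes 1..5 are the word slots.
text = {i + 1: w for i, w in enumerate(words)}

# Higher-level nodes (here: phrases) are linked to the slots they contain.
contains = {6: [1, 2, 3], 7: [4], 8: [5]}

# Standoff annotation: each feature is a separate mapping from node to value,
# kept apart from the text and held in memory as an ordinary dict.
features = {
    "pos": {1: "prep", 2: "art", 3: "noun", 4: "noun", 5: "verb"},
    "function": {6: "Time", 7: "Subj", 8: "Pred"},
}


def text_of(node):
    """Return the words covered by a node (a slot covers only itself)."""
    slots = contains.get(node, [node])
    return " ".join(text[s] for s in slots)


def nodes_with(feature, value):
    """Return all nodes whose feature has the given value, in node order."""
    return sorted(n for n, v in features[feature].items() if v == value)


if __name__ == "__main__":
    # List every phrase node with its function label and its words.
    for phrase in sorted(contains):
        print(phrase, features["function"][phrase], text_of(phrase))
```

Adding a new layer of annotation means adding one more mapping to `features`; the text itself stays untouched, which is the main attraction of standoff annotation over inline markup.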

Future directions

Text-Fabric as a system is never finished, driven as it is by a growing number of corpora and scholars with their interests. Yet the system is now mature enough to be useful for a wide range of disciplines, and the biggest challenge is to make new users comfortable with it. Text-Fabric is an innovative collection of functions that need mastering. There is a body of tutorials, but it takes shoulder-to-shoulder work to introduce Text-Fabric to new teams.

Interested?

As everything is done in the open, you can just download and explore the materials and start working.

You can file issues on GitHub, or ask to be invited to one of our Slack teams.

I am happy to help out personally with initial steps and with making a first conversion of your corpus into Text-Fabric. Another option is to train you or your programmer to convert corpora on your own. Ultimately, we could submit proposals for larger projects.

Research output

  1. On Open Access to Research Data: Experiences and reflections from DANS

    Research output: Contribution to journal/periodical › Article › Scientific › peer-review

  2. Visual Analytics of the DARIAH in-kind contributions

    Research output: Contribution to conference › Abstract › Scientific

  3. Text-Fabric: version 7.3.5

    Research output: Non-textual form › Software › Scientific

Activities

  1. Visual Analytics of the DARIAH in-kind contributions

    Activity: Talk or presentation › Academic

  2. Text-fabric: handling Biblical data with IKEA logistics

    Activity: Talk or presentation › Academic

  3. Response: your fathers

    Activity: Talk or presentation › Academic

ID: 16889