Text-fabric: handling Biblical data with IKEA logistics

Activity: Talk or presentation, Academic


The BHSA (Biblia Hebraica Stuttgartensia Amstelodamensis) is the BHS text plus the linguistic annotations of the Eep Talstra Centre for Bible and Computer, formerly WIVU.
The BHSA is available as a data set in Text-Fabric format.
Text-Fabric is a minimalistic model to represent text: it provides addresses for all textual objects, so that it is easy to add arbitrary information at all textual levels, precisely and firmly anchored.
A Text-Fabric resource resembles an IKEA warehouse. The parts are neatly separated and stacked, so that they can be retrieved easily and combined into meaningful output later on.
As a consequence, different teams with divergent purposes can still add to the same body of work, with a minimum of interference or duplication of effort.
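The addressing idea can be illustrated with a small sketch. This is a toy model in plain Python, not the Text-Fabric API: the words, the node numbers, the feature names (`pos`, `function`) and the helper `describe` are all hypothetical and chosen only for illustration. It assumes that every textual object gets an integer address and that each feature is a separate mapping from address to value.

```python
# Toy model: every textual object has an integer address (a "node").
# Words get the addresses 1..n; larger objects get the next free numbers.
words = ["in", "the", "beginning"]  # placeholder text, not BHSA data

text = {n: w for n, w in enumerate(words, start=1)}
otype = {n: "word" for n in text}

phrase = max(text) + 1              # address 4 for a phrase object
otype[phrase] = "phrase"
oslots = {phrase: [1, 2, 3]}        # which words the phrase covers

# Each feature is stored on its own, like a separate shelf in the warehouse.
features = {"text": text, "otype": otype}

# One team adds word-level information (hypothetical values) ...
features["pos"] = {1: "prep", 2: "art", 3: "noun"}
# ... and another team adds phrase-level information, without touching the rest.
features["function"] = {phrase: "Time"}


def describe(node):
    """Collect every feature value that is anchored to this address."""
    return {name: values[node] for name, values in features.items() if node in values}


print(describe(2))
print(describe(phrase))
```

Because each feature lives in its own mapping, adding `function` does not require changing `pos` or the text itself; that separation is what keeps contributions from different teams from interfering with each other in this sketch.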

Text-Fabric has helped to:
1. construct the website SHEBANQ
2. convert to and from other formats (Emdros, XML)
3. compute phonetic representations
4. compute parallel passages
5. compute verbal valence
6. compare different versions of the BHSA

Here we focus on two recent data combination jobs:

(A) We make a detailed comparison of the BHSA and the results of the Open Scriptures Morphology (OSM) effort. As the OSM is not yet finished, the comparison is produced by a piece of code, so that it can be regenerated as the OSM progresses.
(B) We generate treebank representations from the BHSA data and add them back as features and then play around with them.
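As a rough illustration of job (B), the sketch below derives bracketed tree strings from nested objects and stores them as a new feature. It is a toy in plain Python, not the method used in the talk and not the Text-Fabric API: the node numbers, the labels (`S`, `PP`), the `children` mapping and the function `tree` are assumptions made for the example.

```python
# Toy data: words are nodes 1..3, a phrase is node 4, a sentence is node 5.
words = {1: "in", 2: "the", 3: "beginning"}   # placeholder text
children = {5: [4], 4: [1, 2, 3]}             # parent node -> child nodes
label = {5: "S", 4: "PP"}                     # hypothetical constituent labels


def tree(node):
    """Return a bracketed tree string for the object at this address."""
    if node in words:
        return words[node]
    inner = " ".join(tree(child) for child in children[node])
    return f"({label[node]} {inner})"


# Add the result back as a feature: one tree string per non-word node.
tree_feature = {node: tree(node) for node in children}

print(tree_feature[5])
```

Once the trees are stored as a feature keyed by node, they can be queried and combined with any other feature on the same nodes.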
Period: 22 Mar 2018
Degree of recognition: International