Description
The ETCBC database of the Hebrew Bible (formerly known as the WIVU database) contains the scholarly text of the Hebrew Bible with linguistic markup.
A previous version can be found in EASY (see the link below).
The present dataset is an improvement in many ways:
(A) it contains a new version of the data, called ETCBC4.
The content has been heavily updated, with new linguistic annotations, a better organisation of them, and many additions and corrections.
(B) the data format is now the Linguistic Annotation Framework (see below). The previous version, by contrast, was archived as a database dump in a specialised format: Emdros (see the link below).
(C) a new tool, LAF-Fabric, has been added to process the ETCBC4 version directly from its LAF representation. The picture on this page shows a few samples of what can be done with it.
(D) extensive documentation is provided, including a description of all the computing steps involved in getting the data in LAF format.
Since 2012 there has been an ISO standard for the stand-off markup of language resources: the Linguistic Annotation Framework (LAF).
As a result of the SHEBANQ project (see link below), funded by CLARIN-NL and carried out by the ETCBC and DANS, we have created a tool, LAF-Fabric, with which we can convert the Emdros databases of the ETCBC into LAF and then do data-analytic work by means of, for example, IPython notebooks.
It has been used for the Hebrew Bible, but it can also be applied to the Syriac text in CALAP (see link below).
This dataset contains a folder laf with the LAF files; the necessary declarations are in the folder decl.
Among these declarations are feature declaration documents, in TEI format (see link below), with hyperlinks to concept definitions in ISOcat (see link below).
For completeness, the ISOcat definitions are repeated in the feature declaration documents.
These definitions are terse, and they are more fully documented in the folder documentation.
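For readers unfamiliar with stand-off markup, the following is a minimal sketch of the general idea in plain Python: the primary text stays untouched, and annotations live in a separate structure that points into it by character offsets. It does not use LAF-Fabric or the actual LAF file format; the class names, the feature name `part_of_speech`, and the English sample text are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A span of the primary text, given as character offsets."""
    start: int  # inclusive
    end: int    # exclusive


@dataclass
class Annotation:
    """A set of features attached to a region; stored apart from the text."""
    region: Region
    features: dict


def anchored_text(primary: str, region: Region) -> str:
    """Return the slice of the primary text that a region points to."""
    return primary[region.start:region.end]


# Hypothetical primary data: the text itself carries no markup at all.
primary_text = "In the beginning God created"

# Hypothetical stand-off annotations; feature names are illustrative only.
annotations = [
    Annotation(Region(0, 2), {"part_of_speech": "preposition"}),
    Annotation(Region(7, 16), {"part_of_speech": "noun"}),
    Annotation(Region(21, 28), {"part_of_speech": "verb"}),
]

for annotation in annotations:
    word = anchored_text(primary_text, annotation.region)
    print(word, annotation.features["part_of_speech"])
```

Because the annotations only refer to the text, several independent annotation layers can be added or replaced without changing the primary data.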
| Date made available | 29 Jul 2014 |
|---|---|
| Publisher | Data Archiving and Networked Services (DANS) |
| Temporal coverage | 1000 |
| Date of data production | 2013 - 2014 |
| Geographical coverage | Israel |
Dataset type
- Processed data
- The Hebrew Bible as Data: Laboratory - Sharing - Experiences
  Roorda, D., 28 Dec 2017, CLARIN in the Low Countries. Odijk, J. & van Hessen, A. (eds.). London: Ubiquity Press Limited, pp. 217-229 (13 pp.). Research output: Chapter in book › Chapter › Academic › peer-reviewed. Open Access.
- LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible
  Roorda, D., Kalkman, G., Naaijer, M. & Cranenburgh, A. V., 20 Dec 2014, In: Computational Linguistics in the Netherlands Journal. 4, pp. 105-120 (16 pp.). Research output: Contribution to journal › Article › Academic › peer-reviewed. Open Access.
- LAF-Fabric: Data processing for Linguistic Annotation Framework
  Roorda, D., 2014. Research output: Non-textual form › Software › Academic. Open Access.