Using Python to query, manipulate and publish XML/TEI editions

Research output: Contribution to conference, Paper, Academic


TEI/XML[1] is a set of guidelines and a data format maintained by the Text Encoding Initiative (TEI). It has become an important tool for creating digital editions and as such plays a major role within the DiXiT project. Creating transcriptions in TEI/XML is a well-covered subject: numerous workshops have trained participants in modelling and encoding data, and mature, well-known tools such as the Oxygen XML editor[2] are available to support the process.

How to query and publish the data contained in TEI/XML files easily is a more open question. The solutions currently proposed are XSLT[3] for publishing and XQuery[4] for querying[6], the latter used for example with the eXist-db XML database[5].

For people who are not well versed in programming, learning XSLT and XQuery can be quite difficult. Programming itself is not easy to learn, but specific aspects of XSLT and XQuery increase the difficulty further. Both are functional programming languages rather than imperative ones (such as C, Python and Ruby). Code in an imperative language consists of commands that the computer executes in order, which is easier for a beginner to understand. Furthermore, XSLT files are themselves XML files, which makes them rather verbose compared with the plain-text source files of other programming languages; more compact code makes it easier to grasp what the code is meant to do.

This presentation proposes the use of the Python programming language[7] as an alternative to XSLT and XQuery, allowing editors to query, manipulate and publish the data contained in TEI/XML files.
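To make the proposal concrete, the following is a minimal sketch of the three tasks (query, manipulate, publish) using only Python's standard library module xml.etree.ElementTree. The sample document, the function names and the choice of library are assumptions made for illustration; they are not taken from the presentation itself.

```python
import html
import xml.etree.ElementTree as ET

# TEI elements live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature edition, invented for this sketch.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Dear <persName>Anna</persName>, I arrived in <placeName>Leiden</placeName>.</p>
      <p><persName>Pieter</persName> sends his regards.</p>
    </body>
  </text>
</TEI>"""


def person_names(root):
    """Query: collect the text of every persName element, in document order."""
    return [el.text for el in root.iterfind(".//tei:persName", TEI_NS)]


def number_paragraphs(root):
    """Manipulate: give each body paragraph an n attribute (1, 2, ...)."""
    for i, p in enumerate(root.iterfind(".//tei:body/tei:p", TEI_NS), start=1):
        p.set("n", str(i))


def paragraphs_as_html(root):
    """Publish: render each body paragraph as a plain HTML <p> element."""
    lines = []
    for p in root.iterfind(".//tei:body/tei:p", TEI_NS):
        # itertext() flattens the paragraph, dropping the inline markup.
        text = "".join(p.itertext())
        lines.append("<p>" + html.escape(text) + "</p>")
    return "\n".join(lines)


root = ET.fromstring(SAMPLE_TEI)
names = person_names(root)
number_paragraphs(root)
numbers = [p.get("n") for p in root.iterfind(".//tei:body/tei:p", TEI_NS)]
html_output = paragraphs_as_html(root)

print(names)
print(numbers)
print(html_output)
```

Each step is an ordinary function that runs top to bottom, which is the property of imperative code that the argument above relies on. A real edition would need richer handling (notes, line breaks, nested markup) than this sketch attempts.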

Python is an imperative and dynamically typed programming language. These aspects make it relatively easy to learn. It has gained considerable popularity in recent years within the digital humanities[8].

Through this presentation, the presenter hopes to inspire attendees to learn programming in general and Python in particular, or otherwise to generate debate about the different approaches.
Original language: English
Status: In preparation - 18 Sep 2015
Event: DiXiT: Technology, Software, Standards for the Digital Scholarly Edition - Huygens ING, The Hague, Netherlands
Duration: 14 Sep 2015 to 18 Sep 2015



