Using Python to query, manipulate and publish XML/TEI editions

Research output: Contribution to conference › Paper › Scientific

Abstract

TEI/XML[1] is a set of guidelines and an XML-based data format, maintained by the Text Encoding Initiative (TEI), for creating digital editions. It has become an important tool for this purpose and, as such, plays a major role within the DiXiT project. Creating transcriptions in TEI/XML is a well-covered subject: numerous workshops have trained participants in how to model and encode their data, and mature, well-known tools such as the Oxygen XML editor[2] support this process.

How to easily query and publish the data contained in TEI/XML files is a more open question. The solutions currently proposed are the XSLT[3] programming language for publishing and the XQuery[4] programming language (used, for example, with the eXist-db XML database[5]) for querying[6].

For people not well versed in programming, learning XSLT and XQuery can be quite difficult. Programming itself is not easy to learn, but specific aspects of XSLT and XQuery increase the difficulty further. First, XSLT and XQuery are functional programming languages rather than imperative ones (such as C, Python and Ruby). Code in an imperative language consists of commands that the computer executes in order, which is easier for a beginner to follow. Second, XSLT files are themselves XML files, which are rather verbose compared to the plain-text source files of other programming languages. More compact code makes it easier to grasp what the code is meant to do.
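To illustrate the imperative style described above, here is a minimal sketch (not taken from the presentation itself) using Python's standard-library `xml.etree.ElementTree` module. The TEI fragment and the function name `count_words_per_line` are hypothetical; the point is only that each statement is one step the computer carries out, in order.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace; ElementTree needs it spelled out in full.
TEI = "{http://www.tei-c.org/ns/1.0}"

# A tiny, made-up TEI fragment used only for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <lg>
      <l>First line of the poem</l>
      <l>Second line, somewhat longer</l>
    </lg>
  </body></text>
</TEI>"""


def count_words_per_line(xml_text):
    """Return one word count per <l> (verse line) element, in document order."""
    root = ET.fromstring(xml_text)        # step 1: parse the XML text
    counts = []                           # step 2: start with an empty result
    for line in root.iter(TEI + "l"):     # step 3: visit every verse line in turn
        text = "".join(line.itertext())   # step 4: collect the line's text
        counts.append(len(text.split()))  # step 5: count its words and store the number
    return counts


print(count_words_per_line(SAMPLE))  # prints [5, 4]
```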

This presentation will propose the Python programming language[7] as an alternative to XSLT and XQuery, allowing editors to query, manipulate and publish the data contained in their TEI/XML files.
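As a rough sketch of those three tasks, assuming only the standard library (`xml.etree.ElementTree` and `html`), the following queries the `<persName>` elements of a document, manipulates them by normalising their capitalisation, and publishes the result as a small HTML fragment. The sample document and the helper names `query_names`, `normalise_names` and `publish_html` are hypothetical illustrations, not part of the original presentation.

```python
import html
import xml.etree.ElementTree as ET

# Map a prefix of our choosing to the TEI namespace for use in search paths.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A made-up TEI document used only for illustration.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample letter</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>Yesterday <persName>anna smith</persName> wrote to
       <persName>john doe</persName> from <placeName>The Hague</placeName>.</p>
  </body></text>
</TEI>"""


def query_names(root):
    """Query: return the text of every <persName>, in document order."""
    return ["".join(el.itertext()) for el in root.findall(".//tei:persName", NS)]


def normalise_names(root):
    """Manipulate: title-case each <persName> in place (assumes no child elements)."""
    for el in root.findall(".//tei:persName", NS):
        el.text = (el.text or "").title()


def publish_html(root):
    """Publish: render the document title and the list of persons as HTML."""
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=NS)
    items = "\n".join(f"  <li>{html.escape(name)}</li>" for name in query_names(root))
    return f"<h1>{html.escape(title)}</h1>\n<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(query_names(root))  # ['anna smith', 'john doe']
    normalise_names(root)
    print(publish_html(root))
```

On the sample, the final call prints a heading "Sample letter" followed by a list containing "Anna Smith" and "John Doe". A real project might prefer a richer library such as lxml, which supports full XPath 1.0, whereas ElementTree only understands a limited subset; ElementTree is used here simply to keep the sketch dependency-free.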

Python is an imperative, dynamically typed programming language, and these properties make it relatively easy to learn. In recent years it has gained considerable popularity within the digital humanities[8].

With this presentation, the presenter hopes to inspire attendees to learn programming in general and Python in particular, or otherwise to generate debate about the different approaches.
Original language: English
Publication status: In preparation, 18 Sept 2015
Event: DiXiT: Technology, Software, Standards for the Digital Scholarly Edition, Huygens ING, The Hague, Netherlands
Duration: 14 Sept 2015 to 18 Sept 2015

Conference

Conference: DiXiT: Technology, Software, Standards for the Digital Scholarly Edition
Country/Territory: Netherlands
City: The Hague
Period: 14/09/2015 to 18/09/2015
