Data-Envelopes for Cultural Heritage: Going beyond Datasheets

Mrinalini Luthra, Maria Eskevich

Research output: Contribution to conference / Paper / Scientific / peer-review

2 Citations (Scopus)

Abstract

Cultural heritage (CH) data is a rich source of information about history and cultural development in the past. When used with due understanding of its intrinsic complexity, it can both support research in the social sciences and humanities and serve as input for machine learning and artificial intelligence algorithms. In all cases, ethical and contextual awareness can be encouraged by providing the relevant information to potential users in a clear, well-structured form before they begin to interact with the data. The proposed data-envelopes, based on existing documentation frameworks, address the particular needs and challenges of the CH field while combining machine-readability and user-friendliness. We develop data-envelopes and test their usability on data from the Huygens Institute for History and Culture of the Netherlands. This paper presents the following contributions: i) we highlight the complexity of CH data, including the unique ethical and contextual considerations it entails; ii) we evaluate and compare existing dataset documentation frameworks, examining their suitability for CH datasets; iii) we introduce the "data-envelope", a machine-readable adaptation of existing dataset documentation frameworks designed to tackle the specificities of CH datasets. Its modular form is designed to serve not only the needs of machine learning (ML) but also, and especially, broader user groups ranging from humanities scholars and governmental monitoring authorities to citizen scientists and the general public. Importantly, the data-envelope framework emphasises the legal and ethical dimensions of dataset documentation, facilitating compliance with evolving data protection regulations and enhancing the accountability of data stewardship in the cultural heritage sector.
We discuss ethical considerations and invite readers to continue the conversation on this topic, in particular on how different audiences should be informed about the importance of managing dataset documentation and context.
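To make the idea of a modular, machine-readable envelope more concrete, here is a minimal Python sketch. It assumes JSON serialisation and uses hypothetical classes (`Module`, `DataEnvelope`), module names (`provenance`, `legal`, `ml_use`) and placeholder values; none of these come from the paper, which defines its own structure. The sketch only shows how one record could be both machine-readable and filtered by audience.

```python
import json
from dataclasses import dataclass, field, asdict


# Hypothetical building block: one documentation module aimed at given audiences.
# Field names are illustrative assumptions, not the paper's schema.
@dataclass
class Module:
    name: str
    audience: list[str]
    content: dict[str, str]


# Hypothetical envelope: a dataset title plus a list of independent modules.
@dataclass
class DataEnvelope:
    dataset_title: str
    modules: list[Module] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the whole envelope (including nested modules) to JSON."""
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)

    def view_for(self, audience: str) -> list[str]:
        """Return the names of the modules tagged for the given user group."""
        return [m.name for m in self.modules if audience in m.audience]


# Usage with placeholder values only (no real dataset is described here).
envelope = DataEnvelope(
    dataset_title="Example heritage dataset",
    modules=[
        Module("provenance", ["humanities scholars", "general public"],
               {"source": "placeholder archive description"}),
        Module("legal", ["governmental monitoring authorities"],
               {"licence": "placeholder licence",
                "personal_data": "placeholder statement"}),
        Module("ml_use", ["machine learning practitioners"],
               {"intended_uses": "placeholder"}),
    ],
)

print(envelope.view_for("general public"))  # ['provenance']
print(envelope.to_json())
```

Tagging each module with its intended audiences is one possible way for a single record to serve ML pipelines (through the JSON export) and human readers (through filtered views); the paper's actual design may differ.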
Original language: English
Pages: 52-65
Number of pages: 14
Publication status: Published - May 2024
Event: Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024 - Lingotto Conference Centre, Torino, Italy
Duration: 20 May 2024
https://legal2024.mobileds.de

Workshop

Workshop: Workshop on Legal and Ethical Issues in Human Language Technologies @ LREC-COLING 2024
Abbreviated title: LEGAL2024
Country/Territory: Italy
City: Torino
Period: 20/05/2024

Keywords

  • machine-readable datasheets
  • cultural heritage
  • data ethics
  • transparency
  • auditability
  • FAIR
