TY - JOUR
T1 - Exploring Data Provenance in Handwritten Text Recognition Infrastructure: Sharing and Reusing Ground Truth Data, Referencing Models, and Acknowledging Contributions.
T2 - Starting the Conversation on How We Could Get It Done
AU - Romein, C. A. (Annemieke)
AU - Hodel, Tobias
AU - Gordijn, Femke
AU - Zundert, Joris J. van
AU - Chagué, Alix
AU - Lange, Milan van
AU - Jensen, Helle Strandgaard
AU - Stauder, Andy
AU - Purcell, Jake
AU - Terras, Melissa M.
AU - Heuvel, Pauline van den
AU - Keijzer, Carlijn
AU - Rabus, Achim
AU - Sitaram, Chantal
AU - Bhatia, Aakriti
AU - Depuydt, Katrien
AU - Afolabi-Adeolu, Mary Aderonke
AU - Anikina, Anastasiia
AU - Bastianello, Elisa
AU - Benzinger, Lukas Vincent
AU - Bosse, Arno
AU - Brown, David
AU - Charlton, Ash
AU - Dannevig, André Nilsson
AU - Gelder, Klaas van
AU - Go, Sabine C.P.J.
AU - Goh, Marcus J.C.
AU - Gstrein, Silvia
AU - Hasan, Sewa
AU - Heide, Stefan von der
AU - Hindermann, Maximilian
AU - Huff, Dorothee
AU - Huysman, Ineke
AU - Idris, Ali
AU - Keijzer, Liesbeth
AU - Kemper, Simon
AU - Koenders, Sanne
AU - Kuijpers, Erika
AU - Larsen, Lisette Rønsig
AU - Lepa, Sven
AU - Link, Tommy O.
AU - Nispen, Annelies van
AU - Nockels, Joe
AU - Noort, Laura M. van
AU - Oosterhuis, Joost Johannes
AU - Popken, Vivien
AU - Puertollano, María Estrella
AU - Puusaag, Joosep J.
AU - Sheta, Ahmed
AU - Stoop, Lex
AU - Strutzenbladh, Ebba
AU - van der Sijs, Nicoline
AU - van der Trouw, Jan Paul
AU - Benaissa, Barry
AU - Vučković, Vladimir
AU - Wilbrink, Heleen
AU - Weiss, Sonia
AU - Wrisley, David Joseph
AU - Zweitstra, Riet
PY - 2023/3/24
Y1 - 2023/3/24
N2 - This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.
AB - This paper discusses best practices for sharing and reusing Ground Truth in Handwritten Text Recognition infrastructures, as well as ways to reference and acknowledge contributions to the creation and enrichment of data within these systems. We discuss how one can place Ground Truth data in a repository and, subsequently, inform others through HTR-United. Furthermore, we want to suggest appropriate citation methods for HTR data, models, and contributions made by volunteers. Moreover, when using digitised sources (digital facsimiles), it becomes increasingly important to distinguish between the physical object and the digital collection. These topics all relate to the proper acknowledgement of labour put into digitising, transcribing, and sharing Ground Truth HTR data. This also points to broader issues surrounding the use of machine learning in archival and library contexts, and how the community should begin to acknowledge and record both contributions and data provenance.
KW - Referencing
KW - Sharing models
KW - HTR
KW - CRediT
U2 - 10.5281/ZENODO.7267245
DO - 10.5281/ZENODO.7267245
M3 - Article
JO - Zenodo
JF - Zenodo
ER -