Croissant ML standard in the context of Dataverse, EOSC and beyond

Vyacheslav Tykhonov, Philip Durbin

Research output: Other contribution, Scientific

Abstract

The Croissant metadata format simplifies how machine learning (ML) models use data. It provides a vocabulary for dataset attributes, streamlines data loading across ML frameworks, and includes specifications for Responsible AI and reproducibility. A Croissant export is being added to the Dataverse data repository and will be available across the whole Dataverse network. In this presentation, we explore the innovation path for the rapid evolution of the Croissant standard and demonstrate how to implement all changes without modifying the source code, using a novel approach known as FAIR semantic mappings. Croissant overlaps significantly with the DDI and DDI-CDI standards, which are commonly used in the Social Sciences and Humanities and are already supported by Dataverse and other data repositories. We also explore how these standards can help the Machine Learning community mitigate bias in data.
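To make the idea of mapping-driven export concrete, here is a minimal sketch, not the Dataverse implementation. It assumes a flat input record whose field names loosely follow the Dataverse citation block (`title`, `dsDescriptionValue`, `authorName`, `license`). The mapping table `FIELD_MAPPING` and the function `to_croissant` are hypothetical. The output uses only a small subset of the Croissant JSON-LD context. A real Croissant document also needs `distribution` and `recordSet` entries, which are omitted here.

```python
import json

# Hypothetical mapping table: Dataverse-style citation field -> Croissant
# (schema.org-based) property. In this sketch, supporting a changed standard
# means editing this table, not the export code below.
FIELD_MAPPING = {
    "title": "name",
    "dsDescriptionValue": "description",
    "license": "license",
    "authorName": "creator",
}

# Small subset of the Croissant JSON-LD context (the full context is larger).
CROISSANT_CONTEXT = {
    "@vocab": "https://schema.org/",
    "sc": "https://schema.org/",
    "cr": "http://mlcommons.org/croissant/",
    "dct": "http://purl.org/dc/terms/",
    "conformsTo": "dct:conformsTo",
}


def to_croissant(record: dict, mapping: dict = FIELD_MAPPING) -> dict:
    """Translate a flat metadata record into a minimal Croissant-style JSON-LD dict.

    Fields without an entry in `mapping` are ignored, and mapped fields that
    are absent from `record` are simply left out of the output.
    """
    doc = {
        "@context": CROISSANT_CONTEXT,
        "@type": "sc:Dataset",
        "conformsTo": "http://mlcommons.org/croissant/1.0",
    }
    for source_field, target_property in mapping.items():
        if source_field in record:
            doc[target_property] = record[source_field]
    return doc


if __name__ == "__main__":
    # Hypothetical example record, for illustration only.
    example = {
        "title": "Example survey dataset",
        "dsDescriptionValue": "Hypothetical responses collected for illustration.",
        "license": "CC0 1.0",
        "authorName": "Doe, Jane",
    }
    print(json.dumps(to_croissant(example), indent=2))
```

Because the translation logic only walks the mapping table, adding a new Croissant property in this sketch amounts to adding one entry to `FIELD_MAPPING`.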

Original language: English
Type: Presentation
Publisher: Zenodo
Number of pages: 21
DOIs
Publication status: Published, 20 Mar 2024
