Abstract
The Croissant metadata format simplifies how data is used by ML models. It provides a vocabulary for dataset attributes that streamlines how data is loaded across ML frameworks, and it includes specifications for Responsible AI and reproducibility. Croissant export is being added to the Dataverse data repository and will be available across the whole Dataverse network. In this presentation, we explore the innovation path for the rapid evolution of the Croissant standard and demonstrate how to implement all such changes without modifying the source code, using a novel approach known as FAIR semantic mappings. There is significant overlap between Croissant and the DDI and DDI-CDI standards, which are commonly used in the Social Sciences and Humanities and are already supported by Dataverse and other data repositories. We also explore how these standards can help the Machine Learning community mitigate bias in data.
Original language | English |
---|---|
Type | Presentation |
Publisher | Zenodo |
Number of pages | 21 |
DOIs | |
Publication status | Published - 20 Mar 2024 |