Croissant Format Specification

Omar Benjelloun, Elena Simperl, Pierre Marcenac, Pierre Ruyssen, Costanza Conforti, Michael Kuchnik, Jos van der Velde, Luis Oala, Steffen Vogler, Mubashara Akhtar, Nitisha Jain, Vyacheslav Tykhonov

Research output: Other contribution, Scientific

Abstract

Datasets are the basis of machine learning (ML). However, a lack of standardization in the description and semantics of ML datasets has made it increasingly difficult for researchers and practitioners to explore, understand, and use all but a small fraction of popular datasets.

The Croissant metadata format simplifies how data is used by ML models. It provides a vocabulary for dataset attributes, streamlining how data is loaded across ML frameworks such as PyTorch, TensorFlow or JAX. In doing so, Croissant enables the interchange of datasets between ML frameworks and beyond, tackling a variety of discoverability, portability, reproducibility, and responsible AI (RAI) challenges.

Original language: English
Type: Specification (standard) for Machine Learning
Publication status: Published - 01 Mar 2024
