Croissant Format Specification

Omar Benjelloun, Elena Simperl, Pierre Marcenac, Pierre Ruyssen, Costanza Conforti, Michael Kuchnik, Jos van der Velde, Luis Oala, Steffen Vogler, Mubashara Akhtar, Nitisha Jain, Vyacheslav Tykhonov

Research output: Other contribution, Academic

Abstract

Datasets are the basis of machine learning (ML). However, a lack of standardization in the description and semantics of ML datasets has made it increasingly difficult for researchers and practitioners to explore, understand, and use all but a small fraction of popular datasets.

The Croissant metadata format simplifies how data is used by ML models. It provides a vocabulary for dataset attributes, streamlining how data is loaded across ML frameworks such as PyTorch, TensorFlow or JAX. In doing so, Croissant enables the interchange of datasets between ML frameworks and beyond, tackling a variety of discoverability, portability, reproducibility, and responsible AI (RAI) challenges.
Original language: English
Milestone type: Specification (standard) for Machine Learning
Status: Published - 1 Mar 2024
