TY - UNPB
T1 - Croissant: A Metadata Format for ML-Ready Datasets
AU - Akhtar, Mubashara
AU - Benjelloun, Omar
AU - Conforti, Costanza
AU - Giner Miguelez, Joan
AU - Jain, Nitisha
AU - Kuchnik, Michael
AU - Lhoest, Quentin
AU - Marcenac, Pierre
AU - Maskey, Manil
AU - Mattson, Peter
AU - Oala, Luis
AU - Ruyssen, Pierre
AU - Shinde, Rajat
AU - Simperl, Elena
AU - Thomas, Goeffry
AU - Tykhonov, Vyacheslav
AU - Vanschoren, Joaquin
AU - Vogler, Steffen
AU - Wu, Carole-Jean
N1 - 2403.19546v1
PY - 2024/3/28
Y1 - 2024/3/28
AB - Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
KW - machine learning
KW - artificial intelligence
M3 - Preprint
BT - Croissant: A Metadata Format for ML-Ready Datasets
PB - arXiv.org
ER -