Abstract
Data is vital for machine learning (ML), yet managing it remains a challenge. We present Croissant, a metadata format that standardizes dataset representation across ML tools, frameworks, and platforms. Croissant enhances dataset discoverability, portability, and interoperability, and already supports hundreds of thousands of datasets in popular repositories. It allows seamless integration with widely used ML frameworks regardless of where the data is stored. Human evaluations confirm that Croissant's metadata is readable, concise, and complete. The vision is a shared Data Lake enabling federated search across platforms like Dataverse, Kaggle, and HuggingFace. A centralized approach focuses on standardization and repository-level harmonization, while a distributed approach advocates agile, Linked Data-based solutions that empower diverse communities to integrate within a Distributed Data Network using Croissant ML and AI technologies.
| Original language | English |
|---|---|
| Type | Presentation |
| Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik |
| Number of pages | 27 |
| Publication status | Published - 12 Oct 2024 |
Activities
- 1 Talk or presentation
- Croissant: Metadata for Machine Learning Systems
  Tykhonov, V. (Speaker) & Giner Miguelez, J. (Speaker)
  12 Oct 2024
  Activity: Talk or presentation › Academic