Abstract
Data is vital for machine learning (ML), yet managing it remains a challenge. We present Croissant, a metadata format that standardizes dataset representation across ML tools, frameworks, and platforms. Croissant enhances dataset discoverability, portability, and interoperability, and already supports hundreds of thousands of datasets in popular repositories. It allows seamless integration with widely used ML frameworks regardless of where the data is stored. Human evaluations confirm that Croissant's metadata is readable, concise, and complete. The vision is a shared Data Lake enabling federated search across platforms like Dataverse, Kaggle, and HuggingFace. A centralized approach focuses on standardization and repository-level harmonization, while a distributed approach advocates agile, Linked Data-based solutions that empower diverse communities to integrate within a Distributed Data Network using Croissant ML and AI technologies.
Original language | English |
---|---|
Milestone type | Presentation |
Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik |
Number of pages | 27 |
DOIs | |
Status | Published - 12 Oct 2024 |
Activities
- 1 Talk or presentation
- Croissant: Metadata for Machine Learning Systems
  Vyacheslav Tykhonov (Speaker) & Joan Giner Miguelez (Speaker)
  12 Oct 2024
  Activity: Talk or presentation › Academic