D2.6: Ontology of licencing, ownership and conditions of use (V1.0)

Enrico Daga, Jason Carvalho, Marco Gurrieri, Andrea Scharnhorst

Research output: Book/Report › Report › Academic

Abstract

In research workflows under the paradigm of Open Science (standing for reproducibility of research, open access to knowledge, and societal responsibility of research), licences play an increasing role. With digitisation and automatic information processing, licences also become important for guiding the actions of machines, for example in supporting the exploration and selection of resources and auditing their fair reuse. In the context of Polifonia we deal primarily with licences that come with content provided in the public sphere by cultural heritage institutions. But we also deal with other source material, for instance information scraped from websites, and we produce and re-use software which also comes with a licence, such as the resources catalogued by the musoW registry of musical resources on the Web. There are various issues when it comes to licences:
- there is a large variety of licences and copyright statements used in the domain of musical content
- the information about licences is not always added to metadata, or not added in a standardised way, but is often 'hidden' in plain text on websites
- licences regulating the access to and use of a web service (e.g., a repository) and licences regulating the access to and use of content provided via web services (e.g., datasets in a repository) are entangled
- there might be various, sometimes mutually contradictory, pieces of licence information available for a given data collection.
In this deliverable, we focus on the problem of extracting licence information from Web resources. More specifically, we look into the coverage of licence metadata in data registries such as musoW, a catalogue in which all main data components used by Polifonia are registered, next to a large number of other sources. We set up pipelines to check for licence information and, where possible, to enrich it by text-mining the original websites/sources to which the catalogue refers. We do so with the aid of Large Language Models (LLMs). LLMs are receiving increasing attention in numerous applications, including knowledge extraction, but little work has been done so far on extracting and linking licence information with their help. As semantic web principles are our core technology, we are particularly devoted to designing workflows in which licence information can be turned into structured data (best expressed as so-called semantic artefacts), in the form of ontologies and knowledge graphs. As a result, we develop iterative workflows in which LLM use is combined with querying structured information as coded in ontologies and knowledge graphs. We start from the source material we use in Polifonia, with an overview of the Polifonia datasets, with the aim of defining our problem space (Chapter 2).
We devote an entire chapter to related work (Chapter 3), through which we outline the possible solution space. Here, we briefly summarise the current discourse among those who further detail rules for FAIR implementation (Section 3.1), and we describe the current state of the art when it comes to knowledge representation for licences and terms of use (Section 3.2). We give an overview of the prominent approaches to licence expressions on the Web: MPEG, CC-REL, and ODRL. The latter we address further in Section 3.3. We also touch upon the problem of reasoning with licences on the Web (Section 3.4). We close this chapter with a description of the use of Large Language Models, as we apply them later in our workflows (Section 3.5).
Chapter 4 concerns a specific workflow for extracting licence information from web resources with a Large Language Model (LLM). Here, we concentrate on the musoW catalogue, which contains many relevant resources, including all Polifonia data components, which are also registered in the Polifonia Research Ecosystem. The workflow leads to an enrichment of the original licence information available in the musoW catalogue.
Chapter 5 deals with the task of Knowledge Graph Construction. It presents our design for formalising the extracted information and aligning it to existing knowledge graphs, and in particular how best to integrate the data into a Licences Knowledge Graph, a core output of this deliverable. As one result, we also gain deeper insight into which licences are used in the wild, meaning in the practices of many musicologists and music documentalists. The chapter is followed by an Evaluation (Chapter 6), complemented by the Polifonia FAIR Section (Chapter 7), a checklist of whether we comply with the FAIR guidelines agreed upon in Polifonia. We conclude the deliverable with Chapter 8. The implications of the work done in this deliverable (D2.6) for the further development of the Polifonia Ecosystem, in particular how these results influence how we treat licences in the Polifonia Research Ecosystem (and the related framework), will be discussed and reported in the Final Data Management Plan (D7.3).
Original language: English
Publisher: Zenodo
Number of pages: 47
Status: Published - 1 Jan 2024
