Structural Properties as Proxy for Semantic Relevance in RDF Graph Sampling

Laurens Rietveld, Rinke Hoekstra, Stefan Schlobach, Christophe Guéret

Research output: Chapter in book/volume, Contribution to conference proceedings, Scientific, peer-reviewed


Abstract

The Linked Data cloud has grown to become the largest knowledge base ever constructed. Its size is now turning into a major bottleneck for many applications. In order to facilitate access to this structured information, this paper proposes an automatic sampling method targeted at maximizing answer coverage for applications using SPARQL querying. The approach presented in this paper is novel: no similar RDF sampling approach exists. Additionally, the concept of creating a sample aimed at maximizing SPARQL answer coverage is unique. We empirically show that the relevance of triples for sampling (a semantic notion) is influenced by the topology of the graph (purely structural), and can be determined without prior knowledge of the queries. Experiments show a significantly higher recall of topology-based sampling methods over random and naive baseline approaches (e.g. up to 90% for Open-BioMed at a sample size of 6%).
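To make the idea of topology-based sampling concrete, here is a minimal Python sketch under our own assumptions; it is not the authors' method. It uses node degree as the structural signal, weights each triple by the higher degree of its subject and object (a hypothetical aggregation choice), keeps the top fraction of triples, and measures recall as the share of a query's answer triples that survive in the sample (a simplification of SPARQL answer coverage). The function names and the `ex:` toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# A triple is (subject, predicate, object); plain strings stand in for RDF terms.
Triple = Tuple[str, str, str]


def node_degrees(triples: Iterable[Triple]) -> Counter:
    """Count how many triples each subject or object node occurs in."""
    degree: Counter = Counter()
    for subj, _pred, obj in triples:
        degree[subj] += 1
        degree[obj] += 1
    return degree


def topology_sample(triples: List[Triple], fraction: float) -> List[Triple]:
    """Keep the top `fraction` of triples ranked by a degree-based weight.

    Assumption: a triple's weight is the larger degree of its two end nodes.
    """
    degree = node_degrees(triples)
    ranked = sorted(
        triples,
        key=lambda t: max(degree[t[0]], degree[t[2]]),
        reverse=True,  # highest-weight triples first; ties keep input order
    )
    k = max(1, round(len(triples) * fraction))
    return ranked[:k]


def recall(sample: List[Triple], answers: Set[Triple]) -> float:
    """Fraction of a query's answer triples that are present in the sample."""
    if not answers:
        return 1.0
    kept = set(sample)
    return sum(1 for t in answers if t in kept) / len(answers)


# Hypothetical toy graph: ex:hub is the best-connected node.
example_triples: List[Triple] = [
    ("ex:a", "ex:p", "ex:hub"),
    ("ex:b", "ex:p", "ex:hub"),
    ("ex:c", "ex:p", "ex:hub"),
    ("ex:hub", "ex:label", '"Hub"'),
    ("ex:d", "ex:q", "ex:e"),
]

if __name__ == "__main__":
    sample = topology_sample(example_triples, fraction=0.6)
    query_answers = {("ex:a", "ex:p", "ex:hub"), ("ex:d", "ex:q", "ex:e")}
    print(len(sample), recall(sample, query_answers))  # prints: 3 0.5
```

Degree is only one possible structural signal; another graph measure, such as PageRank, could be substituted as the ranking key in `topology_sample` without changing the rest of the sketch.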
Original language: English
Title: The Semantic Web – ISWC 2014
Subtitle: 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part II
Editors: Peter Mika, Abraham Bernstein, Chris Welty, Craig Knoblock, Denny Vrandečić, Paul Groth, Natasha Noy, Krzysztof Janowicz, Carole Goble
Publisher: Springer
Pages: 81-96
Number of pages: 16
Volume: 8797
ISBN (electronic): 978-3-319-11915-1
ISBN (print): 978-3-319-11914-4
Status: Published - 2014

Publication series

Name: Lecture Notes in Computer Science
ISSN (print): 0302-9743
  • Data2Semantics

    van Harmelen, F.

    01/01/2010 to 01/01/2016

    Project: Research
