Semantic Similarity and Correlation of Linked Statistical Data Analysis

Sarven Capadisli, Albert Meroño-Peñuela, Sören Auer, Reinhard Riedl

Research output: Contribution to conference › Paper › Academic › peer-review



Statistical data is increasingly made available in the form of Linked Data on the Web. As more and more statistical datasets become available, a fundamental question on statistical data comparability arises: To what extent can arbitrary statistical datasets be faithfully compared? Beyond purely statistical comparability, we are interested in the role that semantics plays in the data to be compared. Our hypothesis is that semantic relationships between different components of statistical datasets might be related to their statistical correlation. Our research focuses on studying whether these statistical and semantic relationships influence each other, by comparing the correlation of statistical data with their semantic similarity. The ongoing research problem is, hence, to investigate why machines have difficulty revealing meaningful correlations or establishing non-coincidental connections between variables in statistical datasets. We describe a fully reproducible pipeline to compare statistical correlation with semantic similarity in arbitrary Linked Statistical Data. We present a use case using World Bank data expressed as RDF Data Cube, and we examine whether dataset titles can help predict strong correlations.
Original language: English
Status: Published - 2014
Event: 2nd International Workshop on Semantic Statistics (SemStats 2014) - International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy
Duration: 19 Oct 2014 → …


Conference: 2nd International Workshop on Semantic Statistics (SemStats 2014)
City: Riva del Garda
Period: 19/10/2014 → …


