Statistical data is increasingly made available as Linked Data on the Web. As more statistical datasets become available, a fundamental question about their comparability arises: to what extent can arbitrary statistical datasets be faithfully compared? Beyond purely statistical comparability, we are interested in the role that semantics plays in the data being compared. Our hypothesis is that semantic relationships between components of statistical datasets may be related to their statistical correlation. Our research focuses on studying whether these statistical and semantic relationships influence each other, by comparing the correlation of statistical data with their semantic similarity. The ongoing research problem is, hence, to investigate why machines have difficulty revealing meaningful correlations or establishing non-coincidental connections between variables in statistical datasets. We describe a fully reproducible pipeline for comparing statistical correlation with semantic similarity in arbitrary Linked Statistical Data. We present a use case based on World Bank data expressed as RDF Data Cube, and we examine whether dataset titles can help predict strong correlations.
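The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the similarity measure used here (Jaccard overlap of title tokens) is an assumption chosen for simplicity, the function names `pearson` and `title_similarity` are hypothetical, and the numeric series are made-up toy values rather than World Bank data.

```python
import re
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def title_similarity(title_a, title_b):
    """Jaccard overlap of lowercase word tokens (an assumed stand-in measure)."""
    tokens_a = set(re.findall(r"[a-z0-9]+", title_a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", title_b.lower()))
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Toy observations for two hypothetical datasets (illustrative values only).
series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [2.1, 3.9, 6.2, 8.0]

print("correlation:", pearson(series_a, series_b))
print("title similarity:", title_similarity("Population, total", "Population, female"))
```

Computing both values for every pair of datasets yields a set of (similarity, correlation) points, which is one way to examine whether similar titles tend to accompany strong correlations.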
|Publication status||Published - 2014|
|Event||2nd International Workshop on Semantic Statistics (SemStats 2014) - International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy|
|Duration||19 Oct 2014 → …|
- linked data
- semantic similarity
- statistical database
Fingerprint: research topics of 'Semantic Similarity and Correlation of Linked Statistical Data Analysis'.
- 1 Finished
Census data open linked – CEDA_R From fragment to fabric – Dutch census data in a web of global cultural and historic information
01/10/2011 → 31/03/2016