Scientific, technical and medical knowledge is built on research data, and research data increasingly plays a similar role in the social sciences and humanities. Research datasets are either deposited by researchers or automatically extracted from publications. We propose to create open-source search and recommendation solutions for research datasets to enable their re-use.
The main benefit is that datasets become easier to find, which stimulates data re-use and avoids redundant data collection. Situated at the interface between the philosophy of science and computer science, the project will combine three perspectives to inform the development of innovative algorithmic solutions. First, we will examine how datasets are used in publications across different disciplines and research tasks, to understand to what extent scientific discovery depends on data availability and how it is affected by data-sharing cultures. Second, we will develop semantic technologies to support dataset search, to match research data with user groups, and to generate search engine result pages for research datasets. Third, we will develop information retrieval algorithms for unsupervised dataset search and for predicting user interactions with dataset search results. We will combine these into a self-learning method for dataset search. Our solutions will be implemented in Elsevier's retrieval and recommendation environments.
The project will engage the data science community through co-design workshops at critical stages of the research plan, through regular participation in data science and search engine meetups, and by releasing its algorithmic solutions as open source.