Dataset linkage recommendation on the Web of Data
abstract:
We address the following problem: given a particular dataset, which candidate datasets on the Web of Data have the highest chance of holding co-references with it? Answering this question increases the efficiency of co-reference resolution. Currently, data publishers manually discover and select the datasets against which to perform co-reference resolution. However, in the near future the Web of Data will be so large that data publishers can no longer determine which datasets are candidates for mapping. A solution to this problem is a method that automatically recommends a list of candidate datasets from the Web of Data and presents it to the data publisher as input for the mapping. We propose two solutions for dataset linkage recommendation. The general idea behind both is to predict the chance that a particular dataset on the Web of Data holds co-references with respect to the data publisher's dataset. This prediction is made by generating a profile for each dataset on the Web of Data. A profile is metadata that represents the structure of a dataset in terms of the vocabularies, class types, and property types it uses. Subsequently, dataset profiles that correspond with the profile of the data publisher's dataset receive a weight value. Datasets with the highest weight values have the highest chance of holding co-references.
supervisor: Paul Groth
second supervisor: Christophe Gueret
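
The profile-and-weight idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how weights are computed, so the scoring rule below (a plain count of shared vocabularies, class types, and property types) is an assumption made for illustration only, and it does not represent either of the two proposed solutions. The names `Profile`, `weight`, and `recommend`, as well as the example datasets, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Metadata describing a dataset's structure (hypothetical representation)."""
    name: str
    vocabularies: frozenset
    classes: frozenset
    properties: frozenset


def weight(source: Profile, candidate: Profile) -> int:
    """Assumed scoring rule: count the profile elements shared by both datasets."""
    return (
        len(source.vocabularies & candidate.vocabularies)
        + len(source.classes & candidate.classes)
        + len(source.properties & candidate.properties)
    )


def recommend(source: Profile, candidates: list, top_k: int = 3) -> list:
    """Return up to top_k (name, weight) pairs, highest weight first.

    Candidates that share nothing with the source profile are dropped.
    Ties are broken alphabetically by dataset name.
    """
    scored = [(weight(source, c), c.name) for c in candidates]
    scored = [(w, n) for w, n in scored if w > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(n, w) for w, n in scored[:top_k]]


# Illustrative profiles only; these are not real datasets.
source = Profile(
    name="publisher-dataset",
    vocabularies=frozenset({"foaf"}),
    classes=frozenset({"foaf:Person"}),
    properties=frozenset({"foaf:name"}),
)
candidates = [
    Profile("dataset-a", frozenset({"foaf", "dcterms"}),
            frozenset({"foaf:Person"}),
            frozenset({"foaf:name", "dcterms:title"})),
    Profile("dataset-b", frozenset({"geo"}),
            frozenset({"geo:Feature"}),
            frozenset({"geo:lat"})),
    Profile("dataset-c", frozenset({"foaf"}),
            frozenset({"foaf:Organization"}),
            frozenset({"foaf:name"})),
]

print(recommend(source, candidates))
```

In this sketch, `dataset-a` shares one vocabulary, one class type, and one property type with the publisher's dataset, so it ranks first; `dataset-b` shares nothing and is excluded from the recommendation list.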