RDFS/OWL reasoning using the MapReduce framework


Title: RDFS/OWL reasoning using the MapReduce framework
Status: finished
Master: Technical Artificial Intelligence
Student name: Jacopo Urbani
Student number: 1735365
Start date: 2009/01/15
End date: 2009/07/01
Supervisor: Eyal Oren
Second reader: Frank van Harmelen
Poster: Media:Posternaam.pdf

Signature supervisor



Abstract KIM 1

On the Semantic Web, information is encoded in specific languages (RDF/RDFS or OWL). One advantage of encoding information in these languages is that machines can reason over it and infer new statements. Several working reasoner implementations are already available, but the amount of information on the web is simply too large to be handled by programs running on a single machine. In this thesis I will explore the possibility of implementing a reasoner using the distributed MapReduce model as implemented by the Hadoop framework. A distributed approach to this problem is not trivial: the data is strongly correlated, which makes it difficult to distribute the computation. The aim is to build a reasoner that can handle semantic data at web scale, using a framework that efficiently distributes the computation over many machines, so that we overcome the limitations of the existing programs.

Abstract KIM 2

Semantic Web data encoded in RDF(S)/OWL allows formal, logical reasoning. Scalable algorithms are now needed to reason over the growing amount of openly available data. Existing reasoners scale to some extent, but only on single machines.

In this thesis, we propose reasoning algorithms based on the MapReduce distributed programming model. They implement rule-based RDFS and OWL reasoning as a sequence of MapReduce jobs. The algorithms were evaluated on several real-world data sets using varying numbers of compute nodes. After several optimisations, our RDFS reasoner outperformed all known approaches with respect to reasoning speed; the results show linear scalability with respect to the input data and good speedup efficiency. We also present a distributed algorithm for the computationally more complex task of OWL reasoning, based on the same approach. The algorithm is sound and complete but, since no optimisation was done, not yet competitive.
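
To illustrate how a single reasoning rule can be expressed as a MapReduce job, the following is a minimal sketch in Python. It is not the algorithm from the thesis and it does not use Hadoop: the map, shuffle and reduce phases are simulated in memory. It covers only one RDFS rule, the subclass rule (if s has type C and C is a subclass of D, then s has type D). The function names (map_triple, reduce_class, run_job, closure) and the ex: example data are assumptions made for this illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def map_triple(triple):
    """Map: key each relevant triple on the class the rule joins on."""
    s, p, o = triple
    if p == RDF_TYPE:
        yield o, ("instance", s)       # s is an instance of class o
    elif p == SUBCLASS:
        yield s, ("superclass", o)     # class s has superclass o


def reduce_class(cls, values):
    """Reduce: pair every instance of cls with every superclass of cls."""
    instances = [v for tag, v in values if tag == "instance"]
    supers = [v for tag, v in values if tag == "superclass"]
    for inst in instances:
        for sup in supers:
            yield (inst, RDF_TYPE, sup)


def run_job(triples):
    """Simulate one MapReduce job in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_triple(triple):
            groups[key].append(value)
    derived = set()
    for key, values in groups.items():
        derived.update(reduce_class(key, values))
    return derived


def closure(triples):
    """Repeat the job until no new triples are derived (a fixpoint)."""
    known = set(triples)
    while True:
        new = run_job(known) - known
        if not new:
            return known
        known |= new


# Hypothetical example data, not taken from the thesis.
triples = {
    ("ex:rex", RDF_TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
inferred = closure(triples) - triples
print(sorted(inferred))
```

In this sketch the example needs two passes of the job before the fixpoint is reached, which hints at why reasoning is expressed as a sequence of jobs. The grouping by class also shows where the correlation in the data matters: all triples mentioning the same class must meet at the same reducer.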