RDF Reasoning: A parallel graph algorithm

From Master Projects

About RDF Reasoning: A parallel graph algorithm

  • This project has not yet been fulfilled.
  • This project fits in the following master areas: Knowledge Technology and Intelligent Internet Applications, Internet and Web Technology, High Performance Distributed Computing, Parallel and Distributed Computer Systems


Reasoning over RDF (Semantic Web) data is a highly data-intensive problem. Billions of RDF statements are available online, and processing them on a single machine can take weeks, if it is feasible at all. RDF reasoning can be seen as a graph problem. The goal of this project is to implement an RDF reasoner (possibly based in part on our WebPIE MapReduce RDF reasoner) in HipG, our distributed, high-performance, large-scale graph-processing framework.


Supporting the development of parallel programs operating on large-scale graphs is becoming critical with the increasing abundance of huge real-world graphs. Distributed processing of real-world graphs is challenging due to their size and the inherently irregular structure of graph computations. HipG is a distributed framework that facilitates high-level programming of parallel graph algorithms. Graph algorithms are expressed using the notions of vertices and edges: the user defines small pieces of work executed at vertices, and HipG automatically parallelizes such applications on a distributed-memory machine.
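Purely as an illustration of this vertex-centric programming model (this is not HipG's actual API; class and method names are made up, and the execution here is sequential rather than distributed), a small piece of per-vertex work for a reachability computation might look like:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Illustrative sketch of the vertex-centric model, NOT HipG's API:
// each vertex performs a small piece of work (mark itself, then hand
// work to its neighbours); a framework like HipG would schedule these
// pieces in parallel on a distributed-memory machine.
public class VertexCentricSketch {

    // Which vertices are reachable from src, given adjacency lists.
    static boolean[] reachable(int[][] adj, int src) {
        boolean[] seen = new boolean[adj.length];
        Deque<Integer> work = new ArrayDeque<>();
        work.add(src);
        while (!work.isEmpty()) {
            int v = work.poll();
            if (seen[v]) continue;
            seen[v] = true;                      // the per-vertex "piece of work"...
            for (int w : adj[v]) work.add(w);    // ...spawns work at the neighbours
        }
        return seen;
    }

    public static void main(String[] args) {
        int[][] adj = {{1}, {2}, {}, {0}};       // edges: 0->1, 1->2, 3->0
        System.out.println(Arrays.toString(reachable(adj, 0)));
        // [true, true, true, false]  (vertex 3 is not reachable from 0)
    }
}
```

The interesting part for a distributed framework is that the per-vertex work touches only local state and outgoing edges, which is what makes automatic parallelization possible.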

For more information, see the HipG page: http://www.few.vu.nl/~e.krepska/hipg/


The Semantic Web extends the World Wide Web by attaching well-defined semantics to information and services. Through these semantics, machines can “understand” the Web, making it possible to query and reason over Web information, treating the Web as if it were a giant semi-structured database. In recent years, large volumes of data have been published in Semantic Web formats, constituting a valuable resource for researchers across several fields: in medicine, dozens of datasets comprise protein sequences and functional information, biomedical article citations, gene information, and more. The US and UK governments are putting major effort into making public information more accessible by providing it in Semantic Web formats. General knowledge extracted from Wikipedia and geographical knowledge are also available.

Semantic Web data is expressed in statements, also known as triples. The available data is quickly outgrowing the computational capacity of single machines and of standard indexing techniques: in March 2009, around 4 billion statements were available; in the following nine months this number had tripled to 13 billion statements, and the growth continues.

A statement consists of a sequence of three terms: subject, predicate and object.

An example is: <http://www.vu.nl> <rdf:type> <http://dbpedia.org/University>

It states that the resource http://www.vu.nl is of type http://dbpedia.org/University.
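A triple is naturally modelled as a small value class. The sketch below is illustrative (the class is hypothetical, and term values are kept as plain strings, whereas real reasoners typically intern terms as integer identifiers to save memory):

```java
// A minimal, illustrative representation of an RDF statement (triple):
// three terms — subject, predicate, object.
public final class Triple {
    final String subject, predicate, object;

    Triple(String s, String p, String o) {
        subject = s; predicate = p; object = o;
    }

    @Override public String toString() {
        return "<" + subject + "> <" + predicate + "> <" + object + ">";
    }

    public static void main(String[] args) {
        Triple t = new Triple("http://www.vu.nl", "rdf:type",
                              "http://dbpedia.org/University");
        System.out.println(t);
        // <http://www.vu.nl> <rdf:type> <http://dbpedia.org/University>
    }
}
```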

Machines can reason over these triples, inferring new statements from the existing ones. This is usually accomplished by applying a set of rules. The most commonly used rulesets are the RDFS ruleset and the OWL Horst ruleset, the latter being more powerful and more difficult to implement. Rules are typically applied recursively, adding the inferred statements to the input and stopping only when the closure is reached (i.e., no further conclusions can be derived).

For more information, see the WebPIE page: http://www.cs.vu.nl/webpie


A student taking up this project should ideally be strong in Java programming and in parallel and distributed computing. The relevant courses are: Parallel Programming; Parallel Programming Practical; Grid and Cluster Computing; Distributed Algorithms.


This project's supervisors: