How much provenance is on the Web?



About How much provenance is on the Web?

  • This project has been fulfilled.
  • This project fits in the following master areas: High Performance Distributed Computing, Internet and Web Technology, Computer Science and Communication


Description

Where did that idea come from originally? Who is responsible for that quote? Did Rutte actually say that? Did the tweet originate in the USA or Egypt?

Understanding the origins (or provenance) of information on the Web is critical for judging its veracity. However, we do not currently know how much provenance information is already on the Web, and it is difficult to search for that information in a structured fashion. In this project, we will extract and measure the provenance information available in a massive Web crawl. If possible, we will also build a structured search engine for such information.
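
To make the measurement step concrete, here is a minimal sketch of how one might count pages that carry at least one provenance hint. It is an illustration under stated assumptions, not the project's method: the marker sets (author and Dublin Core meta tags, rel="author" links, cite elements and attributes) and the names find_markers and provenance_coverage are hypothetical choices made for this example, and a real crawl would be processed in a distributed fashion rather than from an in-memory list.

```python
from html.parser import HTMLParser

# Assumption: these names are only an illustrative guess at what could
# count as a provenance hint; the project does not prescribe a list.
PROVENANCE_META_NAMES = {"author", "dc.creator", "dc.source", "dc.date"}
PROVENANCE_REL_VALUES = {"author"}


class ProvenanceMarkerParser(HTMLParser):
    """Collects (kind, value) pairs for provenance hints found in a page."""

    def __init__(self):
        super().__init__()
        self.markers = []

    def handle_starttag(self, tag, attrs):
        # Attribute values may be None for valueless attributes.
        attr_map = {key: (value or "") for key, value in attrs}
        if tag == "meta":
            name = attr_map.get("name", "").lower()
            if name in PROVENANCE_META_NAMES:
                self.markers.append(("meta", name))
        elif tag in ("a", "link"):
            for rel in attr_map.get("rel", "").lower().split():
                if rel in PROVENANCE_REL_VALUES:
                    self.markers.append(("rel", rel))
        elif tag == "cite" or (tag in ("blockquote", "q") and attr_map.get("cite")):
            self.markers.append(("cite", tag))


def find_markers(html):
    """Return the list of provenance hints found in one HTML document."""
    parser = ProvenanceMarkerParser()
    parser.feed(html)
    parser.close()
    return parser.markers


def provenance_coverage(pages):
    """Return the fraction of pages that contain at least one hint."""
    pages = list(pages)
    if not pages:
        return 0.0
    hits = sum(1 for html in pages if find_markers(html))
    return hits / len(pages)
```

Calling provenance_coverage on a sample of crawled pages gives a rough first estimate of how widespread such hints are; the per-page marker lists from find_markers could later feed a structured index.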

The student will have the opportunity to work with a large-scale cluster and Amazon Web Services.