Refining and Linking Messy Data

About Refining and Linking Messy Data

  • This project has not yet been fulfilled.
  • This project fits in the following Bachelor programs: (none listed)
  • This project fits in the following master areas: AI and Communication, Bioinformatics, Cognitive Science, Computational Intelligence and Selforganisation, Computer Science and Communication, Computer Systems and Security, Formal Methods and Software Verification, High Performance Distributed Computing, Human Ambience, Information Sciences, Information and Communication Technology, Internet and Web Technology, Knowledge Technology and Intelligent Internet Applications, Multimedia, Parallel and Distributed Computer Systems, Software Engineering, Systems Biology, Technical Artificial Intelligence


Description

Problem: Data on the web is often messy and poorly structured, and therefore difficult to load and process automatically. We need methods that accurately and semi-automatically structure such messy data, so that it can subsequently be integrated and analyzed, and the cleaned data can be published and shared reliably.

Solution/Method: Existing tools like OpenRefine (http://openrefine.org/) allow users to clean messy data. We can combine this tool with the nanopublication approach (http://nanopub.org) to publish and share data in a provenance-aware manner. In this project, a plugin should be developed and evaluated that semi-automatically creates nanopublications from messy data, together with provenance and metadata about the applied transformations, to maximize transparency and reproducibility. A minimal sketch of how such a nanopublication could be assembled is given below.
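
To illustrate the idea, the following sketch shows how a single cleaned statement could be wrapped as a nanopublication with provenance about the cleaning step. It is not the plugin itself: it uses Python with rdflib rather than OpenRefine's extension framework, and the namespaces, URIs, function name, and the example OpenRefine operation string are hypothetical placeholders. Only the nanopublication schema (http://www.nanopub.org/nschema#) and the W3C PROV vocabulary are taken from the cited approaches.

# Hypothetical sketch: packaging one cleaned record as a nanopublication.
from rdflib import Dataset, Namespace, Literal
from rdflib.namespace import RDF, XSD

NP   = Namespace("http://www.nanopub.org/nschema#")   # nanopublication schema
PROV = Namespace("http://www.w3.org/ns/prov#")        # W3C provenance vocabulary
EX   = Namespace("http://example.org/")               # placeholder namespace (assumption)

def record_to_nanopub(subject, predicate, value, refine_operation):
    """Wrap one cleaned (subject, predicate, value) statement as a nanopublication,
    recording which cleaning operation produced it (illustrative only)."""
    ds = Dataset()
    head       = ds.graph(EX["np1/Head"])
    assertion  = ds.graph(EX["np1/assertion"])
    provenance = ds.graph(EX["np1/provenance"])
    pubinfo    = ds.graph(EX["np1/pubinfo"])
    np_uri = EX["np1"]

    # Head graph: links the nanopublication to its three content graphs.
    head.add((np_uri, RDF.type, NP.Nanopublication))
    head.add((np_uri, NP.hasAssertion, assertion.identifier))
    head.add((np_uri, NP.hasProvenance, provenance.identifier))
    head.add((np_uri, NP.hasPublicationInfo, pubinfo.identifier))

    # Assertion graph: the cleaned statement itself.
    assertion.add((subject, predicate, Literal(value)))

    # Provenance graph: the assertion was generated by a cleaning activity,
    # described here by a free-text operation string (an assumed convention).
    activity = EX["cleaning-activity-1"]
    provenance.add((assertion.identifier, PROV.wasGeneratedBy, activity))
    provenance.add((activity, RDF.type, PROV.Activity))
    provenance.add((activity, PROV.value, Literal(refine_operation)))

    # Publication info graph: when the nanopublication itself was generated.
    pubinfo.add((np_uri, PROV.generatedAtTime,
                 Literal("2024-01-01T00:00:00Z", datatype=XSD.dateTime)))

    return ds.serialize(format="trig")

# Example call with made-up data and a made-up OpenRefine-style operation label.
print(record_to_nanopub(EX["gene42"], EX["hasSymbol"], "TP53",
                        "core/text-transform: value.trim()"))

An actual plugin would additionally read the real transformation history that OpenRefine keeps for a project and emit one such nanopublication (or a batch of them) per cleaned record, but the head/assertion/provenance/publication-info structure shown here is the part that gives the transparency and reproducibility mentioned above.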