What's wrong with Information Extraction?

About What's wrong with Information Extraction?

  • This project has not yet been fulfilled.
  • This project fits in the following Bachelor programs: Computer Science
  • This project fits in the following Master areas: Computer Science and Communication, Information and Communication Technology, High Performance Distributed Computing, Multimedia, Internet and Web Technology, AI and Communication, Technical Artificial Intelligence, Computational Intelligence and Self-organisation, Information Sciences, Parallel and Distributed Computer Systems


Description

Enriching video descriptions with relevant concepts such as events, people, organizations, locations, and times could greatly improve video search and retrieval. In the literature, such description annotations have already been gathered with Information Extraction (IE) tools, which recognize all the relevant named entities in a document. However, named entity extraction remains a challenging problem: NER tools still extract named entities, and events in particular, with poor accuracy.
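
As an illustration of the kind of annotation these tools produce, the sketch below queries the public DBpedia Spotlight REST endpoint for a sample description. The endpoint URL, the confidence parameter, and the response fields (Resources, @surfaceForm, @URI, @similarityScore) follow Spotlight's commonly documented API; the sample text is invented for illustration.

  import requests

  # Public DBpedia Spotlight endpoint (assumed; a local deployment works the same way).
  SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"

  def annotate(text, confidence=0.5):
      """Return the entities Spotlight finds in `text` above the confidence threshold."""
      response = requests.get(
          SPOTLIGHT_URL,
          params={"text": text, "confidence": confidence},
          headers={"Accept": "application/json"},
          timeout=30,
      )
      response.raise_for_status()
      # "Resources" is absent from the response when no entity clears the threshold.
      return response.json().get("Resources", [])

  description = ("Footage of the fall of the Berlin Wall in November 1989, "
                 "with interviews recorded by the BBC.")
  for entity in annotate(description):
      print(entity["@surfaceForm"], "->", entity["@URI"], entity["@similarityScore"])

Spotlight also serves language-specific models (e.g. a Dutch endpoint alongside the English one), which is relevant to the English-vs.-Dutch comparison described below.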

The goal of this project is to perform an error analysis of existing Information Extraction tools, such as NERD, THD, TextRazor, and DBpedia Spotlight, and to identify the textual features of video descriptions for which the IE tools do not perform well. The performance of the tools can be compared along several dimensions (a sketch of per-dimension scoring follows the list):

  • language: English vs. Dutch video descriptions
  • length: short vs. long video descriptions
  • concept identification: event detection vs. entity detection
  • concept type identification: event types vs. entity types
  • scoring: confidence scores vs. relevance scores
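
As a sketch of how such a comparison could be scored, the snippet below computes precision, recall, and F1 per slice of the evaluation data (e.g. per language, or short vs. long descriptions). The data layout, one record per description holding its metadata plus gold and predicted annotation sets, is an assumption for illustration, as are the toy records.

  from collections import defaultdict

  def per_slice_prf(records, slice_key):
      """Precision/recall/F1 aggregated per slice of the evaluation data.

      Each record is assumed to hold description metadata ("meta"), the gold
      annotations ("gold"), and one tool's output ("pred"), the latter two as
      sets of (surface form, concept type) pairs.
      """
      tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
      for rec in records:
          label = slice_key(rec)
          tp[label] += len(rec["gold"] & rec["pred"])  # correctly found
          fp[label] += len(rec["pred"] - rec["gold"])  # spurious extractions
          fn[label] += len(rec["gold"] - rec["pred"])  # missed annotations
      scores = {}
      for label in set(tp) | set(fp) | set(fn):
          p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
          r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
          f1 = 2 * p * r / (p + r) if p + r else 0.0
          scores[label] = {"precision": p, "recall": r, "f1": f1}
      return scores

  # Toy records, invented for illustration.
  records = [
      {"meta": {"lang": "en", "tokens": 14},
       "gold": {("Berlin Wall", "Location"), ("BBC", "Organization")},
       "pred": {("Berlin Wall", "Location"), ("BBC", "Person")}},
      {"meta": {"lang": "nl", "tokens": 60},
       "gold": {("Amsterdam", "Location")},
       "pred": {("Amsterdam", "Location"), ("Noord", "Location")}},
  ]

  # Slice by language; swap the key for length, concept type, score bands, etc.
  print(per_slice_prf(records, lambda rec: rec["meta"]["lang"]))

Keeping the slicing function separate from the matching logic means the same gold data can be reused for every tool and every comparison dimension.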

Tools, Data, Technologies

Tools

Possible video collections