Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing

From Master Projects
Revision as of 10:01, 16 October 2015 by Svk360


Title: Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing
status: ongoing
Student name: Sanne Vrijenhoek
Start date: 2015/02/01
End date: 2015/10/31
Supervisor: Marieke van Erp
Second supervisor: Stefan Schlobach
Second reader: Piek Vossen
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf

Signature supervisor



To understand human-written text, processes have been developed that extract information such as entities and events from it. Recent work aims to ground the extracted entities in resources on the Semantic Web, but this sometimes fails because no resource representing the entity can readily be found. This study aims to populate the knowledge base by extracting information about such entities from other natural-language sources. The first step is to identify the set of properties that are relevant to, or descriptive of, the entity in question. We investigate three ways of doing this: first, by manually constructing a list; second, by filling the list with the most common properties found; and third, by considering the properties of entities that co-occur with the ungrounded, or dark, entity. Once the set of properties has been established, we use a method inspired by Hearst patterns, applied to natural language on the web, to try to extract the correct information. The performance of the method is measured using a qualitative analysis. The use case for this project is the set of entities found in the NewsReader project, which processes news articles from numerous sources around the web, with a focus on the financial and economic domain. It contains about 80,000 entities, of which roughly half are linked and half are not.
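
As a rough illustration of the last two steps described above, the sketch below ranks candidate properties by how often they appear on co-occurring entities and then applies a lexical pattern to a sentence to look for a value. It is a minimal sketch under stated assumptions, not the method used in the thesis: the entity names, property names, the example sentence, the single pattern, and the helper functions `candidate_properties` and `extract_value` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy data: properties of already-linked entities that
# co-occur with the dark entity in the same articles.
CO_OCCURRING = {
    "Entity_A": ["foundedBy", "headquarters", "industry"],
    "Entity_B": ["headquarters", "industry", "ceo"],
    "Entity_C": ["industry", "headquarters"],
}

# Hypothetical lexical pattern, loosely in the spirit of Hearst patterns.
# ENTITY is a placeholder that is replaced by the escaped entity name.
PATTERNS = {
    "headquarters": (
        r"ENTITY,? (?:is |was )?(?:headquartered|based) in "
        r"(?P<value>[A-Z]\w*(?: [A-Z]\w*)*)"
    ),
}


def candidate_properties(neighbours, top_n=2):
    """Rank properties by the number of co-occurring entities that have them."""
    counts = Counter(p for props in neighbours.values() for p in set(props))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [prop for prop, _ in ranked[:top_n]]


def extract_value(entity, prop, text):
    """Return the first pattern match for a property, or None if none is found."""
    template = PATTERNS.get(prop)
    if template is None:
        return None
    pattern = template.replace("ENTITY", re.escape(entity))
    match = re.search(pattern, text)
    return match.group("value") if match else None


if __name__ == "__main__":
    sentence = "Acme Corp is headquartered in New York and employs 200 people."
    for prop in candidate_properties(CO_OCCURRING):
        print(prop, extract_value("Acme Corp", prop, sentence))
```

In this toy setup only properties with a hand-written pattern can yield a value; a property without a pattern simply returns `None`, which mirrors the case where no information about the dark entity can be found.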