Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing
has title::Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing | |
---|---|
status: finished
| |
Student name: | student name::Sanne Vrijenhoek |
Dates | |
Start | start date:=2015/02/01 |
End | end date:=2015/10/31 |
Supervision | |
Supervisor: | Marieke van Erp |
Second supervisor: | Stefan Schlobach |
Second reader: | has second reader::Tobias Kuhn |
Thesis: | has thesis::Media:Thesis.pdf |
Poster: | has poster::Media:Posternaam.pdf |
Signature supervisor
..................................
Abstract
To understand human-produced written text, processes have been developed that extract information such as entities and events from it. Recent developments aim to ground the extracted entities in resources on the Semantic Web, but this sometimes fails because no resource representing the entity can readily be found. This study aims to populate the knowledge base by extracting information about these entities from other natural-language sources. The first step in this process is to identify the set of properties that are relevant to or descriptive of the entity in question. We investigate three ways of doing this: first, by manually constructing a list; second, by filling the list with the most common properties found; and third, by considering the properties of entities that co-occur with the ungrounded, or dark, entity. Once the set of properties has been established, we use a method inspired by Hearst patterns, applied to natural language on the web, to try to extract the correct information. The performance of the method is measured using a qualitative analysis. The use case of this project is the set of entities found in the NewsReader project, which processes news articles from numerous sources around the web with a focus on the financial and economic domain. It contains about 80,000 entities, of which roughly half are linked and half are not.
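To give a rough idea of what a Hearst-pattern-inspired extraction step might look like, the sketch below matches a few classic Hearst-style lexical patterns around a dark entity's name and collects the word that fills the open slot. It is only an illustration under stated assumptions: the templates, the function name `extract_candidates`, and the example entity "Acme Corp" are hypothetical and are not taken from the thesis, whose actual patterns and properties may differ.

```python
import re

# Hypothetical Hearst-style templates (assumptions, not the thesis's patterns).
# "{e}" is replaced by the regex-escaped entity name; the named group "value"
# captures a single word that fills the open slot of the pattern.
TEMPLATES = [
    r"\b(?P<value>\w+) such as {e}\b",
    r"\b{e} is an? (?P<value>\w+)",
    r"\b{e} and other (?P<value>\w+)",
]


def extract_candidates(entity, text):
    """Return the slot fillers found for `entity` in `text`, in template order."""
    candidates = []
    for template in TEMPLATES:
        pattern = re.compile(template.format(e=re.escape(entity)), re.IGNORECASE)
        for match in pattern.finditer(text):
            candidates.append(match.group("value").lower())
    return candidates


if __name__ == "__main__":
    # Made-up example sentences, used only to exercise the patterns.
    snippet = (
        "Investors watch companies such as Acme Corp. "
        "Acme Corp is a manufacturer based in Ohio."
    )
    print(extract_candidates("Acme Corp", snippet))
```

A single-word slot keeps the matching predictable but misses multi-word values; a fuller version would probably need noun-phrase chunking and some way to rank candidates gathered from many web snippets.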