Difference between revisions of "Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing"

From Master Projects

Revision as of 09:53, 16 October 2015

Title: Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing
Status: ongoing
Student name: Sanne Vrijenhoek
Start date: 2015/02/01
End date: 2015/10/31
Supervisor: Marieke van Erp
Second supervisor: Stefan Schlobach
Second reader: Piek Vossen
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf

Signature supervisor



To make sense of human-written text, processes have been developed that extract information such as entities and events from it. Recent work aims to ground the extracted entities in resources on the Semantic Web, but this sometimes fails because no existing resource represents the entity. This study aims to populate the knowledge base by extracting information about these ungrounded entities from other natural-language sources. Key to this process is identifying the set of properties that are relevant to, or descriptive of, the entity in question. We investigate three ways to do this: first, by manually constructing a list of properties; second, by filling the list with the most common properties found; and third, by considering the properties of entities that co-occur with the ungrounded entity. Once the set of properties has been established, we apply Hearst patterns to natural-language text on the web to try to extract the correct information. The use case for this project is the set of entities found in the NewsReader project, which processes news articles from numerous sources on the web with a focus on the financial and economic domain. These comprise about 80,000 entities, roughly half of which are linked and half of which are not.
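The abstract describes two technical steps: choosing a property set for an ungrounded ("dark") entity, then mining web text with Hearst patterns. The Python below is a minimal sketch of both, not the project's implementation. It makes several assumptions. Linked entities are modelled as plain property dictionaries, a hypothetical stand-in for Semantic Web data. Only the classic Hearst hyponymy patterns ("X such as Y", "X including Y", "X especially Y", "Y and other X") are shown, so what they recover is a category for the entity rather than arbitrary property values. All function names and the toy knowledge base are illustrative.

```python
import re
from collections import Counter

# Hypothetical representation: a linked entity is a dict of property -> value.
LinkedEntity = dict[str, str]


def most_common_properties(entities: list[LinkedEntity], k: int) -> list[str]:
    """Second approach: the k property names that occur most often among linked entities."""
    counts = Counter(prop for entity in entities for prop in entity)
    return [prop for prop, _ in counts.most_common(k)]


def cooccurrence_properties(cooccurring: list[str], kb: dict[str, LinkedEntity], k: int) -> list[str]:
    """Third approach: most common properties among the linked entities that
    co-occur with the dark entity; names missing from the KB are skipped."""
    return most_common_properties([kb[name] for name in cooccurring if name in kb], k)


# A list item is a run of capitalised words (a rough proxy for a named entity);
# items are separated by ", ", " and ", " or " (optionally with a serial comma).
ITEM = r"[A-Z][\w&.-]*(?: [A-Z][\w&.-]*)*"
SEP = r"(?:, |,? and |,? or )"


def hearst_categories(entity: str, text: str) -> Counter:
    """Count candidate categories (hypernyms) for `entity` using Hearst patterns.
    Simplification: the category is a single word, not a full noun phrase."""
    e = re.escape(entity)
    patterns = [
        # "banks such as ING and <entity>", also "including" / "especially"
        rf"\b(\w+),? (?:such as|including|especially) (?:{ITEM}{SEP})*{e}(?!\w)",
        # "<entity>, ING and other banks" / "<entity> or other banks"
        rf"(?<!\w){e}(?:, {ITEM})*,? (?:and|or) other (\w+)",
    ]
    found = Counter()
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            found[match.group(1).lower()] += 1
    return found


if __name__ == "__main__":
    text = ("Dutch banks such as ING and Rabobank cut costs. "
            "Rabobank and other lenders raised rates.")
    print(hearst_categories("Rabobank", text))  # Counter({'banks': 1, 'lenders': 1})
```

The first approach, a manually constructed list, needs no code: it is a fixed list of property names. Capturing a single word as the category keeps the regular expressions readable and avoids heavy backtracking. A fuller version would use a noun-phrase chunker to find hypernym boundaries and would extend the pattern set beyond hyponymy to target the selected properties.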