Difference between revisions of "Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing"
(New page: {{Masterproject |Student name=Sanne Vrijenhoek |Project start date=2015/02/01 |Project end date=2015/08/31 |Supervisor=Marieke van Erp |Second supervisor=Stefan Schlobach |Second reader=Pi...) |
Line 2: | Line 2:
|Student name=Sanne Vrijenhoek | |Student name=Sanne Vrijenhoek
|Project start date=2015/02/01 | |Project start date=2015/02/01
− |Project end date=2015/ | + |Project end date=2015/10/31
|Supervisor=Marieke van Erp | |Supervisor=Marieke van Erp
|Second supervisor=Stefan Schlobach | |Second supervisor=Stefan Schlobach
Revision as of 09:53, 16 October 2015
Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing | |
---|---|
status: ongoing
| |
Student name: | Sanne Vrijenhoek |
Dates | |
Start | 2015/02/01 |
End | 2015/10/31 |
Supervision | |
Supervisor: | Marieke van Erp |
Second supervisor: | Stefan Schlobach |
Second reader: | Piek Vossen |
Thesis: | Media:Thesis.pdf |
Poster: | Media:Posternaam.pdf |
Signature supervisor
..................................
Abstract
To understand human-produced written text, processes have been developed that extract information such as entities and events from it. Recent developments aim to ground the extracted entities in resources on the Semantic Web, but this sometimes fails because no resource representing the entity can readily be found. This study aims to populate the knowledge base by extracting information about such entities from other natural-language sources. Key to this process is identifying the set of properties that are relevant to, or descriptive of, the entity in question. We investigate three ways of doing so: first, manually constructing a list; second, filling the list with the most common properties found; and third, considering the properties of entities that co-occur with the ungrounded entity. Once the set of properties has been established, we use Hearst patterns and natural-language text on the web to try to extract the correct information. The use case for this project is the set of entities found in the NewsReader project, which processes news articles from numerous sources around the web with a focus on the financial and economic domain. It contains about 80,000 entities, of which roughly half are linked and half are not.
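As a rough illustration of the ideas in the abstract, the sketch below shows one way the second and third property-selection strategies and a single Hearst pattern ("X such as A, B and C") might look in Python. It is not the project's implementation. The toy data, the property names, the entity "Acme Motors", and the function names `common_properties`, `cooccurrence_properties` and `hearst_such_as` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical toy data: properties attached to already-linked entities.
# The property names are invented for illustration only.
LINKED = {
    "Volkswagen": {"industry", "foundingYear", "headquarters"},
    "Toyota": {"industry", "headquarters", "revenue"},
    "Angela Merkel": {"birthDate", "party"},
}


def common_properties(linked, top_n):
    """Strategy 2 (sketch): rank properties by how many linked entities carry them."""
    counts = Counter(p for props in linked.values() for p in props)
    return [p for p, _ in counts.most_common(top_n)]


def cooccurrence_properties(linked, cooccurring, top_n):
    """Strategy 3 (sketch): rank properties using only the linked entities
    that co-occur with the ungrounded entity."""
    subset = {e: linked[e] for e in cooccurring if e in linked}
    return common_properties(subset, top_n)


# One Hearst pattern, "<hypernym> such as <A>, <B> and <C>".
# Assumption: hyponyms are capitalised word sequences (a crude proper-noun test).
NP = r"[A-Z]\w*(?: [A-Z]\w*)*"
SUCH_AS = re.compile(
    rf"(\w+),? such as ({NP}(?:(?:, and |, or |, | and | or ){NP})*)"
)


def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in re.split(r",\s*(?:and |or )?| and | or ", match.group(2)):
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    print(common_properties(LINKED, 2))
    print(cooccurrence_properties(LINKED, ["Volkswagen", "Toyota"], 2))
    print(hearst_such_as(
        "Shares of carmakers such as Volkswagen, Toyota and Acme Motors fell."
    ))
```

A real system would need a proper noun-phrase chunker instead of the capitalisation heuristic, and more Hearst patterns than the single one shown here.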