Difference between revisions of "Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing"


Revision as of 10:01, 16 October 2015


Title: Illuminating Dark Entities: a study on information discovery using Semantic Web and Natural Language Processing
Status: ongoing
Student name: Sanne Vrijenhoek
Dates
Start: 2015/02/01
End: 2015/10/31
Supervision
Supervisor: Marieke van Erp
Second supervisor: Stefan Schlobach
Second reader: Piek Vossen
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf

Signature supervisor



..................................

Abstract

To understand human-written text, processes have been developed that extract information such as entities and events from it. Recent developments aim to ground the extracted entities in resources on the Semantic Web, but this sometimes fails because no resource representing the entity can readily be found. This study aims to populate the knowledge base by extracting information about such entities from other natural-language sources. The first step in this process is to identify the set of properties that are relevant to, or descriptive of, the entity in question. We investigate three ways of doing this: first, by manually constructing a list; second, by filling the list with the most common properties found; and third, by considering the properties of entities that co-occur with the ungrounded, or dark, entity. Once the set of properties has been established, we use a method inspired by Hearst patterns, applied to natural language on the web, to try to extract the correct information. The performance of the method is measured using a qualitative analysis. The use case of this project is the set of entities found in the NewsReader project, which processes news articles from numerous sources around the web with a focus on the financial and economic domain. It contains about 80,000 entities, of which roughly half are linked and half are not.
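
As a rough illustration of two steps named in the abstract, the sketch below ranks candidate properties by how often they occur among co-occurring linked entities, and then applies two classic Hearst-style patterns to plain text. It is a minimal sketch under stated assumptions, not the method used in the thesis: the function names, the toy entities and properties, and the two regular expressions are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def rank_properties(cooccurring_entities, top_n=3):
    """Rank properties by how many co-occurring entities use them.

    `cooccurring_entities` maps an entity name to its list of properties.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for props in cooccurring_entities.values():
        counts.update(set(props))  # count each property once per entity
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return ranked[:top_n]


def hearst_types(entity, text):
    """Collect candidate types for `entity` using two Hearst-style patterns."""
    name = re.escape(entity)
    patterns = [
        rf"\b(\w+),? such as {name}",       # "<type> such as <entity>"
        rf"{name},? and other (\w+)",       # "<entity> and other <type>"
    ]
    found = []
    for pattern in patterns:
        found.extend(m.group(1).lower() for m in re.finditer(pattern, text))
    return found


# Hypothetical toy data, not taken from NewsReader.
linked = {
    "Apple": ["industry", "founder", "headquarters"],
    "Microsoft": ["industry", "founder"],
    "IBM": ["industry", "revenue"],
}
sample = (
    "Investors watched companies such as Acme Corp closely. "
    "Acme Corp and other manufacturers reported losses."
)

print(rank_properties(linked))
print(hearst_types("Acme Corp", sample))
```

Single-word captures keep the patterns simple, at the cost of missing multi-word types; a real system would more likely match noun phrases produced by a parser or chunker.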