Named Entity Disambiguation with two-stage coherence optimization


Title: Named Entity Disambiguation with two-stage coherence optimization
Status: ongoing
Student name: Filip Ilievski
Start date: 2014/10/15
End date: 2015/02/15
Supervisor: Stefan Schlobach
Second supervisor: Marieke van Erp
Second reader: Frank van Harmelen
Company: Computational Lexicology and Terminology Lab, Vrije Universiteit
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf

Abstract KIM 1

Abstract KIM 2

Millions of news articles become available to data analysts every day. According to (High, 2012), 80% of the information in the world is unstructured. We need computers capable of understanding this flood of information and handling text in an automated manner. Current techniques for natural language processing (NLP) tend to exploit semantic coherence and perform a collective interpretation (Hoffart et al., 2011). However, there is potential to increase the discriminative power of verbal and social semantics and context. Recently, IBM’s Watson illustrated this potential by combining semantically rich lexical resources with world knowledge from ontologies and datasets (High, 2012). NLP can put both types of knowledge to use to enhance the automated interpretation of text. One of the core tasks in NLP is identifying mentions within the text and generating candidate interpretations; knowledge from the Semantic Web and from lexicons can be used to analyze and restrict the possible interpretations. Additionally, while an array of text processing tools is currently used to extract events, recognize and link entities, and discover relations between the two, contemporary NLP modules solve Entity Linking, Event Detection, and Semantic Role Labeling as separate problems. From a semantic point of view, each of these processes adds another brush-stroke onto the canvas of meaning: entities and events occur in relations which correspond to roles.

This thesis presents an approach in which such NLP processes are extended with a semantic process of coherence optimization. Our approach aims to make decisions in a collective and global manner by combining information from multiple NLP processes and diverse background knowledge. We use both binary logic and probabilistic models, built through manual and automatic techniques. In the binary filtering phase, we use restrictions from VerbNet and a domain-specific ontology to jointly narrow down the possible interpretations of both the predicates and the entities. However, purely logical techniques can only exclude inconsistent joint interpretations, and generally leave multiple logically coherent results. We therefore argue that a second phase is needed, in which the coherence between the remaining candidates is optimized probabilistically, based on available background knowledge about the entities. Our optimization method assigns scores to the entity candidates based on factors such as graph distance, the number of shared properties, popularity metrics, and distance in the class hierarchy.
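The two phases can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Candidate` class, the `ROLE_RESTRICTIONS` table (a stand-in for VerbNet and ontology restrictions), the `ex:` identifiers, the toy distances, and the additive scoring with a 0.1 weight are all assumptions made for illustration. Class-hierarchy distance is left out for brevity.

```python
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Candidate:
    uri: str             # hypothetical identifier of a candidate entity
    types: frozenset     # ontology classes of the candidate (toy data)
    popularity: float    # popularity metric, assumed to lie in [0, 1]


# Assumed stand-in for VerbNet/ontology restrictions: allowed classes per role.
ROLE_RESTRICTIONS = {
    "Agent": {"Person", "Organization"},
    "Location": {"Place"},
}


def binary_filter(role, candidates):
    """Phase 1: drop candidates whose classes violate the role restriction."""
    allowed = ROLE_RESTRICTIONS.get(role)
    if allowed is None:          # no restriction known: keep everything
        return list(candidates)
    return [c for c in candidates if c.types & allowed]


def pair_score(a, b, graph_distance, shared_properties):
    """Pairwise coherence from graph distance and shared property count."""
    key = frozenset((a.uri, b.uri))
    closeness = 1.0 / (1.0 + graph_distance.get(key, float("inf")))
    return closeness + 0.1 * shared_properties.get(key, 0)


def best_assignment(mentions, graph_distance, shared_properties):
    """Phase 2: pick the jointly highest-scoring candidate per mention."""
    names = list(mentions)
    pools = [binary_filter(role, cands) for role, cands in mentions.values()]
    best, best_score = None, float("-inf")
    for combo in product(*pools):   # exhaustive search, fine for toy input
        score = sum(c.popularity for c in combo)
        score += sum(pair_score(a, b, graph_distance, shared_properties)
                     for a, b in combinations(combo, 2))
        if score > best_score:
            best, best_score = dict(zip(names, combo)), score
    return best, best_score


# Toy input: each mention maps to (role, candidate list).
ford = [
    Candidate("ex:Ford_Motor_Company", frozenset({"Organization"}), 0.6),
    Candidate("ex:Gerald_Ford", frozenset({"Person"}), 0.7),
    Candidate("ex:Ford_(crossing)", frozenset({"Place"}), 0.2),
]
detroit = [
    Candidate("ex:Detroit", frozenset({"Place"}), 0.8),
    Candidate("ex:Detroit_(film)", frozenset({"Work"}), 0.3),
]
mentions = {"Ford": ("Agent", ford), "Detroit": ("Location", detroit)}
graph_distance = {
    frozenset(("ex:Ford_Motor_Company", "ex:Detroit")): 1,
    frozenset(("ex:Gerald_Ford", "ex:Detroit")): 3,
}
shared_properties = {frozenset(("ex:Ford_Motor_Company", "ex:Detroit")): 2}

best, best_score = best_assignment(mentions, graph_distance, shared_properties)
print(best["Ford"].uri)  # ex:Ford_Motor_Company
```

In this toy input the binary phase removes the candidates that violate the role restrictions but still leaves two logically coherent readings of "Ford". Only the scoring phase separates them: the more popular candidate loses because the other one is closer to, and shares more properties with, the chosen reading of "Detroit".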

The proposed solution will be evaluated on a domain-specific gold standard consisting of 90 manually annotated news articles. Results will be compared to existing entity rankers, such as DBpedia Spotlight.

[1] High, R. (2012). The Era of Cognitive Systems: An Inside Look at IBM Watson and How it Works. Redguides for Business Leaders.

[2] Hoffart, J., Yosef, M. A., Bordino, I., Fürstenau, H., Pinkal, M., Spaniol, M., ... & Weikum, G. (2011, July). Robust disambiguation of named entities in text. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 782-792). Association for Computational Linguistics.