Taxonomy-based Information Retrieval Applied to the Tax Law Domain

From Master Projects


Title: Taxonomy-based Information Retrieval
Status: finished
Master: Knowledge Technology and Intelligent Internet Applications
Student name: Andy Wang
Dates
Start: 2015/03/01
End: 2015/08/31
Supervision
Supervisor: Rinke Hoekstra
Second reader: Tobias Kuhn
Company: PwC
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf


Abstract

It is often claimed that 80% of the world’s information is unstructured (George, Haas & Pentland, 2014; Grimes, 2008; High, 2012). Such information is difficult for computers to process and interpret, which in turn limits how effectively search engines can retrieve the unstructured documents that satisfy a user’s information need.

Basic keyword-matching search is susceptible to the vocabulary problem (e.g. synonymy, polysemy, and inflection). The goal of this thesis is to evaluate whether query expansion with a domain-specific taxonomy can significantly improve the performance of an information retrieval system that searches unstructured domain-specific texts (court rulings concerning tax law). The proposed solution tackles the problem from two sides:

1) The documents: through an adaptation of relevance feedback. After submitting an initial query, the user selects relevant documents from the result set. The initial query is then expanded with taxonomy concepts extracted from the selected documents.

2) The user query: through interactive query refinement, users are guided towards more specific keywords that better fit their information need. This is achieved with two concept selection techniques: auto-complete and ontology browsing.
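Both sides of the approach can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the thesis implementation: the taxonomy is a hypothetical three-concept dictionary with preferred and alternative labels, concepts are detected by whole-word label matching, and the expanded query is simply the original query followed by all labels of the most frequent new concepts. The names `TAXONOMY`, `extract_concepts`, `expand_query`, and `autocomplete` are illustrative only, and ontology browsing is not shown.

```python
import re
from collections import Counter

# Hypothetical toy taxonomy (assumption): concept ID -> preferred and alternative
# labels (synonyms, inflections). A real tax-law taxonomy would be far larger.
TAXONOMY = {
    "vat": {"pref": "value added tax", "alt": ["vat", "turnover tax"]},
    "deduction": {"pref": "deduction", "alt": ["deductions", "deductible"]},
    "residence": {"pref": "tax residence", "alt": ["resident", "domicile"]},
}


def labels(concept_id):
    """All labels of a concept: the preferred label first, then the alternatives."""
    entry = TAXONOMY[concept_id]
    return [entry["pref"]] + entry["alt"]


def extract_concepts(text):
    """Concept IDs with at least one label occurring as a whole word/phrase in the text."""
    lowered = text.lower()
    return {
        cid
        for cid in TAXONOMY
        if any(re.search(r"\b" + re.escape(lab) + r"\b", lowered) for lab in labels(cid))
    }


def expand_query(query, relevant_docs, top_k=2):
    """Relevance-feedback expansion: add labels of the concepts that occur in the
    most user-selected documents and are not already in the query."""
    counts = Counter()
    for doc in relevant_docs:
        counts.update(extract_concepts(doc))  # each concept counted once per document
    already = extract_concepts(query)
    # Sort by document frequency (descending), then by ID for deterministic ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    new_concepts = [cid for cid, _ in ranked if cid not in already][:top_k]
    terms = [query]
    for cid in new_concepts:
        terms.extend(labels(cid))
    return terms


def autocomplete(prefix, limit=5):
    """Suggest preferred labels of concepts with any label starting with the prefix."""
    p = prefix.lower().strip()
    hits = {TAXONOMY[cid]["pref"] for cid in TAXONOMY if any(lab.startswith(p) for lab in labels(cid))}
    return sorted(hits)[:limit]


docs = ["The court ruled that the VAT refund was due.", "Deductions for VAT were denied."]
print(expand_query("refund", docs))
print(autocomplete("t"))
```

Matching alternative labels is what lets a user-selected ruling that says "VAT" contribute the concept *value added tax* together with its synonyms, which is how the expansion addresses the vocabulary problem.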

The performance of the proposed system will be evaluated against two baselines: a basic keyword-matching search engine and a more sophisticated engine that uses traditional relevance feedback without the taxonomy.
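The abstract does not say which traditional relevance feedback method the second baseline uses. One common choice, assumed here purely for illustration, is Rocchio's algorithm. It moves the query vector towards the centroid of the relevant documents and away from the centroid of the non-relevant ones: q' = αq + β·mean(relevant) − γ·mean(non-relevant). The sketch below uses sparse term-frequency vectors with whitespace tokenization, and the weights α = 1, β = 0.75, γ = 0.15 are conventional defaults, not values taken from the thesis.

```python
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector from lower-cased whitespace tokens."""
    return Counter(text.lower().split())


def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-modified query as a dict term -> weight (assumed baseline, no taxonomy)."""
    new = {term: alpha * w for term, w in tf_vector(query).items()}
    # Add the relevant centroid (weight beta) and subtract the non-relevant one (gamma).
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in tf_vector(doc).items():
                new[term] = new.get(term, 0.0) + coef * w / len(docs)
    # Drop terms whose weight is not positive, a common practical choice.
    return {term: w for term, w in new.items() if w > 0}


print(rocchio("vat refund", ["vat refund denied", "refund granted"]))
```

Unlike the taxonomy-based expansion, this baseline adds whatever raw terms occur in the selected documents, so it cannot link "VAT" to "value added tax" unless both strings already appear there.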

References:

George, G., Haas, M. R., & Pentland, A. (2014). Big data and management. Academy of Management Journal, 57(2), 321-326.

Grimes, S. (2008). Unstructured data and the 80 percent rule. Clarabridge Bridgepoints.

High, R. (2012). The Era of Cognitive Systems: An Inside Look at IBM Watson and How it Works. IBM Redguides for Business Leaders.