Future State Exploration in Active Learning for Named Entity Recognition
|Master:||Computational Intelligence and Selforganisation|
|Student name:||Thomas Bohlken|
|Second supervisor:||Cor Veenman|
|Second reader:||Mark Hoogendoorn|
|Company:||Nederlands Forensisch Instituut|
In the field of digital forensics, a common task is to retrieve information from digital texts, such as emails, text documents, chat conversations or even Twitter messages. One of the steps in this process is to identify named entities in a text, the most common of which are persons, organizations and locations. In forensics it is especially important that performance on this task, and recall in particular, is as high as possible, since a false negative might result in missing a vital clue.
To boost performance, Active Learning can be used on instances of an evaluated dataset. In Active Learning, the system presents potential entities it has found to the user and learns from the feedback, thereby increasing performance. Many approaches exist for selecting the entities that will improve a model's performance the most, thereby minimizing the time and effort the user has to invest. Although multiple question proposal techniques based on entity attributes exist, little research has been done on strategies that try to predict the performance difference that results from a potential entity query.
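As a point of reference, an attribute-based question proposal strategy can be sketched as follows. This is a minimal illustration and not the system used in the project: it assumes the only entity attribute is a model confidence score, uses least-confidence selection as the strategy, and all names (`least_confident`, `active_learning_loop`, `ask_user`, the toy scores) are hypothetical.

```python
def least_confident(pool, predict_proba):
    """Pick the candidate whose entity probability is closest to 0.5."""
    return min(pool, key=lambda item: abs(predict_proba(item) - 0.5))


def active_learning_loop(pool, predict_proba, ask_user, budget):
    """Query the user about at most `budget` candidates, most uncertain first."""
    pool = list(pool)
    answers = []
    for _ in range(min(budget, len(pool))):
        item = least_confident(pool, predict_proba)
        pool.remove(item)
        answers.append((item, ask_user(item)))
        # A real system would retrain the model on `answers` at this point,
        # which would change the scores used in the next round.
    return answers


# Toy data (assumed): confidence that each token is a named entity,
# and the answer a user would give when asked.
scores = {"Amsterdam": 0.95, "Jansen": 0.55, "the": 0.02}
truth = {"Amsterdam": True, "Jansen": True, "the": False}

answers = active_learning_loop(scores, scores.get, truth.get, budget=2)
```

The selection here looks only at a property of the candidate itself; it does not estimate what the model would gain from the answer.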
In this project we will explore the possibilities of predicting future states in a question proposal strategy of Active Learning. By simulating and predicting the performance difference that results from presenting a potential entity to the user, each potential entity can be evaluated. Using this evaluation metric, the question proposal strategy can propose the next best instance for the system to present to the user. Using multiple future state evaluation methods, a comparison will be made with question proposal techniques that use entity attributes. We will also explore whether the search process can be extended multiple layers deep.
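One possible reading of a one-step future state evaluation is sketched below. It is an illustration under stated assumptions rather than the project's method: candidates are reduced to a single numeric feature, the model is a toy inverse-distance weighted vote, performance is recall on a small held-out set, and each simulated answer is weighted by the model's current probability for it. All function names and data are hypothetical.

```python
def predict_proba(labeled, x):
    """Toy model: inverse-distance weighted vote that x is an entity."""
    weights = [(1.0 / (abs(x - xi) + 1e-6), yi) for xi, yi in labeled]
    total = sum(w for w, _ in weights)
    return sum(w for w, y in weights if y == 1) / total


def recall(labeled, validation):
    """Fraction of true entities in `validation` that the toy model finds."""
    positives = [x for x, y in validation if y == 1]
    if not positives:
        return 0.0
    found = sum(1 for x in positives if predict_proba(labeled, x) >= 0.5)
    return found / len(positives)


def expected_gain(labeled, x, validation):
    """Expected recall change from querying x, averaged over both answers."""
    base = recall(labeled, validation)
    p_entity = predict_proba(labeled, x)
    # Simulate both future states: the user confirms or rejects the entity.
    gain_if_entity = recall(labeled + [(x, 1)], validation) - base
    gain_if_not = recall(labeled + [(x, 0)], validation) - base
    return p_entity * gain_if_entity + (1.0 - p_entity) * gain_if_not


def propose_query(labeled, pool, validation):
    """Propose the candidate with the highest expected recall gain."""
    return max(pool, key=lambda x: expected_gain(labeled, x, validation))


# Assumed toy data: (feature, label) pairs, label 1 = entity.
labeled = [(0.0, 0), (1.0, 1)]
validation = [(0.4, 1), (0.9, 1), (0.1, 0)]
pool = [0.45, 0.95, 0.05]

best = propose_query(labeled, pool, validation)
```

Each candidate costs two simulated retraining and evaluation steps, and searching multiple layers deep would multiply that cost at every layer, which is why practicality within an interactive process is a question in its own right.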
The main research questions to evaluate the theoretical and practical side of the described domain will be:
- How does predicting future states influence performance?
- How does the exploration of search spaces of future states influence performance?
- To what extent is the implementation of predicting future states practical within the constraints posed by an Active Learning process?