Reclaiming pre-digital domain knowledge: an application of background ontologies





Description

Laboratory reports are a key asset for both public and corporate research divisions. New tools for recording laboratory life are becoming available at an unprecedented pace. The knowledge gathered during years of research may, however, remain valuable far beyond the point at which those reporting protocols and recording tools become obsolete. This project focusses on a corporate thesaurus that collects the contributions of many years of biochemical research as a hardcopy print-out of concepts and the relationships between them. This thesaurus has been supporting document classification and search across a local repository within the organisation that owns it. While this domain knowledge is still of great value today, its mode of presentation on printed paper and the underlying knowledge model are both very much outdated. The figure below shows an excerpt from the thesaurus in its original form:

[Figure: Sample thes.png, an excerpt from the thesaurus in its original printed form]

Once adapted to state-of-the-art technologies, the thesaurus terms could be used in authoring, annotation, query suggestion or autocompletion tools.

This master project aims to design methods and software tools for the semi-supervised conversion of this thesaurus into a seed ontology, a first step towards developing a food processing ontology. The project consists of three steps. First, existing domain knowledge will be applied as background ontologies [1] to optimise OCR (Optical Character Recognition) performance. Second, the thesaurus structure and its content will be used to extend or improve existing ontologies, including the very same seed ontology that supported character recognition in the first step. Third, the resulting ontology will be evaluated against domain expert knowledge through a user study.
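As an illustration of the first step only, the sketch below shows one simple way background knowledge could support OCR: each recognised token is compared with a list of known domain terms and replaced by the closest one when the similarity is high enough. This is not the method prescribed by the project. The term list, the function names and the similarity cutoff of 0.8 are all assumptions made for the example, and the matching relies on Python's standard difflib module rather than on any particular OCR engine.

```python
from difflib import get_close_matches

# Hypothetical background vocabulary. In practice these labels would be
# drawn from an existing domain ontology, not typed in by hand.
BACKGROUND_TERMS = ["pasteurisation", "fermentation", "emulsifier", "homogenisation"]


def correct_token(token, vocabulary, cutoff=0.8):
    """Return the closest vocabulary term, or the token unchanged if none is close enough."""
    # Compare case-insensitively, but return the term as spelled in the vocabulary.
    lookup = {term.lower(): term for term in vocabulary}
    matches = get_close_matches(token.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else token


def correct_line(line, vocabulary, cutoff=0.8):
    """Apply correct_token to every whitespace-separated token of an OCR output line."""
    return " ".join(correct_token(tok, vocabulary, cutoff) for tok in line.split())


if __name__ == "__main__":
    # "rn" read instead of "m" and "l" instead of "i" are typical OCR confusions.
    print(correct_line("ferrnentation and pasteurisatlon", BACKGROUND_TERMS))
```

The cutoff trades recall against the risk of overwriting legitimate words that are merely similar to a known term, which is one reason a semi-supervised setting, with a human confirming uncertain corrections, is a natural fit.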

Recommended Prior Knowledge

Successful applicants for this project will have some experience of, or interest in, knowledge-intensive processes and image analysis, preferably acquired by completing the courses Knowledge and Media (X_405065), Knowledge Engineering (X_405099) and Multimedia Information Systems (5294MUIS6Y), as well as some experience of qualitative research methods. Previous experience with OCR software and low-level signal processing is desirable but not required.