Structuring geographical metadata for Instituut voor Beeld en Geluid



Title: Structuring geographical metadata for the Dutch Institute of Sound and Image
Status: finished
Master: Information Sciences
Student name: Dorine Berkvens
Dates
Start: 2009/04/01
End: 2010/01/06
Supervision
Supervisor: Veronique Malaise
Second reader: Lora Aroyo
Company: Instituut voor Beeld en Geluid
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf


Abstract

Context

The research is part of the CHOICE research project of the VU Amsterdam. This project is mainly concerned with the realisation of a semi-automatic semantic annotation application for archived TV and radio programs. The research partner is the Nederlands Instituut voor Beeld en Geluid, which, among other things, archives the Dutch public TV broadcasts. In the archiving process, the cataloguers at Beeld en Geluid annotate the TV programs with terms selected from a thesaurus: the GTAA. This faceted thesaurus contains six facets, or axes: two describe meta-information about the program (the Genre and Maker facets) and four describe the TV program's content (the Subject, Person, Geographical Location and Name facets). The annotation is made from a retrieval perspective, to improve search in the archives. Besides the annotators, the search process involves other users: laypeople and children. The institute has a special focus on children, with a view to education. Three of the four content facets are not structured: the Geographical Location, Person and Name facets are flat alphabetical lists. Giving them more structure should help cataloguers reach the relevant term more quickly at indexing time, and help users of the search facility do the same at query time.


Problem

The facets of the domain thesaurus already exist, but they need to be structured in a way that is useful for annotators and makes the annotation process easier.


Research Questions

Arranged in priority:
• How can the geographical axis technically be structured using existing external resources?
• Within the geographical axis, which structures, hierarchies and relationships are relevant for the annotators?
• What is the best way to present the structured hierarchy to annotators so that they can easily find the related categories they need?


Approach

1st research question: Start by exploring the options and technical possibilities of different external sources. Evaluate them and choose one for the actual mapping and structuring. Define algorithms for the mapping and build a program that produces an improved thesaurus structure.
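The mapping step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the algorithm used in the project: the flat facet terms, the external resource (modelled as a plain child-to-parent dictionary standing in for something like a gazetteer), and all function names are hypothetical. Each term is matched to the external resource by a case-insensitive exact label match; the resource's hierarchy is then walked upwards until another term of the facet is found, which becomes its broader term. Terms without a match are reported so they can be handled manually.

```python
"""Hypothetical sketch: derive broader/narrower relations for a flat list of
location terms by matching them against an external hierarchical resource."""

from collections import defaultdict

# Hypothetical flat list of location terms, as in an unstructured facet.
FLAT_TERMS = ["Amsterdam", "Europe", "Netherlands", "Utrecht", "Atlantis"]

# Hypothetical external resource: each place label maps to its parent label
# (None for a top-level entry). Assumed to be acyclic. A real resource would
# use identifiers and multilingual labels instead of bare strings.
GAZETTEER_PARENT = {
    "Europe": None,
    "Netherlands": "Europe",
    "Amsterdam": "Netherlands",
    "Utrecht": "Netherlands",
    "Belgium": "Europe",
}


def normalise(label):
    """Case-fold and strip a label so trivial variants still match."""
    return label.strip().casefold()


def build_hierarchy(terms, parent_of):
    """Return (broader, unmatched) for the given flat terms.

    broader maps each matched term to its nearest ancestor that is also one
    of the flat terms (None if there is no such ancestor); unmatched lists
    the terms with no normalised exact match in the external resource.
    """
    index = {normalise(label): label for label in parent_of}
    term_keys = {normalise(t): t for t in terms}
    broader, unmatched = {}, []
    for term in terms:
        key = normalise(term)
        if key not in index:
            unmatched.append(term)
            continue
        # Walk up the external hierarchy until another facet term is reached.
        parent = parent_of[index[key]]
        while parent is not None and normalise(parent) not in term_keys:
            parent = parent_of.get(parent)
        broader[term] = term_keys[normalise(parent)] if parent is not None else None
    return broader, unmatched


def narrower(broader):
    """Invert a broader-term map into a narrower-term map."""
    children = defaultdict(list)
    for term, parent in broader.items():
        if parent is not None:
            children[parent].append(term)
    return dict(children)


if __name__ == "__main__":
    broader_map, missing = build_hierarchy(FLAT_TERMS, GAZETTEER_PARENT)
    print(broader_map)
    print(narrower(broader_map))
    print(missing)
```

Exact label matching is the simplest possible strategy. Real place names are often ambiguous (several places can share a name) and have spelling variants, so a practical mapping would likely need some form of disambiguation on top of this.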

The focus is on the first research question, but if possible the other two will also be taken into account, which could lead to a better answer to the first question.

2nd research question: Research the annotators' requirements for a structure, for instance through interviews with annotators or observation of their tasks. Define several scenarios for an annotator. Come up with different ways of structuring the facet and validate these with annotators.

3rd research question: After a first structure has been made, look for different ways to present it to annotators. Look at existing solutions and possible new ones. Make a design and validate it with annotators. Ideally, a prototype would be presented.
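One simple way to show a term in its context is a breadcrumb path from the top of the hierarchy, followed by the term's narrower terms. The sketch below assumes the structure is available as a plain broader-term map; the data and function names are hypothetical and do not describe an existing interface.

```python
"""Hypothetical sketch: show a selected term in context, as a breadcrumb
path from the top of the hierarchy plus its narrower terms."""

# Hypothetical broader-term map (term -> broader term, None at the top).
BROADER = {
    "Europe": None,
    "Netherlands": "Europe",
    "Amsterdam": "Netherlands",
    "Utrecht": "Netherlands",
}


def path_to_root(term, broader):
    """Return the chain of terms from the top level down to `term`."""
    path, seen = [], set()
    while term is not None and term not in seen:  # guard against cycles
        seen.add(term)
        path.append(term)
        term = broader.get(term)
    return list(reversed(path))


def describe(term, broader):
    """Format a term as 'A > B > C', followed by its narrower terms if any."""
    crumb = " > ".join(path_to_root(term, broader))
    children = sorted(t for t, parent in broader.items() if parent == term)
    if children:
        return f"{crumb} [narrower: {', '.join(children)}]"
    return crumb


if __name__ == "__main__":
    print(describe("Netherlands", BROADER))
    print(describe("Amsterdam", BROADER))
```

Showing both directions at once lets an annotator move up to a broader place or down to a more specific one without scanning an alphabetical list.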


Planning

Startup phase (4 weeks): Get to know the organisation, learn the programs to be used, and read relevant documents. Make an action plan for the coming months.
Performance phase (remainder): Build the use case and produce the deliverables.
Completion phase (4 weeks): Work on the project report and, if necessary, refine the deliverables.

Result

At least:
- A structured thesaurus that matches the research results.
- A written report of the research project.

Optional:
- An extension of the thesaurus that makes it usable for children.
- An interface design to present this structure to annotators.