A new measure for quality of MSAs


status: finished
Master: Bioinformatics
Student name: Arjan van der Velde
Dates
Start: 2010/05/31
End: 2010/11/26
Supervision
Supervisor: Sanne Abeln
Second reader: Jaap Heringa
Company: VU, Bioinformatics
Thesis: Media:Thesis.pdf
Poster: Media:Posternaam.pdf

Signature supervisor



..................................

Abstract

Background

Multiple sequence alignments (MSAs) play a central role in many types of biological analysis. Because sequences or structures are usually aligned at the start of an analysis, it is critical that the alignment be accurate. The quality of an alignment is usually determined by comparing it to a trusted reference alignment and calculating a score, most commonly the sum-of-pairs (SP) score or the column score (CS). Both methods apply a rigid true-or-false criterion: SP to each aligned pair of residues, CS to each entire column. In many cases, however, the alignment of residues is not completely wrong but rather the best choice among several suboptimal solutions. This is especially true for structural alignments between sequences with low sequence identity, in which there might not be a one-to-one relation between residues in the aligned sequences [1]. The goal of this project is to develop and test a new measure of alignment quality that reflects the biological quality of alignments more closely than the SP and CS scoring methods.
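To make the true-or-false nature of both scores concrete, here is a minimal Python sketch of one common formulation of SP and CS. It is not taken from VerAlign or BAliBASE; the function names, the gap character "-" and the toy alignments are assumptions made for illustration. SP is computed as the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and CS as the fraction of reference columns reproduced exactly in the test alignment.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def residue_columns(alignment):
    """For each column, map sequence index -> index of the residue in that sequence."""
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        entry = {}
        for seq_idx, char in enumerate(column):
            if char != GAP:
                entry[seq_idx] = counters[seq_idx]
                counters[seq_idx] += 1
        columns.append(entry)
    return columns


def aligned_pairs(alignment):
    """Set of all residue pairs that share a column."""
    pairs = set()
    for entry in residue_columns(alignment):
        pairs.update(combinations(sorted(entry.items()), 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that are also aligned in the test MSA."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def cs_score(test, reference):
    """Fraction of non-empty reference columns reproduced exactly in the test MSA."""
    ref_cols = [frozenset(e.items()) for e in residue_columns(reference) if e]
    if not ref_cols:
        return 0.0
    test_cols = {frozenset(e.items()) for e in residue_columns(test)}
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


# Toy example: the test MSA places one residue of the second sequence one column early.
reference = ["ACDE", "A-DE", "AC-E"]
test_msa = ["ACDE", "AD-E", "AC-E"]
print(sp_score(test_msa, reference))  # 0.875 (7 of 8 reference pairs recovered)
print(cs_score(test_msa, reference))  # 0.5   (2 of 4 reference columns recovered)
```

A single residue shifted by one position costs two full columns under CS, which illustrates why a less rigid criterion may be useful.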

Approach

As a first step, a basic program for comparing MSAs using SP and CS scores will be built (or reused from VerAlign [2], depending on how much needs to be adapted). This program will serve as the basis for developing, implementing and testing the new measures. The first new measure to be implemented will be an adaptation of the SP and CS scores that takes distances into account. Other new methods might take into account structural information such as physical distances. The new method(s) will be tested on existing data and reference MSAs such as BAliBASE [3], and benchmarked using alternative structures [1] or, optionally, by means of homology modeling [4].
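The proposal does not specify how distances would enter the score. Purely as an illustration of the general idea, and not as the measure developed in this project, the following Python sketch gives partial credit to a reference pair when the two residues end up a few columns apart in the test alignment. The function names, the linear credit function and the max_shift parameter are all hypothetical choices.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def residue_positions(alignment):
    """Map (sequence index, residue index) -> column index in the alignment."""
    positions = {}
    for seq_idx, row in enumerate(alignment):
        res_idx = 0
        for col_idx, char in enumerate(row):
            if char != GAP:
                positions[(seq_idx, res_idx)] = col_idx
                res_idx += 1
    return positions


def reference_pairs(reference):
    """Yield every pair of residues that share a column in the reference."""
    by_column = {}
    for residue, col_idx in residue_positions(reference).items():
        by_column.setdefault(col_idx, []).append(residue)
    for members in by_column.values():
        yield from combinations(sorted(members), 2)


def distance_tolerant_sp(test, reference, max_shift=1):
    """Hypothetical soft SP score.

    A reference pair earns full credit if both residues share a column in the
    test MSA, and linearly less credit the further apart (in columns) they are.
    Assumes both MSAs contain the same ungapped sequences in the same order.
    With max_shift=0 this reduces to the strict SP score.
    """
    test_pos = residue_positions(test)
    credit = 0.0
    n_pairs = 0
    for res_a, res_b in reference_pairs(reference):
        n_pairs += 1
        offset = abs(test_pos[res_a] - test_pos[res_b])
        credit += max(0.0, 1.0 - offset / (max_shift + 1))
    return credit / n_pairs if n_pairs else 0.0


# Toy example: one residue of the second sequence is placed one column early.
reference = ["ACDE", "A-DE"]
test_msa = ["ACDE", "AD-E"]
print(round(distance_tolerant_sp(test_msa, reference, max_shift=1), 3))  # 0.833
print(round(distance_tolerant_sp(test_msa, reference, max_shift=0), 3))  # 0.667
```

The offset here is measured in alignment columns. A structure-based variant could instead weight pairs by a physical distance between residues, which would require coordinates that this sketch does not model.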

Aims and objectives

The main goal of this project is to develop and implement a new measure of quality for (structural) alignments that is more robust to structural diversity than the SP and CS scores and more closely related to the biological quality of the alignments. In addition to the report required for the master program, a paper will be written if the results are promising.

Planning

June: Literature study, implementing basic MSA comparison

July: Implementing and validating the distance-based measure

August: (mostly holidays)

September: Develop method using structural information

October: Implementing and validating the structure-based measure, writing report

November: Writing report


References

[1] Pirovano W, Feenstra KA, Heringa J (2008) The meaning of alignment: lessons from structural diversity. BMC Bioinformatics 9:556.

[2] VerAlign: http://www.ibi.vu.nl/programs/veralignwww/

[3] Thompson JD, Koehl P, Ripp R, Poch O (2005) BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins 61:127-136.

[4] Forrest LR, Tang CL, Honig B (2006) On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 91:508-517.