Algorithms in Sequence Analysis - Assignment 1
Question 1
Global pairwise alignment
Gaps in one sequence. Modify the alignment equation for dynamic programming (below) to allow gaps only in sequence X.
M[i, j] = max { M[i-1, j-1] + score(X[i], Y[j]), M[i, j-1]-g, M[i-1, j]-g }
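For reference, the unmodified recurrence above can be sketched in Python as follows. This is a minimal illustration, not a solution to the question: the function name global_alignment_matrix, the boundary condition (each leading gap costs g), and the +1/-1 match_mismatch scoring are assumptions that do not come from the assignment.

```python
def global_alignment_matrix(x, y, score, g):
    """Fill M so that M[i][j] is the best score for aligning x[:i] with y[:j]."""
    n, m = len(x), len(y)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary condition: every leading gap costs g.
    for i in range(1, n + 1):
        M[i][0] = -g * i
    for j in range(1, m + 1):
        M[0][j] = -g * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # align X[i] with Y[j]
                M[i][j - 1] - g,                              # gap in X
                M[i - 1][j] - g,                              # gap in Y
            )
    return M


def match_mismatch(a, b):
    """Hypothetical scoring function: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


M = global_alignment_matrix("GATTA", "GATA", match_mismatch, g=2)
print(M[-1][-1])  # score of the best global alignment
```

The strings are 0-indexed in Python, so x[i - 1] plays the role of X[i] in the equation.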
Question 2
The edit distance between two words is the minimum number of operations needed to transform one word into the other. The available operations are:
- replacement of a single letter by another
- insertion of a single letter
- deletion of a single letter
Write down the new equation for M[i,j] in such a way that the score of the alignment will be the edit distance.
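To make the three operations listed above concrete, here is a small Python sketch that applies them to a word one step at a time. The helper names replace, insert and delete and the example words are illustrative assumptions; the sketch only shows what counts as one operation and does not compute the edit distance itself.

```python
def replace(word, i, letter):
    """Replace the letter at position i (0-based) by another letter."""
    return word[:i] + letter + word[i + 1:]


def insert(word, i, letter):
    """Insert a single letter before position i (0-based)."""
    return word[:i] + letter + word[i:]


def delete(word, i):
    """Delete the single letter at position i (0-based)."""
    return word[:i] + word[i + 1:]


# Transform "kitten" into "sitting" using one operation per step.
steps = ["kitten"]
steps.append(replace(steps[-1], 0, "s"))  # replace k by s
steps.append(replace(steps[-1], 4, "i"))  # replace e by i
steps.append(insert(steps[-1], 6, "g"))   # insert g at the end
print(" -> ".join(steps))
```

Three operations suffice here, so the edit distance between these two example words is at most 3.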
Question 3
Perform a global alignment of the protein sequences DARWIN and CRICK. First, complete the template matrix Q3 including the arrows. Use this scoring function:
M[i, j] = max { M[i-1, j-1] + blosum62(X[i], Y[j]), M[i, j-1]-2, M[i-1, j]-2 }
where blosum62(X[i], Y[j]) is the substitution score between residues X[i] and Y[j] according to the BLOSUM62 exchange matrix:
Question 4
Provide the alignment after traceback from your matrix.
Question 5
What is the total alignment score?
Question 6
Local pairwise alignment
Find two maximal scoring local alignments between sequences TGAGA and GAGGC using the following scoring function:
M[i, j] = max { M[i-1, j-1] +/- 1, M[i, j-1] - 2, M[i-1, j] - 2, 0 }
where +/- 1 means +1 if X[i] and Y[j] match and -1 if they do not.
First fill out the template matrix Q5.
Note: You should use the Waterman-Eggert method.
Provide your score matrix.
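The local recurrence used in this question can be sketched in Python as shown below. This is a minimal illustration under stated assumptions: the function name local_alignment_matrix is hypothetical, "+/- 1" is read as +1 for a match and -1 for a mismatch, and the example sequences are not the ones from the assignment. It fills the score matrix only; it does not perform the traceback or the Waterman-Eggert step for the second alignment.

```python
def local_alignment_matrix(x, y, match=1, mismatch=-1, gap=2):
    """Fill M for the local recurrence; no entry can drop below 0."""
    n, m = len(x), len(y)
    # First row and column stay 0: a local alignment may start anywhere.
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed reading of "+/- 1": +1 on a match, -1 on a mismatch.
            diagonal = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(
                M[i - 1][j - 1] + diagonal,
                M[i][j - 1] - gap,
                M[i - 1][j] - gap,
                0,
            )
    return M


M = local_alignment_matrix("ACGTT", "CGA")
best = max(max(row) for row in M)
for row in M:
    print(row)
print("best local score:", best)
```

The best local alignment ends at a cell holding the maximum value of M, which is where a traceback would start.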
Question 7
Provide the two alignments and their scores.
Note: We ask for local alignments, so be sure that you hand in local alignments.