Algorithms in Sequence Analysis - Assignment 1

From Master Projects
Revision as of 13:25, 2 November 2011 by Feenstra (talk | contribs)


Question 1

Global pairwise alignment

Gaps in one sequence. Modify the alignment equation for dynamic programming (below) to allow gaps only in sequence X.

M[i, j] = max { M[i-1, j-1] + score(X[i], Y[j]),
                M[i, j-1]-g,
                M[i-1, j]-g }
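To make the unmodified recurrence concrete, here is a minimal Python sketch that fills the matrix M exactly as the equation above states. The function names, the match/mismatch scoring function `simple_score`, and the boundary values M[i, 0] = -i*g and M[0, j] = -j*g are assumptions made for illustration; the assignment text does not specify them.

```python
def global_alignment_matrix(x, y, score, g):
    """Fill the global alignment matrix for sequences x and y.

    score(a, b) is the substitution score, g the (positive) gap penalty.
    Row/column 0 represent the empty prefix, so X[i] is x[i - 1] here.
    """
    n, m = len(x), len(y)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary conditions: a prefix aligned against nothing but gaps.
    for i in range(1, n + 1):
        M[i][0] = -i * g
    for j in range(1, m + 1):
        M[0][j] = -j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # diagonal step
                M[i][j - 1] - g,                              # horizontal step
                M[i - 1][j] - g,                              # vertical step
            )
    return M


def simple_score(a, b):
    """Hypothetical scoring: +1 for identical letters, -1 otherwise."""
    return 1 if a == b else -1


if __name__ == "__main__":
    for row in global_alignment_matrix("GAT", "GT", simple_score, 2):
        print(row)
```

Restricting gaps to one sequence amounts to deciding which of the three terms inside max may stay; the sketch deliberately leaves all three in place.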

Question 2

The edit distance between two words is the minimum number of operations needed to transform one word into the other. The available operations are:

  • replacement of a single letter by another
  • insertion of a single letter
  • deletion of a single letter

Write down a new equation for M[i, j] such that the score of the alignment equals the edit distance.
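As a way to check a hand-derived equation on small examples, the following Python sketch computes the edit distance with the standard dynamic programming formulation, in which every operation costs 1 and the minimum is taken. The function name `edit_distance` is an assumption, and the sketch uses min rather than the max form of the equation in Question 1; relating the two forms is left to the answer.

```python
def edit_distance(a, b):
    """Minimum number of single-letter replacements, insertions and deletions."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    # Turning a prefix into the empty word takes one deletion per letter,
    # and building a prefix from the empty word one insertion per letter.
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            replace_cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j - 1] + replace_cost,  # keep or replace a letter
                D[i][j - 1] + 1,                 # insert a letter
                D[i - 1][j] + 1,                 # delete a letter
            )
    return D[n][m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```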

Question 3

Perform a global alignment of the protein sequences DARWIN and CRICK. First, complete the template matrix Q3 including the arrows. Use this scoring function:

M[i, j] = max { M[i-1, j-1] + blosum62(X[i], Y[j]),
                M[i, j-1]-2,
                M[i-1, j]-2 }

where blosum62(X[i], Y[j]) is the substitution score between residues X[i] and Y[j] according to the BLOSUM62 exchange matrix (figure: Blosum62.jpg).

Question 4

Provide the alignment after traceback from your matrix.
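The traceback step can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's reference solution: the function name `global_align`, the match/mismatch scoring function `simple_score`, the boundary values -i*g and -j*g, and the tie-breaking order (diagonal first, then a gap in Y, then a gap in X) are all choices made here. To use it for Question 3, a BLOSUM62 lookup would have to be supplied in place of `simple_score`.

```python
def global_align(x, y, score, g):
    """Return (aligned_x, aligned_y, total_score) for a global alignment."""
    n, m = len(x), len(y)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][0] = -i * g
    for j in range(1, m + 1):
        M[0][j] = -j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = max(
                M[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                M[i][j - 1] - g,
                M[i - 1][j] - g,
            )

    # Traceback: start in the bottom-right cell and repeatedly move to a
    # predecessor cell that explains the current value.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif i > 0 and M[i][j] == M[i - 1][j] - g:
            ax.append(x[i - 1]); ay.append("-")   # gap in y
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])   # gap in x
            j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay)), M[n][m]


def simple_score(a, b):
    """Hypothetical scoring: +1 for identical letters, -1 otherwise."""
    return 1 if a == b else -1


if __name__ == "__main__":
    print(global_align("GAT", "GT", simple_score, 2))
```

When several predecessors explain a cell, more than one optimal alignment exists; this sketch reports only the one selected by its tie-breaking order.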

Question 5

What is the total alignment score?

Question 6

Local pairwise alignment

Find two maximal-scoring local alignments between the sequences TGAGA and GAGGC using the following scoring function:

M[i, j] = max { M[i-1, j-1] +/- 1,
                M[i, j-1] - 2,
                M[i-1, j] - 2,
                0 }

where +/- 1 means +1 if X[i] and Y[j] match and -1 if they do not.
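The local recurrence above can be sketched in Python as follows, reading +/- 1 as +1 for a match and -1 for a mismatch. The function name `local_alignment_matrix` and the convention of returning the first highest-scoring cell are assumptions for illustration. The sketch covers only the matrix fill; the recomputation that the Waterman-Eggert method performs to find a second alignment is not included.

```python
def local_alignment_matrix(x, y, match=1, mismatch=-1, gap=2):
    """Fill the local alignment matrix; return (M, best_score, best_cell)."""
    n, m = len(x), len(y)
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    M = [[0] * (m + 1) for _ in range(n + 1)]
    best_score, best_cell = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(
                M[i - 1][j - 1] + diagonal,
                M[i][j - 1] - gap,
                M[i - 1][j] - gap,
                0,  # the zero floor lets an alignment restart at this cell
            )
            if M[i][j] > best_score:
                best_score, best_cell = M[i][j], (i, j)
    return M, best_score, best_cell


if __name__ == "__main__":
    M, best_score, best_cell = local_alignment_matrix("ACGT", "CG")
    for row in M:
        print(row)
    print(best_score, best_cell)
```

A traceback for a local alignment starts at the highest-scoring cell and stops at the first cell whose value is 0, rather than running to the top-left corner.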

First fill out the template matrix Q5.

Note: You should use the Waterman-Eggert method.

Provide your score matrix.

Question 7

Provide the two alignments and their scores.

Note: We ask for local alignments, so be sure that you hand in local alignments.