Problem set - Tree building
1. Construct a similarity matrix from this alignment:
Seq A G A U C U U U G G A U C
Seq B A A U C U C U G G A U U
Seq C C A U C U U U - G A U G
Seq D A A U C U U U G G A U U
Seq E C A U C U C U G A A U G
2. Given the Jukes & Cantor conversion graph below, convert the similarity matrix from question 1 to a distance matrix.
3. Given this distance matrix, draw a tree relating these sequences and label the lengths of the branches of the tree:
       Seq A  Seq B  Seq C  Seq D
Seq A  -      -      -      -
Seq B  0.2    -      -      -
Seq C  0.6    0.6    -      -
Seq D  0.9    0.9    0.9    -
4. Given this distance matrix, draw the structure of a tree relating the sequences using the neighbor-joining method. Can you fill in branch lengths that would satisfy the distances?
       Seq A  Seq B  Seq C  Seq D  Seq E  Seq F
Seq A  -      -      -      -      -      -
Seq B  0.2    -      -      -      -      -
Seq C  0.7    0.7    -      -      -      -
Seq D  0.6    0.6    0.5    -      -      -
Seq E  0.7    0.7    0.8    0.7    -      -
Seq F  0.6    0.6    0.7    0.6    0.3    -
5. Given this distance matrix, use the Fitch method to draw a tree relating these sequences and label the lengths of the branches of the tree:
       Seq A  Seq B  Seq C  Seq D
Seq A  -      -      -      -
Seq B  0.2    -      -      -
Seq C  0.6    0.6    -      -
Seq D  0.9    0.9    0.9    -
6. Generate a tree using the neighbor-joining method, with approximate branch lengths, from these P1 RNA sequences. Use the Jukes & Cantor correction graph in question 2 to help with the distance matrix. For fun, do it again using the Fitch method.
Thiobacillus ferrooxidans  GAAUUCCCGGGAGGGGCCAGGCGACCCCCGAAUUCCCGG
Escherichia coli           GAAUUCCCGGAAGCAGACCAGACAGUCGCCGAAUUCCCGG
Serratia marcescens        GAAUUCCCGGAAGUAGACCAGACAGUCACCGAAUUCCCGG
Chromatium vinosum         GAAUUCCCGGGAGGGGCCAGACAGUCCCUGAAUUCCCG
Answers
1.
       Seq A  Seq B  Seq C  Seq D  Seq E
Seq A  XXX    XXX    XXX    XXX    XXX
Seq B  0.75   XXX    XXX    XXX    XXX
Seq C  0.75   0.67   XXX    XXX    XXX
Seq D  0.83   0.92   0.75   XXX    XXX
Seq E  0.67   0.75   0.75   0.67   XXX
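The similarity values in answer 1 can be checked with a short script. This is only a sketch: it assumes that a column containing a gap counts as a mismatch and that all 12 alignment columns stay in the denominator, which is the convention that reproduces the numbers above. The names `alignment`, `similarity` and `similarities` are made up for this illustration.

```python
from itertools import combinations

# Alignment from question 1; "-" marks a gap.
alignment = {
    "Seq A": "GAUCUUUGGAUC",
    "Seq B": "AAUCUCUGGAUU",
    "Seq C": "CAUCUUU-GAUG",
    "Seq D": "AAUCUUUGGAUU",
    "Seq E": "CAUCUCUGAAUG",
}


def similarity(s1, s2):
    """Fraction of aligned columns that are identical.

    Assumption: a gap column is treated as a mismatch and is still
    counted in the denominator (the full alignment length).
    """
    matches = sum(1 for x, y in zip(s1, s2) if x == y and x != "-")
    return matches / len(s1)


# One entry per pair of sequences, rounded to two decimals as in the answer.
similarities = {
    (a, b): round(similarity(alignment[a], alignment[b]), 2)
    for a, b in combinations(alignment, 2)
}

for (a, b), value in similarities.items():
    print(f"{a} vs {b}: {value:.2f}")
```

Other gap conventions (for example, dropping gapped columns before counting) are equally defensible and would change the Seq C values slightly.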
2.
       Seq A  Seq B  Seq C  Seq D  Seq E
Seq A  XXX    XXX    XXX    XXX    XXX
Seq B  0.27   XXX    XXX    XXX    XXX
Seq C  0.27   0.50   XXX    XXX    XXX
Seq D  0.21   0.08   0.27   XXX    XXX
Seq E  0.50   0.27   0.27   0.50   XXX
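The values in answer 2 were read off the conversion graph by eye. If the graph is not to hand, the standard Jukes & Cantor formula, d = -(3/4) ln(1 - (4/3) p) with p = 1 - similarity, can be used instead. The sketch below assumes that formula is what the graph plots; the function name `jukes_cantor` is made up for this illustration, and the computed distances should be expected to differ a little from values estimated off a graph.

```python
import math


def jukes_cantor(similarity):
    """Jukes & Cantor distance for a given fractional similarity.

    Uses d = -(3/4) * ln(1 - (4/3) * p), where p = 1 - similarity is the
    observed fraction of differing positions.
    """
    p = 1.0 - similarity
    if p >= 0.75:
        # The correction is undefined at or beyond 75% difference.
        raise ValueError("similarity must be greater than 0.25")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The four distinct similarity values that occur in answer 1.
for s in (0.92, 0.83, 0.75, 0.67):
    print(f"similarity {s:.2f} -> estimated distance {jukes_cantor(s):.2f}")
```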
3. This is an easy one - only one joining to do (A and B, the shortest distance in the matrix), so no math required for the tree structure, and the branch lengths are all perfect & even:
4. This one is more work, but it is just a matter of doing the averages after each joining.
Step 1, first neighbor-joining: A & B (the shortest distance in the matrix)
and combine A and B in the matrix (remember to average):
       SeqAB  Seq C  Seq D  Seq E  Seq F
SeqAB  -      -      -      -      -
Seq C  0.70   -      -      -      -
Seq D  0.60   0.5    -      -      -
Seq E  0.70   0.8    0.7    -      -
Seq F  0.60   0.7    0.6    0.3    -
Step 2, second neighbor-joining: E with F (the shortest distance in the condensed matrix), and combine (average) them in the distance matrix:
       SeqAB  Seq C  Seq D  SeqEF
SeqAB  -      -      -      -
Seq C  0.70   -      -      -
Seq D  0.60   0.5    -      -
SeqEF  0.65   0.75   0.65   -
Step 3, the last neighbor-joining: C and D (the shortest distance in the condensed matrix). No need to condense the matrix - we won't need it, all of the nodes will be resolved:
Step 4 - Fitting the distances (they match perfectly, with even numbers, so no math required):
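The joining and averaging in steps 1 to 3 can be sketched in a few lines of Python. This follows the simplified procedure used in this answer (join the closest pair, then replace the pair with the plain average of its two rows); it is not the full Saitou & Nei neighbor-joining algorithm, which picks pairs using a corrected distance criterion. The helper names `closest_pair` and `merge` are made up for this illustration.

```python
# Lower-triangle distance matrix from question 4, one row per sequence.
names = "ABCDEF"
rows = {
    "B": [0.2],
    "C": [0.7, 0.7],
    "D": [0.6, 0.6, 0.5],
    "E": [0.7, 0.7, 0.8, 0.7],
    "F": [0.6, 0.6, 0.7, 0.6, 0.3],
}
# Store each pairwise distance under an unordered pair of cluster names.
dist = {
    frozenset((row, names[j])): value
    for row, values in rows.items()
    for j, value in enumerate(values)
}


def closest_pair(dist):
    """Return the pair of clusters separated by the smallest distance."""
    return min(dist, key=dist.get)


def merge(dist, pair):
    """Join `pair` into one cluster and average its distances to the others."""
    a, b = sorted(pair)
    merged = a + b
    others = {name for key in dist for name in key} - {a, b}
    # Keep every distance that does not involve the two joined clusters.
    condensed = {k: v for k, v in dist.items() if a not in k and b not in k}
    for other in others:
        average = (dist[frozenset((a, other))] + dist[frozenset((b, other))]) / 2
        condensed[frozenset((merged, other))] = round(average, 3)
    return merged, condensed


joins = []
for _ in range(3):  # three joinings resolve all nodes for six sequences
    merged, dist = merge(dist, closest_pair(dist))
    joins.append(merged)

print(joins)
```

Printing `dist` after each pass of the loop shows the condensed matrix at that step, which can be compared against the tables in steps 1 and 2.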
5.
6. Here is the alignment of these sequences:
Thiobacillus ferrooxidans  GAAUUCCCGGGAG-GGGCCAGGCGACCCCCGAAUUCCCGG
Escherichia coli           GAAUUCCCGGAAGCAGACCAGACAGUCGCCGAAUUCCCGG
Serratia marcescens        GAAUUCCCGGAAGUAGACCAGACAGUCACCGAAUUCCCGG
Chromatium vinosum         GAAUUCCCGGGAG-GGGCCAGACAGUCCCUGAAUUCCCGG
From which the following similarity matrix can be calculated:
                T.ferr  E.coli  S.marc  C.vin
T.ferrooxidans  --      --      --      --
E.coli          0.775   --      --      --
S.marcescens    0.775   0.950   --      --
C.vinosum       0.875   0.850   0.850   --
The Jukes & Cantor conversion results in the following distance matrix (notice that at these high levels of similarity, the estimated distances plotted by eye are very close or identical to the 'dissimilarities' of the sequences):
                T.ferr  E.coli  S.marc  C.vin
T.ferrooxidans  --      --      --      --
E.coli          0.225   --      --      --
S.marcescens    0.225   0.050   --      --
C.vinosum       0.125   0.150   0.150   --
It is trivial to sort out the structure of this tree using neighbor-joining - with only 4 sequences, there's only one joining to do, so no need to condense the matrix even once. The smallest distance is between E. coli and S. marcescens, so they are joined on a branch. Then sift out the distances to fit:
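Sifting out the distances can also be done arithmetically. For four sequences arranged as ((a, b), (c, d)), each branch length follows from sums and differences of the six pairwise distances, provided the distances fit the tree exactly. The sketch below assumes that exact fit and assumes the topology with E. coli and S. marcescens on one side and the other two sequences on the other; `fit_quartet` is a made-up helper for this illustration, and its output is one set of lengths consistent with the matrix rather than a reproduction of the original figure.

```python
# Distance matrix from answer 6, keyed by unordered pairs of names.
distances = {
    frozenset(("T.ferrooxidans", "E.coli")): 0.225,
    frozenset(("T.ferrooxidans", "S.marcescens")): 0.225,
    frozenset(("E.coli", "S.marcescens")): 0.050,
    frozenset(("T.ferrooxidans", "C.vinosum")): 0.125,
    frozenset(("E.coli", "C.vinosum")): 0.150,
    frozenset(("S.marcescens", "C.vinosum")): 0.150,
}


def fit_quartet(dist, a, b, c, d):
    """Branch lengths for the unrooted four-sequence tree ((a, b), (c, d)).

    Assumption: the distances are additive on this tree, so the lengths
    can be solved for directly instead of fitted by trial and error.
    """
    def get(x, y):
        return dist[frozenset((x, y))]

    # The internal branch is shared by all four cross-pair paths.
    cross = get(a, c) + get(a, d) + get(b, c) + get(b, d)
    internal = cross / 4 - (get(a, b) + get(c, d)) / 2
    # a and b differ only in their own terminal branches when seen from c or d.
    len_a = (get(a, b) + (get(a, c) - get(b, c) + get(a, d) - get(b, d)) / 2) / 2
    len_c = (get(c, d) + (get(a, c) - get(a, d) + get(b, c) - get(b, d)) / 2) / 2
    lengths = {
        a: len_a,
        b: get(a, b) - len_a,
        c: len_c,
        d: get(c, d) - len_c,
        "internal": internal,
    }
    return {name: round(value, 3) for name, value in lengths.items()}


lengths = fit_quartet(
    distances, "E.coli", "S.marcescens", "T.ferrooxidans", "C.vinosum"
)
for name, value in lengths.items():
    print(f"{name}: {value}")
```

A quick sanity check is to add the branch lengths along the path between any two sequences and compare the sum with the corresponding matrix entry.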
Starting with the same distance matrix, the tree can be built like this using the Fitch method (I usually start with the most similar sequences & add in the sequences in order of increasing distance):
Last updated April 05, 2009 by James W Brown