Lecture 3
Comparative analysis of RNA secondary structure

The undisputed "Gold Standard" in secondary structure determination (NOT prediction!)

Principle
    comparison to genetic analysis of mutation/secondary reversion

Early examples
    DNA - Chargaff's rules & Watson/Crick
    tRNA - cloverleaf was the only common structure possible for the first three tRNA seqs

Analysis of secondary structure - an iterative process
    essentially a manual process
    refine alignment and add sequences to alignment
    alignments
        definition - 2D matrix - homology
        seq gaps
        basepair designators
        group by phylogeny
    start with only seqs that can be aligned on the basis of sequence similarity
    simple identification of covariations
    use these to add more disparate sequences (or sequence regions)
    large gaps as structural subdomains
    variable regions w/ gaps - often hairpins with variable lengths
    analysis of subsets
        example - early P seqs in gamma-proteos & Bacillus
    genetic analysis for "confirmation"
        Haas Science paper example

Statistical analysis of covariations - Mxy
    How to quantitate sequence variation?
        e.g. 50% conservation - but what about the rest?
        information theory - Hx
            definition
            sequence logos
            P RNA example
    How to quantitate sequence covariation?
        Mxy
            definition/algorithm
            3D/2D plots of secondary structure
            1D plots of adjacent bps
            basic example - tRNA
    Natural pops approach
    secondary structure details - archaeal P RNA
    non-Watson-Crick pairings - archaeal P RNA
    interpreting data phylogenetically

Automated methods
    require most of the info sought!, i.e.:
        starting alignment, or
        seqs that can be aligned based solely on similarity, or
        a starting structure
    the alignment contains the structure as implicit information!
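The outline names Hx and Mxy but does not spell out their definitions. The sketch below assumes the usual information-theory forms: Hx as the Shannon entropy of alignment column x, and Mxy as the mutual information between columns x and y. The function names and the toy alignment are made up for illustration, and gaps (if any) are simply treated as one more symbol, which is an assumption rather than something the notes specify.

```python
import math
from collections import Counter

def entropy(alignment, x):
    """Assumed form of Hx: Shannon entropy (bits) of alignment column x."""
    n = len(alignment)
    counts = Counter(seq[x] for seq in alignment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(alignment, x, y):
    """Assumed form of Mxy: mutual information (bits) between columns x and y."""
    n = len(alignment)
    fx = Counter(seq[x] for seq in alignment)             # base counts, column x
    fy = Counter(seq[y] for seq in alignment)             # base counts, column y
    fxy = Counter((seq[x], seq[y]) for seq in alignment)  # joint pair counts
    m = 0.0
    for (a, b), c in fxy.items():
        p_ab = c / n
        m += p_ab * math.log2(p_ab / ((fx[a] / n) * (fy[b] / n)))
    return m

# Hypothetical toy alignment, one aligned sequence per row:
#   column 0 and column 3 covary as Watson-Crick pairs,
#   column 1 is invariant, column 2 varies independently of column 0.
alignment = [
    "GACC", "GAGC",
    "AACU", "AAGU",
    "CACG", "CAGG",
    "UACA", "UAGA",
]

h0 = entropy(alignment, 0)                   # fully variable column
h1 = entropy(alignment, 1)                   # invariant column
m03 = mutual_information(alignment, 0, 3)    # covarying pair
m01 = mutual_information(alignment, 0, 1)    # invariant partner
m02 = mutual_information(alignment, 0, 2)    # independent partner
print(h0, h1, m03, m01, m02)
```

In this toy case the covarying pair (columns 0 and 3) scores as high as the entropy of either column, while the invariant and independent columns score zero. For a four-letter alphabet, the height of a sequence logo column is commonly taken as 2 - Hx bits, ignoring any small-sample correction.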
Data requirements & resolution
    more sequences, better resolution
    need at least ~30 sequences - 50 better, thousands best

Strengths
    objective, quantitative
    automatable & visualizable
    basepair resolution
    can distinguish thermodynamically equivalent possibilities
    only biologically-relevant structures identified

Weaknesses
    phylogenetic effects - but can be dealt with
    seq sample effects - i.e. P w/ half purples
    alignment basically a manual process
    Mxy best for final stages of secondary analysis & analysis of tertiaries
    identifies base-base interactions
    no specific information from invariant sequences
    no specific information from idiosyncratic sequences - use MFOLD!
    difficult to incorporate biochemical data