Readings for this lecture:
Genomics
Evidence for horizontal gene transfer and its effect on phylogeny
Purpose: The purpose of the paper is to report the complete genome sequence from Thermotoga. The reason for focusing on this organism is to try to understand bacterial ancestry and early evolution.
In this paper, the authors have sequenced the entire genome of Thermotoga maritima. This was an early genome sequence, reported only 4 years after the very first complete genome sequence was published (Haemophilus influenzae Rd, in 1995), so a lot of the paper is devoted to basic info about the genome: size, number of genes, etc., compiled in Table 1 and Figure 1. The genome is a single circle of 1.86 Mbp, about average for a bacterial genome, with 1877 identified ORFs and a full complement of tRNAs and rRNAs (a single operon, but not the usual 16S-23S-5S arrangement). About half of the ORFs are similar enough to characterized genes from other organisms to be identifiable at some level of confidence. The authors use these (along with the known properties of the species) to infer the metabolic pathways it uses.
Notice that there isn't an electron transport chain, and the "ATP synthase" is shown hydrolyzing ATP to pump protons. This is how organisms that don't have electron transport make a proton gradient to drive their active transport systems (which Thermotoga obviously has a lot of).
The most interesting thing about the genome sequence, though, was a big surprise: there is good evidence for a significant number of "foreign" genes, from Archaea, in the Thermotoga genome. The authors argue that up to about a fourth of the genome was acquired from other sources (Archaea) recently enough that its foreignness can still be detected; others would reduce this to about 5%. Regardless, it is clear that lots of the DNA in this organism doesn't share the rRNA "vertical" ancestry; the organism is, at least to some extent, an evolutionary mosaic, and the implication is that maybe all Bacteria are as well. So, how do the authors decide that certain regions/genes originated somewhere else?
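One common line of evidence for foreign DNA, and the one these notes lean on later for the type IV secretion genes of UWE25, is base composition: a stretch whose G+C content is very different from the rest of the genome may be a recent arrival. Here is a minimal Python sketch of that idea. It is an illustration only, not the analysis the authors actually ran; the function names, the window and step sizes, the 0.2 cutoff, and the toy sequence are all made up for the example.

```python
def gc_fraction(seq):
    """Fraction of bases in a DNA string that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=1000, step=500, threshold=0.2):
    """Return (start, gc) for each window whose G+C fraction differs from the
    genome-wide value by more than `threshold` (an arbitrary, assumed cutoff)."""
    overall = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged


# Toy data (hypothetical): a 4000-base "host" at 50% G+C with a
# 1000-base, 100% G+C "island" spliced into the middle.
host = "ATGC" * 1000
island = "GC" * 500
toy_genome = host[:2000] + island + host[2000:]

if __name__ == "__main__":
    print(atypical_windows(toy_genome))  # [(2000, 1.0)]
```

Only the window that sits entirely on the island is flagged here; windows that straddle its edges are diluted by host sequence. Real genomes are much noisier, and a foreign gene slowly drifts toward the composition of its new genome, so a test like this only catches relatively recent arrivals.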
The authors also argue pretty strongly against the use of any single gene to create phylogenetic trees; the obvious "target" for this comment is the common use of the ssu-rRNA. They go on to talk about a bunch of trees of other genes that give a different phylogenetic tree than the ssu-rRNA does. They are clearly talking from the perspective of modern microbial "species" as amalgams of genes from a common microbial gene pool. The notion here is that microbes (or at least prokaryotes) don't have any meaningful phylogeny, that each gene has its own history and the organism itself has no unique evolutionary history. This represents the opposite extreme from the now-classical notion that the ssu-rRNA phylogenetic tree is the only meaningful representation of the organism's evolutionary history.
Horizontal transfer
Horizontal transfer can be divided into two types: intra-specific (between close relatives) and inter-specific (between distant relatives).
Intraspecific recombination
In sexually-reproducing plants and animals, species are defined as breeding populations - groups of individuals that are capable of producing viable, fertile offspring. In Bacteria, Archaea, and many eukaryotes, reproduction and genetic exchange (sex) are not directly linked. However, Bacteria (and presumably Archaea) do exchange DNA amongst members of a population of related individuals by intraspecific recombination. DNA is transferred by transducing phage, conjugative plasmids, and direct uptake from the environment (presumably from lysed cells), and this DNA is incorporated into the chromosome by homologous recombination. This is generally specific to closely-related organisms because homologous recombination demands that the sequences be very similar. Many organisms naturally take up DNA from the environment, either non-specifically or by specific recognition of DNA sequence tags that indicate that the DNA is from the same species.
The advantage of intraspecific recombination is the same as for sexual reproduction - alleles are shuffled, allowing those that are favorable under the circumstances to combine with other genetic backgrounds or favorable alleles without having to re-invent them from scratch, or to lose their connection with disadvantageous alleles in their 'home' genome.
Distant-species gene transfers
It is also clear that DNA sequences have moved large phylogenetic distances, such as the genes in Thermotoga that seem to come from Archaea. But there are plenty of other examples:
A special case of lateral gene transfer is cell fusion, or hybridization. The most extreme case of lateral gene transfer is when the cells of different kinds of organisms fuse, including their genomes, creating a new kind of organism. This is a well-known process in the plant world, but there are probably cases of it in microbes as well.
What impact does long-distance horizontal transfer have for "The Big Tree"?
The answer to this is not yet clear. Horizontal transfer is a contentious issue, and a lot of this is historical baggage. For a long time, horizontal transfer was a ready excuse used, without evidence, for any unexpected gene in an organism. As a result, the notion of horizontal transfer acquired a bad name; it was seen as thoughtless handwaving. When ssu-rRNA trees began to revolutionize how we thought about microbial evolution, the focus was on understanding general phylogenetic relationships, and horizontal transfer was dismissed as a very rare event, not significant in the evolution of these organisms. But as genome sequences began to become available, it became clear that horizontal transfer is a general and significant aspect of microbial evolution, and some scientists went so far as to declare that phylogenetic relationships are meaningless for prokaryotes; that every gene in an organism has its own evolutionary history and that microbes as a whole are just temporary subsets of a big prokaryotic gene pool. This remains a topic of sometimes-harsh argument.
But it seems that although horizontal transfer is a big factor in the evolution of microbes, there is a "core" of the genome that generally reflects the vertical evolution of the organism. This core is predominantly the genes for the "central dogma" of the cell (DNA->RNA->protein); in other words, the information-processing system. In contrast, genes for metabolism (and therefore phenotype) seem to be far more susceptible to horizontal transfer.
Since these are the things we see and measure about an organism, these are certainly important. But if the properties of an organism can move horizontally, does that mean that phylogeny is meaningless? No. When a gene is acquired from some other source by an organism, it begins the process of adapting to its new genetic environment; in other words, it slowly becomes part of the organism it now resides in. Horizontal transfer (at least in my opinion) is just another source of the genetic variation that Darwinian evolution requires.
The current task, then, is to not throw the baby out with the bathwater, and to come up with a theory of microbial species and evolutionary history that incorporates both vertical and horizontal inheritance. This is probably the single most important problem in the science of Microbiology today. Remember that "The Origin of Species", and everything that has flowed from it since, originated from trying to understand what a "species" really is in the macroscopic world. What will we learn from a much-needed understanding of microbial species?
Comparative Genomics
Analysis of the complete genome sequence of Parachlamydiae strain UWE25, a symbiont of the free-living amoeba Acanthamoeba.
Also see the most excellent UWE25 Genome web site.
Purpose: To compare the genetic complement of UWE25 to that of its more specialized animal pathogenic relatives to learn about how the ancestors of these animal pathogens were preadapted to the pathogenic lifestyle.
This is a genome sequence paper, although you might get all the way through it and still not be sure if you don't go to the online supplementary data, where the actual sequencing and annotation are described. That's the way it is these days; it used to be that you'd triumphantly herald the "COMPLETE GENOME SEQUENCE OF THISORTHAT!", but now you need to actually have something interesting (anything interesting, no matter how desperately you have to look) to focus on other than just the sequence. This shouldn't be so hard - after all, you had some reason (or at least an excuse) in the grant proposal that funded the sequencing - but sometimes you have to wonder.
Not so in this case. This is a great example of comparative genomics, where you learn about an organism, or group of organisms (in this case the pathogenic Chlamydiae), by comparing its genome with those of related organisms with different properties. In this case, what we'd like to know about Chlamydiae is how they evolved from free-living Bacteria to obligately parasitic virus-like pathogens. The genome sequences of 4 human pathogenic Chlamydiae had already been determined when this paper was published, but this is the first available "environmental" Chlamydiae sequence. (Note: There are now 12 complete genome sequences from Chlamydiae, and who knows how many more in the pipeline.) The idea here is to compare this relatively primitive symbiont of amoeba to the more specialized animal parasites in an attempt to see how the animal parasites have adapted to the lifestyle from their less-specialized ancestors, and to see how the ancestors of the parasites were "preadapted" by their relationship with unicellular eukaryotes.
In other words, the primitive relative is acting as a surrogate for the unavailable primitive ancestor. The authors understand the dangers of this - thus the focus on horizontal transfer and recent changes - but it bears keeping in mind that UWE25 is not an ancestor of pathogenic Chlamydiae, but rather a primitive cousin.
The genome of UWE25 is clearly less "reduced" than that of the pathogenic Chlamydiae. It is about 2.4 Mbp in length, with over 2000 protein-encoding genes - twice the size of those of the parasitic Chlamydiae, and as large as those of many free-living organisms. UWE25 also relies on its host for many of its amino acids, but retains the ability to make some (glycine, serine, glutamine, proline). Interestingly, it lacks the genes for tryptophan synthesis - trp synthesis is a virulence factor in the pathogens, somehow affecting sensitivity to interferon, and this is the only one of the "standard" amino acids they can produce. Because it can make these few amino acids, UWE25 retains a complete TCA cycle, from which most amino acid synthetic pathways originate. UWE25 retains much more sugar metabolism - the pathogens get all of their phospho-sugars and pentoses from their host.
UWE25 can apparently also make a proton gradient using an abbreviated proton-pumping electron transport chain. The authors interpret this to mean that it can use this gradient to make ATP, but my guess is that it uses this gradient to run its active transport uptake systems, by which it makes its living off the host. These pumps would also be needed by the pathogens, which would presumably energize them with a proton gradient generated by ATPase run in reverse (hydrolyzing ATP to pump protons out instead of harvesting the gradient to make ATP). The pathogens have the genes to make small amounts of ATP from a very short electron transport chain that does not make a proton gradient - it just serves to dump electrons onto the terminal electron acceptor. Both UWE25 and the pathogens have the genes for ATP generation by substrate-level phosphorylation. However, UWE25, like the pathogens, also has the ATP/ADP antiport, and so it presumably gets the bulk of its ATP by "energy parasitism".
Interestingly, UWE25, like its pathogenic relatives, has a type III secretion system.
The authors use a phylogenetic tree to argue that these genes in UWE25 have been there for a while (not horizontally acquired), suggesting that the system would also have existed as a preadaptation to the pathogenic lifestyle in the common ancestor of the Chlamydiae.
These genes are strict virulence factors in pathogens; what UWE25 uses them for is not known, but presumably it involves "secreting" proteins into its host for some purpose. The best candidate is CPAF (the gene for which is in UWE25), which in the pathogens is involved in degrading transcription factors required for MHC ("self") antigen synthesis. Amoebas don't have MHCs, so exactly what CPAF is doing is unclear. UWE25 also contains another complex, a type IV secretion system, presumably for pumping some other protein into its host. These proteins are not found in the pathogens, and the odd G+C content of these genes suggests that they are recent transplants from some other organism.
Metagenomics
Venter JC, et al., 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66-74.
Purpose: To obtain a broad sampling of the genetic potential of a complex microbial environment. And patent all of the sequences and somehow later make money for the investors based on this ownership.
If a "genome" is the complete genetic composition of an organism, then a "metagenome" is the complete genetic composition of an ecosystem. Genomes these days are sequenced shotgun-style, i.e. by sequencing large random collections of clones of DNA from an organism, and then assembling the snippets of sequence together to get the genome sequence. Not so long ago, the task of sequencing an entire microbial genome was considered monumental, but recently the entire Streptococcus faecalis genome was sequenced (i.e. the actual sequencing reactions) in a single day (yes, it was a stunt). The first bacterial genome sequences were done by The Institute for Genomic Research (TIGR), a company led at the time by the rather eccentric visionary Craig Venter. The human genome sequence is about 1000 times larger than a typical bacterial genome sequence; again, a monumental task, and again the driving force behind getting it done (such as it is) was Craig Venter.
So, what do you do next if you're Craig Venter? Sequence a metagenome! The metagenome chosen was near-surface water collected from the Sargasso Sea, because so much microbial ecology had already been done on this environment and because this gave Venter a chance for a tax write-off for his yacht.
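To make the shotgun idea concrete, here is a toy Python sketch of the assembly step: take overlapping snippets and repeatedly join the two with the longest suffix/prefix overlap. This is a simple greedy approach assumed for illustration, not the assembler used for the Sargasso data; real assemblers also have to cope with sequencing errors, repeats, both strands, and millions of reads. The function names, the minimum overlap of 3, and the 15-base "genome" are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`,
    or 0 if no such overlap is at least `min_len` long."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of fragments until none overlap.
    Returns the remaining contigs (one contig if everything joins up)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        # Join the pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads, all taken from the made-up sequence ATGGCGTACGTTAGC.
toy_reads = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]

if __name__ == "__main__":
    print(greedy_assemble(toy_reads))  # ['ATGGCGTACGTTAGC']
```

With a single clonal organism the reads join into one contig, as here. In a metagenome, reads from different organisms (or different alleles in a population) mostly fail to overlap cleanly, which is why only a small fraction of the Sargasso reads assembles into long contigs.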
The paper (and supplement) start with some summary tables, intended to blow your mind with the sheer scale of these sequences.
Then they spend some effort with the bits (a small fraction) of the sequences they can assemble into coherent parts. The first is a collection of contigs that are chunks of a population of Prochlorococcus:
These sequences aren't all the same - there is the expected heterogeneity - alleles of genes - and some rearrangements of gene order.
They were also able to pull out essentially complete Shewanella genome sequences:
Lastly, they pull out a nearly complete genome of Burkholderia, which is odd in at least two ways. First, Burkholderia is a pathogen, or at least a symbiont, of mammals. It has never been detected in seawater, much less in any of the Sargasso samples people have looked at over many, many years. And yet there seems to be a lot of it - enough to assemble the thousands upon thousands of individual sequences into a genome sequence. Secondly, and also unlike the Prochlorococcus or Shewanella sequences, there isn't any significant heterogeneity in the sequences. They are clonal within experimental error.
And so a group of folks who'd worked on the Sargasso microbiology for a while was sceptical, and sifting through the data discovered an important aspect of the data that the authors of this paper overlooked: all of the Burkholderia sequences - every single one - came from the DNA extracted from a single filter. Furthermore, the sequence turns out to be B. cepacia group K, a strain only associated with human infections from contaminated medical implants and not able to grow in sea water. And the authors later confirmed that although all of the other filters were new when used to collect the Sargasso samples, this one had been recycled and was of unknown origin. So . . . the Burkholderia sequences are certainly contaminants.
In addition to these near-complete genome sequences, they got some complete plasmid sequences. Some of these plasmids are large enough, and contain housekeeping genes, and so ought probably to be considered small chromosomes.
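The check that exposed the Burkholderia problem amounts to asking, for each organism, which samples its reads came from. A minimal Python sketch follows, assuming each read has already been labeled with a sample (filter) and a taxon; the sample names, read counts, and the `min_reads` cutoff are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def sample_distribution(reads):
    """reads: iterable of (sample_id, taxon) pairs.
    Returns {taxon: Counter({sample_id: number_of_reads})}."""
    by_taxon = defaultdict(Counter)
    for sample_id, taxon in reads:
        by_taxon[taxon][sample_id] += 1
    return by_taxon


def single_sample_taxa(reads, min_reads=3):
    """Taxa with at least `min_reads` reads, all of them from one sample.
    These deserve a second look as possible contaminants."""
    dist = sample_distribution(reads)
    return sorted(
        taxon
        for taxon, counts in dist.items()
        if len(counts) == 1 and sum(counts.values()) >= min_reads
    )


# Made-up read labels for illustration only.
toy_labels = (
    [("filter_1", "Prochlorococcus"), ("filter_2", "Prochlorococcus"),
     ("filter_3", "Prochlorococcus")]
    + [("filter_1", "Shewanella"), ("filter_2", "Shewanella"),
       ("filter_4", "Shewanella")]
    + [("filter_4", "Burkholderia")] * 4
)

if __name__ == "__main__":
    print(single_sample_taxa(toy_labels))  # ['Burkholderia']
```

An organism that turns up abundantly on one filter and nowhere else is not automatically a contaminant, but together with the lack of sequence heterogeneity it is a strong hint that something other than the ocean put it there.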
They were particularly interested in rhodopsins, and got a lot of rhodopsin sequences. They treed these out to get an idea of what the phylogeny of these might be, but it is also clear that these were from more than alpha proteobacteria. How do they know? By looking at the flanking genes and treeing these out in groups. They found that the flanking genes were from a wide range of organisms: all kinds of proteobacteria, bacteroids, etc. So this gene is probably being moved around by lateral gene transfer a lot.
Lastly, they try to sort out what kinds of organisms might be in the samples based on the rRNA genes they got, and a lot of other conserved sequences. Although there are some differences, they are grossly consistent, and similar to what others have seen in other surveys of this environment.
Questions for thought:
Last updated April 10, 2009 by James W Brown