
MB 451 Microbial Diversity

Department of Microbiology - NC State University



Term Project part 1 - Your Data


Microbiological data

For each of your isolates, put together a summary describing whatever you know about them microbiologically, and turn this in with your project:

  • Source - where did the original sample, from which the organism was isolated, come from?
  • Media & growth conditions
  • Colony morphology
  • Cellular morphology

As always, the more details and information you can provide, the better. You will need all of this information at the end, to see if the phylotype of the organism(s) makes sense.


Data files

Sequencing data is listed by PCR reaction number: every filename starts with the PCR reaction number. The filenames also include the primer (_new515F) and a file type suffix (.ab1, .scf, .pdf, .seq, .clip, or .phd). All of our samples that contained a visible product of the right size were sent for sequencing, whether or not they seemed good enough to provide data.

If the sequencing of your sample was repeated to try to get better data, the repeat files have "#2" before the file type suffix. Look at both the original and the repeated sequencing runs to see whether either is usable, and if so, which is better.
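If you end up with a folder full of these files, a small script can sort them by reaction and flag the repeat runs. The sketch below is a hypothetical helper, not something provided with the course: it assumes names follow the pattern `<reaction>_new515F[#2].<suffix>` (for example `07_new515F#2.clip`), so check that against your actual downloads first.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# Hypothetical naming pattern: "<reaction>_new515F[#2].<suffix>".
# Check it against your real file names before relying on it.
FILENAME_RE = re.compile(
    r"^(?P<reaction>.+?)_new515F(?P<repeat>#2)?\.(?P<suffix>ab1|scf|pdf|seq|clip|phd)$"
)


class SeqFile(NamedTuple):
    reaction: str   # PCR reaction number, kept as text (e.g. "07")
    repeat: bool    # True for a repeated ("#2") sequencing run
    suffix: str     # file type: ab1, scf, pdf, seq, clip, or phd


def parse_filename(name: str) -> Optional[SeqFile]:
    """Split a sequencing file name into its parts, or return None if it doesn't fit."""
    m = FILENAME_RE.match(name)
    if m is None:
        return None
    return SeqFile(m.group("reaction"), m.group("repeat") is not None, m.group("suffix"))


def group_by_reaction(names: List[str]) -> Dict[str, List[SeqFile]]:
    """Collect parsed files under their PCR reaction number, skipping names that don't fit."""
    groups: Dict[str, List[SeqFile]] = {}
    for name in names:
        info = parse_filename(name)
        if info is not None:
            groups.setdefault(info.reaction, []).append(info)
    return groups
```

Any reaction whose group contains a file with `repeat=True` has a second run you should compare with the original.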

Download your data files and save them with their .pdf or .clip suffix. Get both the .pdf and the .clip file for each of your reactions, whether they're good, bad, or weak.

NOTE: If you wish, you can also download the original .ab1 or .scf files that contain these tracings in raw form. These can be viewed and manipulated in any of several free programs: 4Peaks (Mac - this is what I use), Chromas (PC), BioEdit (PC - this is also a great alignment editor), FinchTV (Mac or PC), or TracerView (Mac, PC, or various Unix flavors).


Where do these sequences come from?

The DNA you purified from your PCR reaction, along with some oligonucleotide primer (new515F - a shorter version of the forward primer used in the PCR reaction), was sent to Eurofins/MWG for sequencing. A few days later they sent back the sequence data by email. The sequences were downloaded and posted below for you. The .pdf files were generated by "printing" the .ab1 files to PDF from 4Peaks. New batches of PCR products are being sent out each week as students generate them.


Examining your data

You can view your sequencing data by opening the .pdf files you downloaded. Look carefully at your data. How does it look? Here is an example section from the beginning of a good sequence:

[Figure: the beginning of a good sequence]

At the top is the sequence as the machine interprets it, read from left to right and numbered just beneath. This example is from the start of the sequence - notice the position numbers "10", then "20", below the printed sequence. Below both the interpreted sequence and the numbering is the raw data from the sequencing machine.

Some sequences don't start off this cleanly - the sequence only becomes clear after a few bases.

You can read the sequence directly from the printout. With luck, the first 500 bases (after perhaps the first dozen or so, if the run has a rough start) will be reliable. Somewhere between bases 500 and 800, the quality will degrade to the point of unreliability.
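The rule of thumb above is simple position arithmetic. Here is a sketch that uses 1-based positions so the numbers match the printout. The defaults of 12 and 500 are only the rough figures from this page, and `reliable_window` is a hypothetical name. Your own trace may clean up sooner or fade later.

```python
from typing import Tuple


def reliable_window(seq: str, skip_start: int = 12, max_len: int = 500) -> Tuple[int, int, str]:
    """Return (start, end, bases) for the stretch most likely to be reliable.

    Positions are 1-based and inclusive, like the printout numbering. The
    defaults skip a rough start of about a dozen bases and stop around base
    500; both are rules of thumb, not fixed cutoffs. If the read is shorter
    than skip_start, the returned bases are empty.
    """
    start = min(skip_start, len(seq))  # 0-based index of the first kept base
    end = min(max_len, len(seq))       # 0-based exclusive end == 1-based inclusive end
    return start + 1, end, seq[start:end]
```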

If your sequence comes from more than one template, i.e. your culture wasn't pure or the PCR reaction was contaminated, some peaks will look good (where both sequences have the same base at that position) and others will be two peaks in the same place (where the two sequences differ):

[Figure: mixed sequence data]

If one of the sequences is much stronger than the other, this is no problem; the extra peak will be small compared to the main peak, and the machine can correctly read the stronger sequence. If they are close to the same strength, the machine will not correctly read either sequence. If the two sequences are from very closely related organisms, these double peaks may be sporadic and concentrated in the most variable regions of the rRNA. If they are distantly related organisms, the double peaks will be much more common: as soon as the two sequences have a difference in length (an insertion or deletion relative to each other), they fall out of sync, and most of the peaks after that point will be doubled.
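One rough way to spot a mixed template in the text version of a sequence is to see where the ambiguous calls pile up. This sketch assumes the base caller writes N (or another IUPAC ambiguity code) wherever it can't decide between peaks. That is common but not guaranteed. `ambiguity_profile` and the 50-base window are hypothetical choices, and counting calls is no substitute for looking at the trace itself.

```python
from typing import List

UNAMBIGUOUS = set("ACGT")


def ambiguity_profile(seq: str, window: int = 50) -> List[float]:
    """Fraction of calls that are not A/C/G/T in consecutive, non-overlapping windows.

    Assumes ambiguous positions are written as N or other IUPAC codes.
    The last window may be shorter than `window`.
    """
    seq = seq.upper()
    profile = []
    for i in range(0, len(seq), window):
        chunk = seq[i:i + window]
        ambiguous = sum(1 for base in chunk if base not in UNAMBIGUOUS)
        profile.append(ambiguous / len(chunk))
    return profile
```

A few high windows in otherwise clean sequence fits closely related templates. High fractions in most windows past some point fit templates that have fallen out of sync.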

Print out a copy of your sequencing data (the .pdf file); you'll need this to turn in with your Term Project.

Now open the .clip file in a text editor (Notepad, Word, TextEdit, whatever), and print it out. This is the part of your sequence that the computer program in the sequencing machine has filtered and thinks is reliable. This is the sequence you'll actually use for your analysis. Go back to the printout of the .pdf of your data, and highlight the region of this sequence that is in the .clip file.
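If you'd rather not find the .clip region by eye, a short script can report where it falls in the full base calls. This is a sketch under two assumptions: that the .seq and .clip files are plain text (possibly with a FASTA-style `>` header line), and that the .clip sequence appears unchanged inside the .seq sequence. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def clean_sequence(text: str) -> str:
    """Drop any '>' header lines and all whitespace, and uppercase the bases."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith(">")]
    return "".join("".join(kept).split()).upper()


def read_sequence(path: str) -> str:
    """Read a plain-text sequence file (assumed format) and clean it."""
    return clean_sequence(Path(path).read_text())


def clip_position(full_seq: str, clip_seq: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based, inclusive (start, end) of clip_seq inside full_seq, or None."""
    full_seq, clip_seq = full_seq.upper(), clip_seq.upper()
    if not clip_seq:
        return None
    idx = full_seq.find(clip_seq)
    if idx == -1:
        return None
    return idx + 1, idx + len(clip_seq)


# Example use (file names are hypothetical):
# full = read_sequence("07_new515F.seq")
# clip = read_sequence("07_new515F.clip")
# print(clip_position(full, clip))  # positions to highlight on the .pdf printout
```

The positions should line up with the numbering on the .pdf printout if the .seq file holds the same base calls shown there. Check a few bases by eye to be sure.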

Be sure to open and look at (and print out) the data for all of your PCR reactions.


Decision time

If any of your sequences are good, that's great. You may even have multiple good sequences; if so, use them all. If you have a sequence from a mixed template, use it only if it looks pretty good and you don't have a clean sequence you can use.
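The decision rule above is simple enough to write down. In this sketch the labels are hypothetical. You still assign each trace a label by looking at it, and the code only applies the rule: use every clean sequence, and fall back to good-looking mixed ones only when there are no clean ones.

```python
from typing import Dict, List


def choose_sequences(calls: Dict[str, str]) -> List[str]:
    """Apply the decision rule to your own judgment of each trace.

    `calls` maps a reaction or file name to a hypothetical label:
      "clean"      - a good sequence from a single template
      "mixed_good" - mixed template, but it still looks pretty good
      "unusable"   - anything else
    """
    clean = sorted(name for name, label in calls.items() if label == "clean")
    if clean:
        return clean  # use every clean sequence you have
    # No clean sequence: fall back to good-looking mixed ones (may be empty).
    return sorted(name for name, label in calls.items() if label == "mixed_good")
```

An empty list means there is no usable sequence data.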

No usable sequence data?

[Figure: bad gel]

Some of you (only a few) will not get any PCR products from any of your reactions after purification. Others will get PCR products but fail to get any good sequence data. If none of your sequences yielded usable data and you have a friend in the class who has more than one good sequence, your best bet is to ask him or her whether you can use one of their sequences; this way you analyze one and they analyze the rest of theirs. Otherwise, I'll poll the class and get someone to provide a sequence number and microbiological data for you to use. Please let me know either way as soon as possible.


Last updated April 03, 2009 by James W Brown