Chapter 3 Data and analysis

3.1 Genomic data

Genomes are large. The human genome is about 3.2 billion base pairs, and some plant genomes are even larger. Comparing two plant genomes by hand to find the differences between them would take an impractically long time. To add to this complexity, a genome doesn’t come off the sequencer as a single 3.2 billion base pair sequence, but as millions of small fragments or thousands of large ones (depending on the technology). Think of sequencing as tossing a book (your genome) into a paper shredder. You first have to match each fragment to its place in a “reference genome” (an intact copy of the book), and only then can you figure out where your genome differs from the reference. This is definitely a process you don’t want to do by hand!
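To make the shredder analogy concrete, here is a minimal Python sketch of the idea, not of how real software does it. The reference sequence, the reads, and the function names `best_position` and `find_differences` are all made up for illustration. Each read is slid along the reference, placed where it has the fewest mismatches, and any remaining mismatches are reported as differences.

```python
def best_position(read, reference):
    """Return (start, mismatches) for the placement with the fewest mismatches."""
    best = None
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        # Keep the earliest placement among ties.
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


def find_differences(reads, reference):
    """Place each read, then record where it disagrees with the reference."""
    differences = {}
    for read in reads:
        start, _ = best_position(read, reference)
        for offset, base in enumerate(read):
            ref_base = reference[start + offset]
            if base != ref_base:
                # Map 0-based reference position to (reference base, read base).
                differences[start + offset] = (ref_base, base)
    return differences


# Hypothetical toy data: a 20-base "genome" and three short "reads".
reference = "ACGTTGCATGCCGTAAGCTA"
reads = ["GTTGCA", "ATGCAGTA", "TAAGCT"]

differences = find_differences(reads, reference)
print(differences)  # {11: ('C', 'A')}
```

This brute-force search checks every read against every position, which is fine for 20 bases but hopeless for billions. That gap is why the work is handed to specialized software running on a large computer.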

A computer (at least a big one with lots of computing power) can help us out by automating all these comparisons. That means that before we can identify how the sequences of two plants differ, we need to build some skills for communicating with the server that we use to work with our data.

3.2 Phylogenies

Once we are able to work with our data using computational tools, we can begin to focus on the data that we have and what our results can tell us. An important finding in phylogenetics is that even large amounts of genomic data do not always provide a definitive answer about evolutionary relationships. Different analyses often produce different phylogenies because they use different methods and make different assumptions about the data. Different datasets can also result in different trees. In particular, rapid diversification of species, such as we observe in the Andes, makes analyses challenging.

Once we have established a phylogeny that we feel comfortable with, we might like to identify when species diverged. Divergence dates can be estimated by calibrating the tree, for example with dated fossils. This can be challenging when few fossils are available, and especially when the exact ages of those fossils are unknown.
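As a rough illustration of how a calibration turns genetic distance into time, the sketch below assumes a strict molecular clock, meaning one constant substitution rate on every branch. That is a simplifying assumption for this example, not a method stated here, and real dating analyses usually relax it. All numbers and the function names `clock_rate` and `divergence_time` are hypothetical.

```python
def clock_rate(calibration_distance, calibration_age):
    """Rate per lineage: the distance accumulates along two branches since the split."""
    return calibration_distance / (2 * calibration_age)


def divergence_time(pairwise_distance, rate):
    """Time since two species split, given their distance and a per-lineage rate."""
    return pairwise_distance / (2 * rate)


# Hypothetical calibration: two species differ at 10% of sites, and a fossil
# suggests they split 10 million years ago.
rate = clock_rate(calibration_distance=0.10, calibration_age=10.0)

# Hypothetical pair of interest: 4% sequence difference.
age = divergence_time(pairwise_distance=0.04, rate=rate)
print(f"Estimated divergence: about {age:.1f} million years ago")

# If the fossil could be anywhere from 8 to 12 million years old, the
# estimate for the pair of interest shifts with it.
age_range = [
    divergence_time(0.04, clock_rate(0.10, fossil_age))
    for fossil_age in (8.0, 12.0)
]
print(f"Range: about {age_range[0]:.1f} to {age_range[1]:.1f} million years")
```

Under these assumptions the estimate scales directly with the fossil age, so any uncertainty in the calibration carries straight through to the divergence date.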

3.3 Comparative analyses

Finally, our larger goal in evolutionary biology is to understand not only when species diverged, but also how traits evolve and how species evolve in relation to their landscape and environment. We will strive to understand models of character evolution and apply simple phylogenetic comparative models to existing datasets.