JANSEN, ROBERT K.1*, LI-SAN WANG2, TANDY WARNOW2, LINDA A. RAUBESON3, and BERNARD M. E. MORET4. 1Section of Integrative Biology, University of Texas, Austin TX 78712; 2Department of Computer Science, University of Texas, Austin TX 78712; 3Department of Biological Sciences, Central Washington University, Ellensburg WA 98926; 4Department of Computer Science, University of New Mexico, Albuquerque NM 87131. - Comparisons of parsimony methods for phylogeny reconstruction from gene order data.
The rapid growth in the availability of genomic data has posed
several computational challenges in using gene order data for
phylogeny reconstruction. In an earlier study we showed that one
parsimony method (MPBE - Maximum Parsimony on Binary Encodings)
performed better in simulations than distance-based methods,
especially when evolutionary rates were high. We have
developed and tested new parsimony methods for using gene order data
in phylogenetics. All of these methods involve encoding gene order
data into binary or multistate characters and constructing tree
topologies using standard parsimony methods. In the first method
(MPBE), each gene order is converted into a binary sequence whose
sites correspond to pairs of adjacent genes. The second method, MPBE-2, is
a modification of MPBE in which sites that are present in every
sequence and sites that represent ancestral conditions are removed.
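The binary adjacency encoding behind MPBE can be illustrated with a minimal Python sketch. It assumes each genome is a list of signed gene labels over the same gene set and, by default, treats genomes as circular (as chloroplast genomes are). The names `adjacencies` and `mpbe_encode` are hypothetical. An adjacency (a, b) and its opposite-strand reading (-b, -a) are counted as one site. The `drop_shared` flag reproduces only the MPBE-2 step of removing sites present in every genome. The removal of sites that represent ancestral conditions is not modelled, since how those sites are identified is not specified here.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Tuple[int, int]


def adjacencies(genome: List[int], circular: bool = True) -> Set[Adjacency]:
    """Return the signed adjacencies of one gene order.

    A genome is a list of signed gene labels, e.g. [1, -3, -2, 4].
    The pair (a, b) read on the opposite strand is (-b, -a); both
    spellings describe the same adjacency, so one canonical key is kept.
    """
    pairs = list(zip(genome, genome[1:]))
    if circular:  # assumption: last gene is adjacent to the first
        pairs.append((genome[-1], genome[0]))
    return {max((a, b), (-b, -a)) for a, b in pairs}


def mpbe_encode(genomes: Dict[str, List[int]], circular: bool = True,
                drop_shared: bool = False) -> Tuple[List[Adjacency], Dict[str, str]]:
    """Build a binary matrix with one site per adjacency seen in any genome.

    drop_shared=True applies only the MPBE-2 removal of sites present in
    every genome; the ancestral-condition filter is not modelled here.
    """
    adj = {name: adjacencies(g, circular) for name, g in genomes.items()}
    sites = sorted(set().union(*adj.values()))
    if drop_shared:
        sites = [s for s in sites if not all(s in a for a in adj.values())]
    matrix = {name: "".join("1" if s in a else "0" for s in sites)
              for name, a in adj.items()}
    return sites, matrix


genomes = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, -3, -2, 4, 5],
    "C": [1, 2, -4, -3, 5],
}
sites, matrix = mpbe_encode(genomes)
sites2, matrix2 = mpbe_encode(genomes, drop_shared=True)
for name, row in matrix.items():
    print(name, row)
# A 001101011
# B 010110011
# C 101001101
```

In this toy input the adjacency (5, 1) occurs in all three genomes, so the MPBE-2 variant drops that column and keeps 8 of the 9 sites. Writing the rows out in a format read by a standard parsimony program is not shown.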
The third method (MPME - Maximum Parsimony on Multistate Encodings)
encodes gene order data as multistate characters. We have used both
simulated and real data to examine the performance of each of these
parsimony methods. Our simulations explored a wide range of settings,
including different numbers of genes and genomes, a range of evolutionary
rates, and various probabilities of inversions, inverted
transpositions, and transpositions. The results of the simulations
indicate that MPME performs best but is also the slowest method. This
result may reflect the fact that MPME encodes more information than
the other two parsimony methods. We have also analyzed gene order data
from the chloroplast genomes of the angiosperm family Campanulaceae using
all three parsimony methods. Although all trees generated from these
analyses are congruent with each other and with trees inferred from
chloroplast gene sequences, they differ in their degree of resolution
and support.
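A similar sketch for MPME, under stated assumptions: each signed gene +g and -g defines one multistate character whose state is the signed gene that follows it when the genome is read circularly, giving 2n characters for n genes. This encoding choice and the helper names `mpme_encode`, `fitch_score`, and `tree_length` are illustrative only, not a description of the authors' code. The sketch scores a single fixed, rooted binary tree with Fitch's small-parsimony algorithm for unordered characters; an actual analysis would search over topologies with a standard parsimony program.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, tuple]


def mpme_encode(genome: List[int]) -> Dict[int, int]:
    """Map each signed gene (+g and -g) to the signed gene that follows it.

    The genome is read circularly. On the opposite strand, -g is followed
    by the negation of the gene that precedes g on the forward strand.
    """
    n = len(genome)
    states: Dict[int, int] = {}
    for i, g in enumerate(genome):
        states[g] = genome[(i + 1) % n]    # forward-strand successor
        states[-g] = -genome[(i - 1) % n]  # reverse-strand successor
    return states


def fitch_score(tree: Tree, leaves: Dict[str, Dict[int, int]], char: int) -> int:
    """Fitch small-parsimony cost of one unordered character on a fixed tree."""
    def visit(node: Tree) -> Tuple[Set[int], int]:
        if isinstance(node, str):
            return {leaves[node][char]}, 0
        left, right = node
        lset, lcost = visit(left)
        rset, rcost = visit(right)
        common = lset & rset
        if common:
            return common, lcost + rcost
        return lset | rset, lcost + rcost + 1  # one state change needed
    return visit(tree)[1]


def tree_length(tree: Tree, genomes: Dict[str, List[int]]) -> int:
    """Total parsimony length of all MPME characters on the given tree."""
    encoded = {name: mpme_encode(g) for name, g in genomes.items()}
    characters = next(iter(encoded.values())).keys()
    return sum(fitch_score(tree, encoded, c) for c in characters)


genomes = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 2, 3, 4, 5],
    "C": [1, -3, -2, 4, 5],  # A with the segment 2..3 inverted
    "D": [1, -3, -2, 4, 5],
}
print(tree_length((("A", "B"), ("C", "D")), genomes))  # 4
print(tree_length((("A", "C"), ("B", "D")), genomes))  # 8
```

In this toy example the single inversion breaks two adjacencies and so changes four MPME characters. The tree that groups identical gene orders needs one change per character (length 4), while the alternative grouping needs two (length 8).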
Key words: Campanulaceae, chloroplast genomes, comparative genomics, computational biology, phylogenetic theory