The rapid explosion in the availability of genomic data has posed several computational challenges for using gene order data for phylogeny reconstruction. In an earlier study we showed that one parsimony method (MPBE - Maximum Parsimony on Binary Encodings) performed better in simulations than distance-based methods, especially in situations where evolutionary rates were high. We have developed and tested new parsimony methods for using gene order data in phylogenetics. All of these methods involve encoding gene order data into binary or multistate characters and constructing tree topologies using standard parsimony methods. In the first method (MPBE) each gene order is converted into a binary sequence in which each site corresponds to a pair of adjacent genes. The second method, MPBE-2, is a modification of MPBE in which sites that are present in every sequence and sites that represent ancestral conditions are removed. The third method (MPME - Maximum Parsimony on Multistate Encodings) encodes gene order data as multistate characters. We have used both simulated and real data to examine the performance of each of these parsimony methods. Our simulations explored a wide range of settings, including different numbers of genes and genomes, a range of evolutionary rates, and various probabilities of inversions, inverted transpositions, and transpositions. The results of the simulations indicate that MPME performs best but is the slowest method. This result may reflect the fact that MPME encodes more information than the other two parsimony methods. We have also analyzed gene order data from the chloroplast genomes in the angiosperm family Campanulaceae using all three parsimony methods. Although all trees generated from these analyses are congruent with each other and with trees produced from chloroplast gene sequences, they do differ in the amount of resolution and support.

Key words: Campanulaceae, chloroplast genomes, comparative genomics, computational biology, phylogenetic theory
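
The binary encoding summarized in the abstract can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the implementation used in the study: genomes are assumed to be circular and written as signed integer lists (the sign marks strand), an adjacency (a, b) is treated as identical to its reverse-strand reading (-b, -a), and all function names are hypothetical. The filtering step covers only the part of MPBE-2 that removes sites present in every sequence; the removal of sites representing ancestral conditions is not attempted here because the abstract does not specify how they are identified.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Tuple[int, int]


def canonical(a: int, b: int) -> Adjacency:
    """Pick one representative for (a, b) and its reverse-strand reading (-b, -a)."""
    return min((a, b), (-b, -a))


def adjacencies(genome: List[int], circular: bool = True) -> Set[Adjacency]:
    """Collect the canonical adjacent-gene pairs of one signed gene order."""
    pairs = list(zip(genome, genome[1:]))
    if circular and len(genome) > 1:
        # Assumption: the genome is circular, so the last gene is adjacent to the first.
        pairs.append((genome[-1], genome[0]))
    return {canonical(a, b) for a, b in pairs}


def binary_encoding(genomes: Dict[str, List[int]], circular: bool = True):
    """Return (sites, matrix): one 0/1 character per adjacency seen in any genome."""
    adj = {name: adjacencies(g, circular) for name, g in genomes.items()}
    sites = sorted(set().union(*adj.values()))
    matrix = {name: [1 if s in adj[name] else 0 for s in sites] for name in genomes}
    return sites, matrix


def drop_sites_present_everywhere(sites, matrix):
    """Remove characters scored 1 in every genome (one part of the MPBE-2 filtering)."""
    keep = [i for i in range(len(sites)) if not all(row[i] == 1 for row in matrix.values())]
    kept_sites = [sites[i] for i in keep]
    kept_matrix = {name: [row[i] for i in keep] for name, row in matrix.items()}
    return kept_sites, kept_matrix


if __name__ == "__main__":
    # Hypothetical toy data: genome B differs from A by one inversion of genes 2 and 3.
    toy = {"A": [1, 2, 3, 4], "B": [1, -3, -2, 4]}
    sites, matrix = binary_encoding(toy)
    for name, row in matrix.items():
        print(name, "".join(map(str, row)))
```

The resulting 0/1 rows could then be exported as a character matrix and analyzed with any standard parsimony program.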