DAVIS, JERROLD I.*, KEVIN C. NIXON, and DAMON P. LITTLE. L.H. Bailey Hortorium and Dept. of Plant Biology, Cornell University, Ithaca, NY 14853. - A conventional analysis of the 500-terminal rbcL data set.
The 500-terminal (499-taxon) rbcL data set assembled and analyzed by
Chase et al. in 1993, and re-analyzed by Rice et al. in 1997, is a
useful benchmark for the comparative evaluation of available methods
and software for phylogenetic analysis. Chase et al. found trees of
length 16225 (with uninformative characters deleted), and Rice et al.
found trees of length 16220. Trees two steps shorter (16218 steps)
have been discovered numerous times using the parsimony ratchet, and
these are likely to be the most-parsimonious trees for this matrix. We
have analyzed this data set by conventional means (i.e., non-ratchet
searches) to determine whether such searches can find trees of
length 16218 in a reasonably short time and, if so, which
settings are most efficient for this data set. The
results also apply to searches that use the ratchet and other
methods, because they identify the factors that substantially affect
the efficiency of tree searches whenever branch swapping is
performed. Trees of length 16218 were discovered by more than 150
conventional searches, and trees of length 16220 or shorter were
discovered by more than 1500 conventional searches, representing a
variety of strategies and settings. The most efficient searches
involve multiple stages, with relatively few trees held during each
stage.
Key words: angiosperms, cladistic analysis, large data sets, parsimony, rbcL
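The abstract's main conclusion is that efficient searches run in several stages and hold relatively few trees at each stage. The toy sketch below illustrates that idea on a tiny matrix. It is not the authors' software, settings, or procedure. Tree length is counted as Fitch parsimony steps, assuming unordered characters with no gaps or ambiguity codes. Branch swapping is a simplified rooted SPR (subtree pruning and regrafting) neighborhood, standing in for the swapping options of real programs. Starting trees come from random pairwise joining rather than random-addition Wagner trees. All names (`two_stage_search`, `swap`, `hold1`, `hold2`, `SEQS`) are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterator, List, Tuple, Union

# Hypothetical representation: a rooted binary tree is a taxon name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]
Path = Tuple[int, ...]


def length(tree: Tree, seqs: Dict[str, str]) -> int:
    """Fitch parsimony length (steps) summed over sites; assumes no gaps or ambiguity codes."""
    def visit(node: Tree, site: int) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            return frozenset(seqs[node][site]), 0
        (ls, lc), (rs, rc) = visit(node[0], site), visit(node[1], site)
        common = ls & rs
        # Empty intersection means one extra state change at this node.
        return (common, lc + rc) if common else (ls | rs, lc + rc + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, i)[1] for i in range(n_sites))


def subtrees(tree: Tree, path: Path = ()) -> Iterator[Tuple[Path, Tree]]:
    """Yield (path, subtree) for every node; a path is a sequence of 0/1 child indices."""
    yield path, tree
    if not isinstance(tree, str):
        yield from subtrees(tree[0], path + (0,))
        yield from subtrees(tree[1], path + (1,))


def prune(tree: Tree, path: Path) -> Tree:
    """Remove the non-root subtree at `path`; its sibling takes the parent's place."""
    i = path[0]
    if len(path) == 1:
        return tree[1 - i]
    child = prune(tree[i], path[1:])
    return (child, tree[1]) if i == 0 else (tree[0], child)


def graft(tree: Tree, path: Path, sub: Tree) -> Tree:
    """Attach `sub` as the sibling of the node at `path` (path () grafts above the root)."""
    if not path:
        return (tree, sub)
    i = path[0]
    child = graft(tree[i], path[1:], sub)
    return (child, tree[1]) if i == 0 else (tree[0], child)


def spr_neighbors(tree: Tree) -> List[Tree]:
    """All rooted SPR rearrangements: prune each non-root subtree, regraft it on every node."""
    out: List[Tree] = []
    for p, sub in subtrees(tree):
        if p:  # the root itself cannot be pruned
            rest = prune(tree, p)
            out.extend(graft(rest, q, sub) for q, _ in subtrees(rest))
    return out


def random_tree(taxa: List[str], rng: random.Random) -> Tree:
    """Join random pairs until one tree remains (a stand-in for random-addition trees)."""
    nodes: List[Tree] = list(taxa)
    while len(nodes) > 1:
        i, j = sorted(rng.sample(range(len(nodes)), 2), reverse=True)
        a, b = nodes.pop(i), nodes.pop(j)  # pop the larger index first
        nodes.append((a, b))
    return nodes[0]


def swap(start: List[Tree], seqs: Dict[str, str], hold: int) -> Tuple[int, List[Tree]]:
    """SPR hill climbing that keeps at most `hold` of the shortest trees found."""
    lengths = {t: length(t, seqs) for t in start}
    best = min(lengths.values())
    held = [t for t in start if lengths[t] == best][:hold]
    queue, seen = list(held), set(start)
    while queue:
        for nb in spr_neighbors(queue.pop()):
            if nb in seen:
                continue
            seen.add(nb)
            steps = length(nb, seqs)
            if steps < best:  # shorter tree found: discard everything held so far
                best, held, queue = steps, [nb], [nb]
            elif steps == best and len(held) < hold:
                held.append(nb)
                queue.append(nb)
    return best, held


def two_stage_search(seqs: Dict[str, str], n_starts: int = 20,
                     hold1: int = 1, hold2: int = 10, seed: int = 0) -> Tuple[int, List[Tree]]:
    rng = random.Random(seed)
    taxa = sorted(seqs)
    # Stage 1: many cheap replicates, each holding very few trees.
    runs = [swap([random_tree(taxa, rng)], seqs, hold1) for _ in range(n_starts)]
    best1 = min(steps for steps, _ in runs)
    pool = list(dict.fromkeys(t for steps, trees in runs if steps == best1 for t in trees))
    # Stage 2: swap again from the best stage-1 trees, now holding more of them.
    return swap(pool, seqs, hold2)


# Toy 4-taxon, 4-site matrix (hypothetical data, not from rbcL).
SEQS = {"A": "AAGT", "B": "AAGC", "C": "CAGC", "D": "CGGC"}
best, trees = two_stage_search(SEQS)
print(best, len(trees))
```

On this toy matrix the grouping ((A,B),(C,D)) costs 3 steps and ((A,C),(B,D)) costs 4. The search should report a best length of 3, which is also the lower bound of one step per extra state at each site. On a matrix the size of the 500-terminal rbcL set, the trade-off the abstract describes shows up in the `hold1` and `hold2` limits. Holding few trees keeps each stage cheap, at the cost of stopping on some plateaus that a larger tree buffer would cross.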