Transcription factors are proteins that bind to the upstream region of a gene and regulate its expression by activating or repressing transcription. Most transcription factors have been characterized in eukaryotes. Plant transcription factors play important roles in flower development, pigment biosynthesis, the circadian clock, hormone and stress responses, and evolution. We used domain searches to survey all major families of plant transcription factors in the InterPro database, which is built upon Swiss-Prot and TrEMBL data. We also performed similar searches in the GenBank Conserved Domain Database. The Myc, Myb/bHLH, MADS-box and Homeobox families were distributed across yeast, plants and animals, and each family included 200 to 700 plant proteins. Among plant-specific families, the AUX/IAA and K-box families were represented by sequences from mosses to eudicots, and the Znf-CO and AP2/EREBP families by sequences from conifers to eudicots. Members of the Znf-Dof, WRKY and bZIP/GBF families were found in only a few monocots and eudicots, probably because the available genomic sequences come predominantly from model organisms such as rice and Arabidopsis. In a second analysis, we compared a sample of Expressed Sequence Tags (ESTs) from a non-normalized Arabidopsis flower cDNA library with the ESTs in the TIGR Tomato Gene Index. About 15% of the Arabidopsis ESTs matched transcription and signal transduction factors, a figure consistent with the proportion of these genes estimated from the full Arabidopsis genome sequence. Homologs of these factors were found among significant BLAST hits to the tomato EST sets. These results imply that homologs of many transcription factors can be identified from ESTs using comparative genomic approaches, which requires an integrated analysis of protein, DNA, cDNA, and EST datasets. Rapid development of EST datasets from model and nonmodel plants should be an effective first step toward identifying homologs of important transcription factors in diverse plant species.

Key words: comparative genomics, EST, protein database, transcription factor
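
As an illustration of the EST comparison summarized in the abstract, the following minimal sketch shows one way significant BLAST hits could be tallied. It assumes BLAST results saved in the standard 12-column tabular format (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The E-value cutoff, the function names, and the sequence identifiers in the usage note are hypothetical; the abstract does not state which threshold or tooling was used.

```python
# Hypothetical cutoff; the abstract does not report the E-value threshold used.
EVALUE_CUTOFF = 1e-10


def best_hits(tabular_lines, evalue_cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)} with the lowest-E-value
    hit per query, keeping only hits at or below the cutoff."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 of the tabular format
        if evalue > evalue_cutoff:
            continue  # not significant under the assumed cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def fraction_with_hits(best, total_queries):
    """Fraction of query ESTs that have at least one significant hit."""
    return len(best) / total_queries if total_queries else 0.0
```

For example, given an iterable of tabular lines for Arabidopsis ESTs searched against a tomato EST set, `best_hits(lines)` keeps one best tomato match per Arabidopsis EST, and `fraction_with_hits(hits, n_ests)` gives the share of the `n_ests` queries that found a candidate homolog.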