CUI, LIYING* and CLAUDE W. DEPAMPHILIS. 208 Mueller Lab, Penn State University, University Park, PA 16802. - Distribution of transcription factors in protein and EST databases.
Transcription factors are proteins that can bind to the upstream
region of a gene and regulate gene expression by acting as activators
or repressors of transcription. Most transcription factors have been
characterized in eukaryotes. Plant transcription factors play
important roles in flower development, pigment biosynthesis, the circadian
clock, hormone and stress responses, and evolution. We surveyed all
major families of plant transcription factors in the InterPro
database, which was built upon Swiss-Prot and TrEMBL data using domain
searches. We also performed similar searches in the GenBank Conserved
Domain database. The Myc, Myb/bHLH, MADS-box and Homeobox families
were distributed in yeast, plants and animals, and each family
included 200 to 700 plant proteins. Among plant-specific families, the
AUX/IAA and K-box families were represented by sequences from mosses
to eudicots, and the Znf-CO and AP2/EREBP families by sequences from
conifers to eudicots. Members of the Znf-Dof, WRKY and bZIP/GBF
families were found in only a few
monocots and eudicots, probably due to the predominance of genomic
sequences generated from model organisms such as rice and
Arabidopsis. We conducted another analysis comparing samples of
Expressed Sequence Tags (ESTs) from a non-normalized Arabidopsis
flower cDNA library to the ESTs from the TIGR Tomato Gene Index.
About 15% of the Arabidopsis ESTs matched transcription and signal
transduction factors, a number consistent with the proportion of these
genes estimated from the full genome sequence of Arabidopsis.
Homologs of these factors were found among the significant BLAST hits
in the tomato EST sets. The results implied that homologs of many
transcription factors could be identified from ESTs using comparative
genomic approaches. This requires an integrated analysis of protein,
DNA, cDNA, and EST datasets. Rapid development of EST datasets from
model and nonmodel plants should be an effective first step toward
identification of homologs of important transcription factors in
diverse plant species.
Key words: comparative genomics, EST, protein database, transcription factor
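
As an illustration of the EST comparison described above, the following is a minimal sketch of how the proportion of ESTs with a significant BLAST hit to a set of transcription factors might be computed. It is not the authors' pipeline. It assumes BLAST results in the standard 12-column tabular format (query ID in column 1, E-value in column 11); the function name `fraction_with_hits`, the E-value cutoff of 1e-10, and the example rows are all hypothetical, since the abstract does not state the threshold or tools used.

```python
import csv
import io


def fraction_with_hits(blast_tabular, total_queries, max_evalue=1e-10):
    """Return the fraction of query ESTs with at least one hit
    whose E-value is at or below max_evalue.

    blast_tabular: text in 12-column tab-separated BLAST format (assumed).
    total_queries: number of ESTs that were searched (hits or not).
    max_evalue: significance cutoff; 1e-10 is an assumed placeholder.
    """
    hit_queries = set()
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query_id, evalue = row[0], float(row[10])
        if evalue <= max_evalue:
            hit_queries.add(query_id)  # count each EST once
    return len(hit_queries) / total_queries


# Made-up rows for illustration only (not data from the study).
example = "\n".join([
    "est1\tTF_A\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t120",
    "est1\tTF_B\t60.0\t80\t32\t0\t1\t80\t1\t80\t1e-3\t40",
    "est2\tTF_C\t55.0\t60\t27\t0\t1\t60\t1\t60\t0.5\t25",
])

# Four ESTs were searched; only est1 has a hit below the cutoff.
print(fraction_with_hits(example, total_queries=4))
```

Counting each EST once, rather than counting hits, keeps the figure comparable to a statement such as "about 15% of ESTs", because a single EST can match several members of the same family.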