Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae
Introns have typically been discovered in an ad hoc fashion: an intron is usually found only when the gene containing it is characterized for other reasons. As complete eukaryotic genome sequences become available, better methods for predicting RNA processing signals in raw sequence will be needed to discover genes and predict their expression. Here we present a catalog of 228 yeast introns, arrived at through a combination of bioinformatic and molecular analysis. Introns annotated in the Saccharomyces Genome Database (SGD) were evaluated, questionable introns were removed after failing a test for splicing in vivo, and known introns absent from the SGD annotation were added. A novel branchpoint sequence, AAUUAAC, was identified within an annotated intron that lacks a six-of-seven match to the highly conserved branchpoint consensus UACUAAC. Analysis of the database corroborates many conclusions about pre-mRNA substrate requirements for splicing derived from experimental studies, but indicates that splicing in yeast may not be as rigidly determined by splice-site conservation as had previously been thought. Using this database and a molecular technique that directly displays the lariat intron products of spliced transcripts (intron display), we suggest that the current set of 228 introns is still incomplete and that additional intron-containing genes remain to be discovered in yeast. The database can be accessed at http://www.cse.ucsc.edu/research/compbio/yeast_introns.html.
(Received September 14 1998)
(Revised October 9 1998)
(Accepted October 20 1998)
Key Words: branchpoint; hidden Markov model; intron database; intron display; splice site; splicing.
Reprint requests to: Manuel Ares, Jr., Center for the Molecular Biology of RNA, Sinsheimer Laboratories, University of California–Santa Cruz, Santa Cruz, California 95064, USA.
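To make the "six-of-seven match" criterion from the abstract concrete, the sketch below counts position-by-position agreement between a 7-nt candidate and the consensus UACUAAC, and scans a sequence for windows that reach a chosen threshold. It is a minimal illustration under stated assumptions, not the authors' method: it assumes simple ungapped comparison with no weighting of positions, and the function names (`matches_to_consensus`, `scan_branchpoints`) are hypothetical. The keywords mention hidden Markov models, which model signals probabilistically rather than by exact match counts.

```python
# Minimal sketch, assuming ungapped exact-match counting against the
# branchpoint consensus quoted in the abstract. Helper names are hypothetical.
CONSENSUS = "UACUAAC"


def normalize(seq: str) -> str:
    """Upper-case and convert DNA (T) to RNA (U) so both alphabets compare."""
    return seq.upper().replace("T", "U")


def matches_to_consensus(site: str, consensus: str = CONSENSUS) -> int:
    """Count positions where a candidate site agrees with the consensus."""
    site = normalize(site)
    if len(site) != len(consensus):
        raise ValueError("candidate must be the same length as the consensus")
    return sum(a == b for a, b in zip(site, consensus))


def scan_branchpoints(seq: str, min_matches: int = 6) -> list[tuple[int, str, int]]:
    """Return (0-based offset, window, match count) for each window at or above threshold."""
    seq = normalize(seq)
    k = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        m = matches_to_consensus(window)
        if m >= min_matches:
            hits.append((i, window, m))
    return hits


if __name__ == "__main__":
    # The novel branchpoint named in the abstract matches the consensus at
    # only 5 of 7 positions, so a six-of-seven filter would miss it.
    print(matches_to_consensus("AAUUAAC"))
    print(scan_branchpoints("GGUACUAACAA"))
```

Run on AAUUAAC, the count is 5, which shows why a strict six-of-seven rule cannot find the branchpoint reported here; lowering `min_matches` recovers it, at the cost of many more spurious hits in real intron sequence.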