RNA



Genome-wide bioinformatic and molecular analysis of introns in Saccharomyces cerevisiae


MARC SPINGOLA a1, LESLIE GRATE a2, DAVID HAUSSLER a2 and MANUEL ARES JR. a1c1
a1 Center for the Molecular Biology of RNA, Sinsheimer Laboratories, University of California–Santa Cruz, Santa Cruz, California 95064, USA
a2 Department of Computer and Information Sciences, Applied Sciences Building, University of California–Santa Cruz, Santa Cruz, California 95064, USA

Abstract

Introns have typically been discovered in an ad hoc fashion: introns are found as a gene is characterized for other reasons. As complete eukaryotic genome sequences become available, better methods for predicting RNA processing signals in raw sequence will be necessary in order to discover genes and predict their expression. Here we present a catalog of 228 yeast introns, arrived at through a combination of bioinformatic and molecular analysis. Introns annotated in the Saccharomyces Genome Database (SGD) were evaluated, questionable introns were removed after failing a test for splicing in vivo, and known introns absent from the SGD annotation were added. A novel branchpoint sequence, AAUUAAC, was identified within an annotated intron that lacks a six-of-seven match to the highly conserved branchpoint consensus UACUAAC. Analysis of the database corroborates many conclusions about pre-mRNA substrate requirements for splicing derived from experimental studies, but indicates that splicing in yeast may not be as rigidly determined by splice-site conservation as had previously been thought. Using this database and a molecular technique that directly displays the lariat intron products of spliced transcripts (intron display), we suggest that the current set of 228 introns is still not complete, and that additional intron-containing genes remain to be discovered in yeast. The database can be accessed at http://www.cse.ucsc.edu/research/compbio/yeast_introns.html.
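
To make the "six-of-seven match" criterion concrete, the following is a minimal sketch of how a sequence might be scanned for heptamers resembling the branchpoint consensus UACUAAC. It is an illustration only, not the authors' method: the function name `branchpoint_candidates`, its parameters, and the short test sequences are hypothetical, and simple position-by-position matching is assumed in place of the hidden Markov model named in the key words.

```python
CONSENSUS = "UACUAAC"  # branchpoint consensus quoted in the abstract


def branchpoint_candidates(intron, consensus=CONSENSUS, min_matches=6):
    """Return (offset, heptamer, matches) for each window of the sequence
    that agrees with the consensus at min_matches or more positions.

    Hypothetical helper for illustration; plain identity counting is an
    assumption, not the scoring scheme used to build the catalog.
    """
    # Accept DNA or RNA input by converting T to U.
    seq = intron.upper().replace("T", "U")
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count positions where the window and the consensus are identical.
        matches = sum(a == b for a, b in zip(window, consensus))
        if matches >= min_matches:
            hits.append((i, window, matches))
    return hits


# The novel branchpoint AAUUAAC agrees with UACUAAC at only five positions,
# so it is missed at the six-of-seven threshold.
print(branchpoint_candidates("AAUUAAC"))
print(branchpoint_candidates("AAUUAAC", min_matches=5))
```

Under this assumed scoring, a search restricted to six-of-seven matches would overlook an intron that uses AAUUAAC, which is consistent with the observation that splicing in yeast may be less rigidly determined by splice-site conservation than previously thought.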

(Received September 14, 1998)
(Revised October 9, 1998)
(Accepted October 20, 1998)


Key Words: branchpoint; hidden Markov model; intron database; intron display; splice site; splicing.

Correspondence:
c1 Reprint requests to: Manuel Ares, Jr., Center for the Molecular Biology of RNA, Sinsheimer Laboratories, University of California–Santa Cruz, Santa Cruz, California 95064, USA; e-mail: ares@biology.ucsc.edu.