ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences

Proc Int Conf Intell Syst Mol Biol. 1999:138-48.

Abstract

One of the problems associated with the large-scale analysis of unannotated, low-quality EST sequences is the detection of coding regions and the correction of the frameshift errors that they often contain. We introduce a new type of hidden Markov model that explicitly deals with the possibility of errors in the sequence under analysis and incorporates a method for correcting these errors. This model was implemented in an efficient and robust program, ESTScan. We show that ESTScan can detect and extract coding regions from low-quality sequences with high selectivity and sensitivity, and is able to accurately correct frameshift errors. In the framework of genome sequencing projects, ESTScan could become a very useful tool for gene discovery, for quality control, and for the assembly of contigs representing the coding regions of genes.
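To make the term "frameshift error" concrete, the following toy sketch shows how a single spurious base disrupts the downstream translation and how removing one base restores it. This is not the hidden Markov model described in the abstract, and it is not ESTScan code: the codon table is deliberately partial, the example sequence is invented, and the helper names `translate` and `repair_single_insertion` are hypothetical. The brute-force repair stands in for the model-based correction only as an illustration.

```python
# Toy codon table covering only the codons used in this example
# (an assumption for brevity, not the full genetic code).
CODONS = {"ATG": "M", "GCC": "A", "AAA": "K", "GGT": "G", "TGA": "*"}


def translate(seq):
    """Translate seq in frame 0; codons missing from the toy table become 'X'."""
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def repair_single_insertion(seq):
    """Brute force: delete each base in turn and keep the candidate whose
    translation contains the fewest unknown codons."""
    candidates = [seq[:i] + seq[i + 1:] for i in range(len(seq))]
    return min(candidates, key=lambda s: translate(s).count("X"))


original = "ATGGCCAAAGGTTGA"                 # ATG GCC AAA GGT TGA
shifted = original[:6] + "C" + original[6:]  # one spurious base after the 2nd codon

print(translate(original))                   # MAKG*
print(translate(shifted))                    # MAXXX
print(repair_single_insertion(shifted))      # ATGGCCAAAGGTTGA
```

Because the inserted base extends a run of identical bases, deleting any base of that run yields the same repaired sequence, so the exact position of the error is ambiguous even though the corrected reading frame is not.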

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Base Sequence
  • DNA, Complementary / genetics
  • Exons
  • Expressed Sequence Tags*
  • Gene Library
  • Markov Chains
  • Molecular Sequence Data
  • Reading Frames
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Analysis, DNA*
  • Sequence Homology, Amino Acid
  • Software*

Substances

  • DNA, Complementary