HMMSTR: a hidden Markov model for local sequence-structure correlations in proteins

J Mol Biol. 2000 Aug 4;301(1):173-90. doi: 10.1006/jmbi.2000.3837.

Abstract

We describe a hidden Markov model, HMMSTR, for general protein sequences, based on the I-sites library of sequence-structure motifs. Unlike the linear hidden Markov models used to model individual protein families, HMMSTR has a highly branched topology and captures recurrent local features of protein sequences and structures that transcend protein family boundaries. The model extends the I-sites library by describing the adjacencies of different sequence-structure motifs as observed in the protein database and, by representing overlapping motifs in a much more compact form, achieves a great reduction in parameters. The HMM attributes a considerably higher probability to coding sequence than does an equivalent dipeptide model. It predicts secondary structure with an accuracy of 74.3%, predicts backbone torsion angles better than any previously reported method, and predicts the structural context of beta strands and turns with an accuracy that should be useful for tertiary structure prediction.
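The comparison between an HMM and a dipeptide model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-letter alphabet ("H" for hydrophobic, "P" for polar), the three-state branched topology, all probabilities, and the function names `forward_log_prob` and `dipeptide_log_prob`. The sketch uses the standard scaled forward algorithm for the HMM and a first-order Markov chain as a stand-in for a dipeptide model.

```python
import math


def forward_log_prob(seq, start, trans, emit):
    """Log-probability of seq under an HMM, using the scaled forward algorithm."""
    if not seq:
        return 0.0
    n_states = len(start)
    # Initialise: start probability times emission of the first symbol.
    alpha = [start[s] * emit[s][seq[0]] for s in range(n_states)]
    log_prob = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Recursion: sum over predecessor states, then emit symbol t.
            alpha = [
                emit[j][seq[t]] * sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        # Rescale to avoid underflow and accumulate the log of the scale factor.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


def dipeptide_log_prob(seq, first, pair):
    """Log-probability of seq under a first-order Markov (dipeptide-style) model."""
    if not seq:
        return 0.0
    lp = math.log(first[seq[0]])
    for a, b in zip(seq, seq[1:]):
        lp += math.log(pair[a][b])
    return lp


# Hypothetical toy HMM: state 0 branches to state 1 (mostly "H") or
# state 2 (mostly "P"), and both branches return to state 0.
start = [1.0, 0.0, 0.0]
trans = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
emit = [
    {"H": 0.5, "P": 0.5},
    {"H": 0.9, "P": 0.1},
    {"H": 0.1, "P": 0.9},
]

# Hypothetical uniform dipeptide-style model over the same alphabet.
first = {"H": 0.5, "P": 0.5}
pair = {"H": {"H": 0.5, "P": 0.5}, "P": {"H": 0.5, "P": 0.5}}

if __name__ == "__main__":
    seq = "HPHHP"
    print("HMM log-probability:      ", forward_log_prob(seq, start, trans, emit))
    print("dipeptide log-probability:", dipeptide_log_prob(seq, first, pair))
```

With real models, both sets of parameters would be estimated from the same training data, and the per-sequence log-probabilities would be compared on held-out sequences.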

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Motifs
  • Computational Biology / methods*
  • Computer Simulation
  • Databases, Factual
  • Markov Chains*
  • Models, Molecular
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Proteins / genetics
  • Reproducibility of Results
  • Sequence Alignment

Substances

  • Proteins