Distribution and characterization of regulatory elements in the human genome

Genome Res. 2002 Dec;12(12):1827-36. doi: 10.1101/gr.606402.

Abstract

The regulation of transcription and subsequent gene splicing are crucial to correct gene expression. Although a number of regulatory sequences involved in both processes are known, it is not clear how general their functions are in the genomic context, nor how the regulatory regions are distributed throughout the genome. Here we study the distribution of known mutagenic elements within human introns and exons to deduce the properties of regions essential for splicing and transcription. We show that intronic splicing regulators are generally found close to the splice sites, but may be found as far as 200 nucleotides away from the splice junctions. Similarly, exonic sequences important for splicing may be located as far as 125 nucleotides away from the junctions. We characterize several types of simple repetitive sequences and low-complexity regions that are overrepresented close to both intron ends and are likely to play important roles in the splicing process. We show that the first introns within most genes play a particularly important regulatory role, although this role is most likely to involve transcription control. We also study the distribution of two known regulatory motifs, the GGG trinucleotide and the CpG dinucleotide, and deduce their respective importance to splicing and transcription regulation.
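As an illustration of the kind of positional analysis the abstract describes, the following minimal Python sketch tallies how often a short motif (such as GGG or the CpG dinucleotide, written here as "CG") starts at each distance from the 5' end of an intron. It is not the authors' method: the function names, the 200-nucleotide default window, and the toy sequences are assumptions chosen purely for demonstration.

```python
from collections import Counter


def motif_positions(seq, motif):
    """Return 0-based start offsets of every occurrence of motif in seq.

    Overlapping occurrences are counted (e.g. GGGG contains GGG twice).
    """
    seq = seq.upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping matches are not skipped.
        start = seq.find(motif, start + 1)
    return positions


def distance_profile(introns, motif, window=200):
    """Count motif starts at each distance (in nt) from the 5' intron end.

    Only occurrences starting within `window` nucleotides are counted.
    The default of 200 is an illustrative choice, not a published parameter.
    """
    counts = Counter()
    for intron in introns:
        for pos in motif_positions(intron, motif):
            if pos < window:
                counts[pos] += 1
    return counts


# Hypothetical toy intron fragments (not real genomic data).
toy_introns = ["GTAAGGGTCGA", "GTGGGGCGCAG"]

ggg = distance_profile(toy_introns, "GGG")
cpg = distance_profile(toy_introns, "CG")

for distance in sorted(ggg):
    print(f"GGG at +{distance}: {ggg[distance]}")
for distance in sorted(cpg):
    print(f"CpG at +{distance}: {cpg[distance]}")
```

Applied to a real set of annotated introns, a profile like this would need to be normalized by the number of sequences covering each position and compared against a background expectation before any overrepresentation near splice sites could be claimed.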

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Base Composition / genetics
  • Chromosome Mapping / methods*
  • DNA Transposable Elements / genetics
  • Databases, Genetic
  • Exons / genetics
  • Genome, Human*
  • Humans
  • Introns / genetics
  • Microsatellite Repeats / genetics
  • Models, Genetic
  • Polymorphism, Single Nucleotide / genetics
  • Regulatory Sequences, Nucleic Acid / genetics*
  • Short Interspersed Nucleotide Elements / genetics

Substances

  • DNA Transposable Elements