Molecular Vision 2000;
Received 13 March 2000 | Accepted 5 May 2000 | Published 17 May 2000
The human gene for gS-crystallin: Alternative transcripts and expressed sequences from the first intron
Graeme Wistow,1 Liodvig Sardarian,1 Weinu Gan,2 M. Keith Wyatt1
1Section on Molecular Structure and Function, National Eye Institute, National Institutes of Health, Bethesda, MD; 2NIH Intramural Sequencing Center, National Institutes of Health, Bethesda, MD
Correspondence to: Graeme Wistow, Ph.D., Chief, Section on Molecular Structure and Function, National Eye Institute, Bldg 6, Rm 331, National Institutes of Health, Bethesda, MD, 20892-2740; Phone: (301) 402-3452; FAX: (301) 496-0078; email: firstname.lastname@example.org
Purpose: gS-crystallins are major components of adult vertebrate lenses. Here we examine the population of gS transcripts in adult human lens and the structure of the human CRYGS gene.
Methods: Adult human lens transcripts were obtained from NEIBANK, an Expressed Sequence Tag (EST) analysis of human eye tissues. The human CRYGS gene was isolated as a PAC clone and sequenced by direct and PCR-based methods.
Results: As judged by EST frequency, gS is one of the most abundant transcripts in the adult human lens, ranking just behind bB2-, aB- and aA-crystallins. EST analysis reveals two transcript sizes resulting from alternative AATAAA and ATTAAA polyadenylation signals. In addition, one cDNA clone was found to contain a novel insert sequence that disrupted the open reading frame. Gene sequencing confirmed that this insert comes from intron 1 and is part of a sequence corresponding to a cluster of unidentified human transcripts in dbEST. Human and mouse gS gene proximal promoter sequences were compared and showed a high degree of evolutionary conservation, including consensus binding sites for transcription factors of the maf and SOX families.
Conclusions: The human CRYGS gene can give rise to at least two transcripts through alternative polyadenylation. A minor transcript results from alternative splicing into sequences in intron 1. These sequences form part of a transcription unit (Mys) expressed in several non-lens tissues. The identity and function of Mys are not yet known; however, the cryptic splicing of CRYGS could produce a defective protein product, with potentially deleterious results for the adult human lens.
The major families of crystallins in the lenses of humans (and most other mammals) are the a-, b-, and g-crystallins [1-4]. The b- and g-crystallins both belong to a superfamily whose membership extends to stress proteins in bacteria and simple eukaryotes and to vertebrate proteins that may have roles in control of cyto-architecture [1,5]. Although b- and g-crystallins are expressed at much higher levels in lens than elsewhere, both are also expressed at lower levels in other tissues [6-8]. This may reflect retention of a (so far unknown) non-lens function predating their ancestral gene recruitment to the lens.
There are two groups of g-crystallins in mammals. The best-studied group consists of six genes (A-F), which form a single cluster in primate and rodent genomes [1,9,10]. These genes are expressed early in development, beginning in the primary fiber cells and constitute a major fraction of the lens nucleus. Distinct from these is gS-crystallin, sufficiently different in properties that it was originally known as bS-crystallin [1,4,9,11]. Its expression begins late in lens development, so that it replaces the embryonic g-crystallins in the secondary fiber cells of the adult lens [4,12]. The gene for gS has been sequenced from bovine and murine genomes [7,11]. In mouse, crygs is the locus for a cataract, Opj [7,13; unpublished data]. Here we describe an analysis of human gS, based on expressed sequence tag (EST) studies and genomic cloning.
Cloning and Sequencing
Multiple cDNA sequences for human gS were obtained during expressed sequence tag (EST) analysis of a cDNA library made from adult human lens as part of the NEIBANK project, to be described in detail elsewhere. Clones were subjected to 5'- and some to 3'-sequencing at the NIH Intramural Sequencing Center (NISC). One clone was picked for complete sequence. Full insert cDNA sequencing was performed by a primer-walking sequencing strategy until the sequence of both strands of the cDNA had been determined. The sequence was edited and assembled using the program Sequencher (Gene Codes, Ann Arbor, MI).
To isolate the human CRYGS gene, two primers, GGCCCCATTCATGTCATTACTCCACAATGC (humgs3) and CCTAGTGGAGGCCAGTATAAGATTCAGATC (humgs5), were designed from the 3'-UTR region of the cDNA sequence. These were tested on human genomic DNA and then supplied to Genome Systems (St. Louis, MO) to identify PAC clones from a human genomic DNA library.
The CRYGS gene was sequenced directly from the PAC clone template using primers derived from the cDNA sequence, and by PCR of gene fragments, followed by subcloning using the Invitrogen (Carlsbad, CA) pCR2.1 TA cloning system and sequencing. Primer sequences are available on request. Sequencing was carried out in-house using a Beckman CEQ 2000 capillary sequencer (Beckman Coulter, Fullerton, CA), following manufacturer's protocols, and under contract at Bioserve Biotechnologies, Laurel, MD, using Applied Biosystems protocols.
Fragments of the gene, including the large first intron, were amplified from PAC DNA using the Elongase (Life Technologies, Gaithersburg, MD) system for long range PCR. These were used as sequence templates and to estimate intron size.
Sequence analysis was performed on the desktop using programs of the DNASTAR package (Madison, WI). Sequence databases were searched using BLAST programs [14] through the Internet at the National Center for Biotechnology Information (NCBI). Consensus transcription factor binding sites were examined using the TRANSFAC database [15].
Results & Discussion
ESTs for human gS
In an analysis of over 2000 cDNA clones from adult human lens, clones for gS ranked fourth in abundance, behind those for bB2, aB and aA, representing about 2% of all transcripts observed. One clone containing the full coding sequence (CDS) was completely sequenced (GenBank accession AF161703). The NEIBANK EST sequencing strategy focuses on the 5'-ends of the clones, but for three clones the 3'-end was also obtained. Two of these contained the same 3'-end, resulting from use of a canonical AATAAA polyadenylation signal. The third sequence was 161 bp longer, running on past this signal to an alternative 3'-end derived from the common variant polyadenylation signal, ATTAAA [16,17] (Figure 1).
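The two polyadenylation signals described above can be located in a 3'-UTR by a simple hexamer scan. The following is a minimal sketch only: the function name and the toy sequence are invented for illustration and are not the CRYGS sequence or the method used in this study.

```python
import re

# Canonical signal and the common variant named in the text.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")

def find_polya_signals(seq):
    """Return sorted (position, signal) pairs for every hexamer match (0-based)."""
    seq = seq.upper()
    hits = []
    for signal in POLYA_SIGNALS:
        # A lookahead is used so that overlapping matches are not skipped.
        for match in re.finditer(f"(?={signal})", seq):
            hits.append((match.start(), signal))
    return sorted(hits)

# Hypothetical toy 3'-UTR, invented for illustration only.
toy_utr = "GCTGAATAAAGTCTTCAGGCCTTATTAAACTGTC"
polya_hits = find_polya_signals(toy_utr)
print(polya_hits)
```

In a real transcript, the distance between the two hits would be compared with the observed difference in 3'-end length (161 bp for the clones described here).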
In addition to evidence for alternative polyadenylation, one gS clone was found to contain an insertion of 118 bp (Figure 1). This insertion disrupted the open reading frame (ORF) of gS and, if translated, would produce a short polypeptide containing the N-terminal "arm" of gS, but lacking characteristic gmotifs (Figure 2). The source of this insertion was investigated by examination of the CRYGS gene sequence.
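Why a 118 bp insertion disrupts the ORF can be illustrated with a short sketch: 118 is not a multiple of three, so every codon downstream of the insertion is read out of frame, and a premature stop codon is likely to be encountered. The sequences below are invented toy examples (a 4 nt insert has the same remainder modulo 3 as 118), not the gS coding sequence or the actual insert.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def shifts_reading_frame(insert_length):
    """An insertion preserves the frame only if its length is a multiple of 3."""
    return insert_length % 3 != 0

def first_stop_index(cds):
    """Return the 0-based codon index of the first in-frame stop, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in STOP_CODONS:
            return i // 3
    return None

# Hypothetical toy CDS: ATG GCT AAA GGT TGG TAA
toy_cds = "ATGGCTAAAGGTTGGTAA"
# Hypothetical 4 nt insert; 4 % 3 == 118 % 3 == 1.
toy_insert = "TAGC"
toy_spliced = toy_cds[:6] + toy_insert + toy_cds[6:]

print(shifts_reading_frame(118))
print(first_stop_index(toy_cds), first_stop_index(toy_spliced))
```

In this toy case the stop codon moves from the end of the reading frame to the third codon, mirroring the truncated polypeptide predicted for the variant transcript.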
The CRYGS gene
The human CRYGS gene was first mapped broadly to chromosome 3 [18]. Subsequently this position was refined by RH mapping at Genethon to D3S1553-D3S1580 (sts-L36869), towards the telomeric end of chromosome 3q. This was confirmed using primers designed from the 3'-UTR of our cDNA sequence, giving the closest link to the genetic marker D3S1571 (J. Fingert, personal communication). This mapping served to eliminate gS, for which retinal expression has been observed in mouse [8], as a candidate for OPA1 [19]. This mapping was also confirmed by others, who showed that CRYGS is distal to marker WI9695, similarly eliminating the gene from candidacy for an autosomal dominant congenital cataract [20].
A genomic clone for CRYGS was obtained by PCR screening of a Genome Systems arrayed PAC genomic library [clone address PAC 196(D10)]. The gene was sequenced by a combination of direct sequencing and PCR amplification (GenBank accessions AF242197 and AF242198). As expected, the structure of the gene was similar to that of its mouse [7] and bovine [11] orthologs (Figure 3). The alternative polyadenylation signals seen in the EST clones were confirmed in the genomic sequence (Figure 4). As in other mammals, the second intron of the human gene for gS is relatively small (less than 500 bp) while the first intron is much larger (more than 4 kbp). The first intron size was estimated by PCR, but sequence was limited to regions containing sequences related to alternative transcripts.
A transcription unit in the first intron
The unusual insertion sequence seen in one EST clone was located in the large first intron of the CRYGS gene. The sequence was located just over 1 kb 5' to exon 2 of the CRYGS gene, flanked by consensus AG/GT exon/intron splice junctions (Figure 4). The insert is clearly the result of alternative splicing into a CRYGS transcript; however, its biological significance is not clear. It may represent the "accidental" use of cryptic splice junctions. Only a single copy of this splice variant has been seen so far. No judgment of absolute levels can therefore be made, although it could presumably be as high as 2% (one out of a total of 43 clones) of the abundant CRYGS transcripts. At this level it has the potential to contribute a significant amount of the variant polypeptide in the adult and aging lens.
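The AG/GT flanking described above can be checked programmatically once the position of a candidate exon in the genomic sequence is known. This is a minimal sketch under stated assumptions: the function name, the coordinate convention and the toy sequence are hypothetical, and only the invariant dinucleotides are tested, not the full extended splice-site consensus.

```python
def has_consensus_splice_flanks(genomic, exon_start, exon_end):
    """Check for AG immediately 5' of the exon and GT immediately 3' of it.

    exon_start is the 0-based index of the first exon base; exon_end is the
    index one past the last exon base (Python slice convention).
    """
    genomic = genomic.upper()
    if exon_start < 2 or exon_end + 2 > len(genomic):
        return False  # not enough flanking sequence to test
    acceptor = genomic[exon_start - 2:exon_start]
    donor = genomic[exon_end:exon_end + 2]
    return acceptor == "AG" and donor == "GT"

# Hypothetical toy sequence: intron in lower case, candidate exon in upper case.
toy_genomic = "ccttttcag" + "GATTCCGGA" + "gtaagtcc"
print(has_consensus_splice_flanks(toy_genomic, 9, 18))
```

Passing this test is necessary but not sufficient for a functional exon, which is consistent with the cautious interpretation of the variant as a possible "accidental" cryptic splice.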
Surprisingly, the cryptic alternative exon is represented in dbEST in another form. It forms part of the sequence of a group of ESTs from various adult and fetal tissues, represented in a human Unigene cluster (Hs.134126, Figure 5). These ESTs can be grouped into a contiguous consensus with no evidence for exon/intron structure, ORF or recognizable polyadenylation signals. None of the sequences in the Unigene cluster contains any sequence overlapping with exons of CRYGS. In other words, Hs.134126 appears to represent a separate transcription unit, a "mystery" gene (Mys), located in intron 1 of CRYGS. The alternative exon in the CRYGS EST is a small piece of this Mys sequence.
Alignment of the Mys sequences in Hs.134126 shows that they all terminate at a position corresponding to a stretch of As in the CRYGS intron sequence. This suggests that their presence in cDNA libraries results from oligo(dT) priming of RNA transcripts at this sequence, rather than from an authentic poly(A) tail. It also suggests that Mys RNA corresponds to the same DNA strand as the CRYGS gene and that it is transcribed in the same direction. A search of dbEST reveals a single, non-overlapping EST, (AI807504) which is not included in Hs.134126 but which is also derived from intron 1 of CRYGS. Possibly this EST represents the authentic 3'- end of the Mys RNA transcript (Figure 4). An alternative explanation, that the ESTs from intron 1 simply represent fragments of genomic DNA, seems unlikely in view of their abundance and their clustering. Given the present size of dbEST, the chances are remote that so many coincidental fragments could arise from the same part of the genome.
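The inference above, that ESTs ending at a genomic stretch of As may reflect internal oligo(dT) priming rather than a true poly(A) tail, suggests a simple screen for A-rich tracts in genomic sequence. The sketch below is illustrative only: the minimum tract length of 8 is an arbitrary assumption, and the toy sequence is invented rather than taken from the CRYGS intron.

```python
import re

def find_a_tracts(seq, min_length=8):
    """Return (start, end) pairs for runs of at least min_length consecutive As.

    Coordinates are 0-based with an exclusive end (Python slice convention).
    """
    pattern = re.compile("A{%d,}" % min_length)
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

# Hypothetical toy intron fragment containing one run of 10 As and a short AAA.
toy_fragment = "GCTAGC" + "A" * 10 + "GTCCAAAGT"
print(find_a_tracts(toy_fragment))
```

An EST whose 3'-end coincides with such a tract in the genomic sequence would be flagged as a candidate internal priming event.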
ESTs for Mys show no evidence for lens preference, deriving from various tissues including lung and heart. A trivial explanation for Mys sequences is that they simply represent primary transcripts of CRYGS or excised intron 1 following CRYGS splicing. If this were so, it would be reasonable to expect them to be found in association with higher levels of authentic, spliced gS sequences, but these are not seen in the source cDNA libraries. The alternative is that Mys is a transcription unit distinct from the gS structural gene and expressed with different tissue-preference. Certainly there is no equivalent evidence for expression of intron sequences from other b- or g-crystallin genes in non-lens tissues.
So far we have been unable to gain further insight into Mys. Attempts to visualize Mys transcripts by Northern blot have not been successful; clearly the abundance in dbEST suggests that Mys RNA is present at low levels in several tissues and it is not unprecedented for low abundance RNAs to be difficult to detect by this method. Its apparent lack of features associated with typical pol II gene transcripts (ORF, exons/introns, poly(A) signal, poly(A) tail) also raises the possibility that Mys is not a protein-coding gene and might instead be a transcript from another polymerase system.
Whatever the function, if any, of Mys, its existence has implications for homologous recombination "gene knockout" (KO) experiments for this and other genes. In the case of the gS gene, database searches have not yet revealed a mouse homologue of Mys, and limited sequence of the mouse crygs first intron has so far shown no similarity to human Mys sequences; however, the possibility remains that deletion of the first intron of crygs in mouse could also result in loss of a function distinct from that of the crystallin structural gene. This also illustrates a general problem for design of deletions in other genes. Large deletions may unwittingly affect overlapping or nearby genes. Even a gene as small as that for gS can have an unexpectedly crowded locus. Indeed, apart from Mys, we have evidence that in the mouse genome, another gene lies within 2 kb of the crygs transcription start site (unpublished).
Conservation of Crygs Promoter Sequences
Sequence determination of the 5'-end of CRYGS allowed comparison of proximal promoter sequences for gS genes from two species (Figure 5). This approach, sometimes called phylogenetic footprinting, has previously been fruitful in revealing functionally important sites in crystallin gene promoters (see for example references [21,22]). Indeed, alignment of equivalent sequences from human and mouse genes [7] showed a high degree of sequence conservation, almost 80% for non-transcribed sequences (Figure 5). Thus the promoters are conserved at a level similar to that expected for CDS between two mammals. In particular, some elements that match consensus binding sites for transcription factors [15] are conserved in both sequence and position between human and mouse.
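A conservation figure such as the "almost 80%" quoted above is typically computed as percent identity over a pairwise alignment. The sketch below shows one common convention, which is an assumption here since the exact calculation is not specified in the text: columns containing a gap in either sequence are skipped, and identity is the fraction of matching columns among those compared. The toy alignment is invented, not the human/mouse promoter alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Hypothetical toy alignment: 9 ungapped columns, 8 of them identical.
toy_human = "ACGTTGCA-A"
toy_mouse = "ACGATGCAGA"
print(round(percent_identity(toy_human, toy_mouse), 1))
```

Other conventions, such as counting gap columns as mismatches, give lower values, so the convention should be stated when conservation levels are compared between studies.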
About 200 bp upstream of the transcription start site there is a consensus maf-response element (MARE). MAREs have been shown to be important in other crystallin genes for lens-specific expression [23,24]. In particular, the guinea pig z-crystallin gene contains a MARE (also at about -200 bp) that acts synergistically with Pax6 for high-level gene expression [24]. Furthermore, recent studies have shown that genetic deletion of the MARE binding factor c-maf severely disrupts lens fiber cell formation in KO experiments in mice [25-27]. Another maf-related element, a consensus sequence for NF-E2 (a complex including a small maf subunit [28]) binding, is also conserved between human and mouse. Another group of transcription factors, the SOX (SRY-box) proteins, are also important for crystallin gene expression and lens development [29-31]. As shown in Figure 5, one of the blocks of conserved sequence in human and mouse gS proximal promoter regions is similar to an SRY-family binding site. Several other interesting sites, including consensus homeobox and AP-1 binding sites, are also present in the proximal promoter, as indicated in Figure 5.
Although gS is a relatively minor component of the transcripts in young mammalian lenses, the targets of most studies, it is much more significant in the adult lens. The human CRYGS gene is highly conserved in structure and sequence when compared with that of its mouse homolog. This extends to striking conservation of elements in the proximal promoter that match binding sites for factors with special significance for lens development and crystallin gene expression. The gene also shows some surprises. Alternative polyadenylation sites are present, although these would have no effect on the resultant protein product. More unusual is the existence of an alternative splice into a cryptic exon located in the large first intron. The splice would not produce a functional crystallin, but might give rise to a short polypeptide. It remains to be seen whether this has any consequences, but it is possible that accumulation of a "junk" polypeptide could have deleterious consequences for the aging lens. The cryptic exon is part of a block of sequence represented by ESTs in cDNA libraries of non-lens tissues, suggesting that the first intron of human CRYGS is actively transcribed in the absence of significant expression of the crystallin itself.
We thank John Fingert and Edwin Stone for RH mapping of CRYGS.
1. Wistow GJ. Molecular biology and evolution of crystallins: gene recruitment and multifunctional proteins in the eye lens. Austin (TX): R.G. Landes; 1995.
2. Wistow G. Lens crystallins: gene recruitment and evolutionary dynamism. Trends Biochem Sci 1993; 18:301-6.
3. de Jong WW. Evolution of lens and crystallins. In: Bloemendal H, editor. Molecular and cellular biology of the eye lens. New York: Wiley; 1981. p. 221-78.
4. Harding JJ, Crabbe MJC. The lens: development, proteins, metabolism and cataract. In: Davson H, editor. The eye. Vol 1B. Orlando (FL): Academic Press; 1984. p. 207-492.
5. Ray ME, Wistow G, Su YA, Meltzer PS, Trent JM. AIM1, a novel non-lens member of the betagamma-crystallin superfamily, is associated with the control of tumorigenicity in human malignant melanoma. Proc Natl Acad Sci U S A 1997; 94:3229-34.
6. Head MW, Peter A, Clayton RM. Evidence for the extralenticular expression of members of the beta-crystallin gene family in the chick and a comparison with delta-crystallin during differentiation and transdifferentiation. Differentiation 1991; 48:147-56.
7. Sinha D, Esumi N, Jaworski C, Kozak CA, Pierce E, Wistow G. Cloning and mapping the mouse Crygs gene and non-lens expression of [gamma]S-crystallin. Mol Vis 1998; 4:8 <http://www.molvis.org/molvis/v4/a8/>.
8. Jones SE, Jomary C, Grist J, Makwana J, Neal MJ. Retinal expression of gamma-crystallins in the mouse. Invest Ophthalmol Vis Sci 1999; 40:3017-20.
9. Lubsen NH, Aarts HJ, Schoenmakers JG. The evolution of lenticular proteins: the beta- and gamma-crystallin super gene family. Prog Biophys Mol Biol 1988; 51:47-76.
10. van Rens GL, de Jong WW, Bloemendal H. A superfamily in the mammalian eye lens: the beta/gamma-crystallins. Mol Biol Rep 1992; 16:1-10.
11. van Rens GL, Raats JM, Driessen HP, Oldenburg M, Wijnen JT, Khan PM, de Jong WW, Bloemendal H. Structure of the bovine eye lens gamma s-crystallin gene (formerly beta s). Gene 1989; 78:225-33.
12. Jaworski C, Wistow G. LP2, a differentiation-associated lipid-binding protein expressed in bovine lens. Biochem J 1996; 320:49-54.
13. Everett CA, Glenister PH, Taylor DM, Lyon MF, Kratochvilova-Loester J, Favor J. Mapping of six dominant cataract genes in the mouse. Genomics 1994; 20:429-34.
14. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol 1990; 215:403-10.
15. Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV, Ignatieva EV, Ananko EA, Podkolodnaya OA, Kolpakov FA, Podkolodny NL, Kolchanov NA. Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res 1998; 26:362-7.
16. Swimmer C, Shenk T. Selection of sequence elements that substitute for the standard AATAAA motif which signals 3' processing and polyadenylation of late simian virus 40 mRNAs. Nucleic Acids Res 1985; 13:8053-63.
17. Graber JH, Cantor CR, Mohr SC, Smith TF. In silico detection of control signals: mRNA 3'-end-processing sequences in diverse species. Proc Natl Acad Sci U S A 1999; 96:14055-60.
18. Wijnen JT, Oldenburg M, Bloemendal H, Meera Khan P. GS(gS)-crystallin (CRYGS) assignment to chromosome 3. Cytogenet Cell Genet 1989; 51:1108.
19. Brown J Jr, Fingert JH, Taylor CM, Lake M, Sheffield VC, Stone EM. Clinical and genetic analysis of a family affected with dominant optic atrophy. Arch Ophthalmol 1997; 115:95-9.
20. Kramer PL, LaMorticella D, Schilling K, Billingslea AM, Weleber RG, Litt M. A new locus for autosomal dominant congenital cataracts maps to chromosome 3. Invest Ophthalmol Vis Sci 2000; 41:36-9.
21. Wistow G, Graham C. The duck gene for alpha B-crystallin shows evolutionary conservation of discrete promoter elements but lacks heat and osmotic stress response. Biochim Biophys Acta 1995; 1263:105-13.
22. Gopal-Srivastava R, Cvekl A, Piatigorsky J. Pax-6 and alphaB-crystallin/small heat shock protein gene regulation in the murine lens. Interaction with the lens-specific regions, LSR1 and LSR2. J Biol Chem 1996; 271:23029-36.
23. Ogino H, Yasuda K. Induction of lens differentiation by activation of a bZIP transcription factor, L-Maf. Science 1998; 280:115-8.
24. Sharon-Friling R, Richardson J, Sperbeck S, Lee D, Rauchman M, Maas R, Swaroop A, Wistow G. Lens-specific gene recruitment of zeta-crystallin through Pax6, Nrl-Maf, and brain suppressor sites. Mol Cell Biol 1998; 18:2067-76.
25. Kim JI, Li T, Ho IC, Grusby MJ, Glimcher LH. Requirement for the c-Maf transcription factor in crystallin gene regulation and lens development. Proc Natl Acad Sci U S A 1999; 96:3781-5.
26. Kawauchi S, Takahashi S, Nakajima O, Ogino H, Morita M, Nishizawa M, Yasuda K, Yamamoto M. Regulation of lens fiber cell differentiation by transcription factor c-Maf. J Biol Chem 1999; 274:19254-60.
27. Ring BZ, Cordes SP, Overbeek PA, Barsh GS. Regulation of mouse lens fiber cell development and differentiation by the Maf gene. Development 2000; 127:307-17.
28. Andrews NC. The NF-E2 transcription factor. Int J Biochem Cell Biol 1998; 30:429-32.
29. Kamachi Y, Sockanathan S, Liu Q, Breitman M, Lovell-Badge R, Kondoh H. Involvement of SOX proteins in lens-specific activation of crystallin genes. EMBO J 1995; 14:3510-9.
30. Kamachi Y, Uchikawa M, Collignon J, Lovell-Badge R, Kondoh H. Involvement of Sox1, 2 and 3 in the early and subsequent molecular events of lens induction. Development 1998; 125:2521-32.
31. Nishiguchi S, Wood H, Kondoh H, Lovell-Badge R, Episkopou V. Sox1 directly regulates the gamma-crystallin genes and is essential for lens development in mice. Genes Dev 1998; 12:776-81.