2011; 17:827-843 <http://www.molvis.org/molvis/v17/a94>
Received 2 February 2011 | Accepted 22 March 2011 | Published 30 March 2011
1Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland; 2Basic Medical Sciences Program, WWAMI (Washington, Wyoming, Alaska, Montana, and Idaho), Washington State University, Spokane, WA; 3Department of Ophthalmology, Hospital Metropolitano, Quito, Ecuador; 4Signature Genomics, Spokane, WA
Correspondence to: Marzena Gajecka, Ph.D., Institute of Human Genetics, Polish Academy of Sciences, Strzeszynska 32, Poznan, 60-479, Poland; Phone: (061) 657-9160; FAX: (061) 823-3235; email: firstname.lastname@example.org
Purpose: Keratoconus (KTCN) is a non-inflammatory, usually bilateral disorder of the eye that results in a conical shape and progressive thinning of the cornea. Several studies have suggested that genetic factors play a role in the etiology of the disease. Several loci were previously described as possible candidate regions for familial KTCN; however, no causative mutations in any genes have been identified for any of these loci. The purpose of this study was to evaluate the role of the collagen genes collagen type IV, alpha-1 (COL4A1) and collagen type IV, alpha-2 (COL4A2) in KTCN in Ecuadorian families.
Methods: COL4A1 and COL4A2 in 15 Ecuadorian KTCN families were examined with polymerase chain reaction amplification, and direct sequencing of all exons, the promoter, and intron-exon junctions was performed.
Results: Screening of COL4A1 and COL4A2 revealed numerous alterations in coding and non-coding regions of both genes. We detected three missense substitutions in COL4A1: c.19G>C (Val7Leu), c.1663A>C (Thr555Pro), and c.4002A>C (Gln1334His). Five non-synonymous variants were identified in COL4A2: c.574G>T (Val192Phe), c.1550G>A (Arg517Lys), c.2048G>C (Gly683Ala), c.2102A>G (Lys701Arg), and c.2152C>T (Pro718Ser). None of the identified sequence variants completely segregated with the affected phenotype. The Gln1334His variant was possibly damaging to protein function and structure.
Conclusions: This is the first mutation screening of COL4A1 and COL4A2 genes in families with KTCN and linkage to a locus close to these genes. Analysis of COL4A1 and COL4A2 revealed no mutations, indicating that other genes are involved in KTCN causation in Ecuadorian families.
Keratoconus (KTCN, OMIM 148300) is a non-inflammatory, usually bilateral disorder of the eye, characterized by progressive thinning and protrusion of the central cornea, which results in altered refractive powers and loss of visual acuity. The prevalence of the disease is estimated to be 1 in 2,000 individuals, and KTCN is the most common ectatic disorder of the cornea. KTCN affects males and females in all ethnic groups. Signs and symptoms depend on the stage of disease, with the first signs usually appearing in the third decade of life [1,2]. The cause of KTCN is still unknown; both genetic and environmental factors seem to play a role in its etiology. Although most cases of KTCN are isolated, an association with many syndromes, such as Down syndrome, Ehlers-Danlos syndrome, and Leber congenital amaurosis, has been described. Furthermore, extensive studies have shown an association between KTCN and constant eye rubbing, contact lens wear, or atopy. Usually, KTCN is a sporadic disorder, but a positive family history has been observed in 6%–8% of cases. An autosomal dominant inheritance pattern with reduced penetrance has been suggested in 90% of patients with familial KTCN [9,10].
Genomewide linkage analyses have indicated several loci involved in the etiology of familial KTCN at 16q22.3-q23.1 (KTCN2; OMIM 608932), 3p14-q13 (KTCN3; OMIM 608586), 2p24 (KTCN4; OMIM 609271), 1p36.23–36.21, 5q14.3-q21.1, 5q21.2, 5q32-q33, 8q13.1-q21.11, 9q34, 14q11.2, 14q24.3, 15q2.32, 15q22.33-q24.2, 17p13, and 20q12 [10-20]. However, no mutations in any genes at any of these loci have been associated with KTCN.
We have demonstrated evidence of linkage to a novel locus at 13q32. Collagen type IV, alpha-1 (COL4A1; OMIM 120130) and collagen type IV, alpha-2 (COL4A2; OMIM 120090) are mapped in close proximity to that locus. The COL4A1 and COL4A2 genes are organized in a head-to-head conformation. The gene pair shares a common promoter, and the two genes are transcribed in opposite directions. The COL4A1 gene is located on the minus strand and consists of 52 exons, while the COL4A2 gene is on the opposite strand and consists of 48 exons. They encode two of six collagen type IV chains – α1 and α2 (1,669 and 1,712 amino acids, respectively) – forming a heterotrimeric protein molecule of collagen type IV (α1α1α2), which is found in the structure of the basement membrane (BM) [22,23]. Each chain contains three domains: an NH2-terminal 7S domain, a major collagenous domain with Gly-X-Y repeats (the X position is frequently occupied by proline, whereas the Y position is often occupied by 4-hydroxyproline) and a non-collagenous domain (NC1) at the COOH-terminus. Repetitions of the Gly-X-Y motif determine the formation of the triple-helical structure of collagen.
Collagens are the major protein components of the human cornea, and several types of collagen, including collagen type IV, have been identified. Biochemical studies have revealed thinning of corneas from patients with KTCN, which may occur as a result of a reduced amount of total collagen proteins and changes in collagen fiber orientation. Moreover, a cornea affected by KTCN contains defects in BM and alterations in the BM composition. The presence of collagen type IV in normal human cornea has remained unclear. Results from expression arrays have shown an expression of COL4A1 in transplant-quality human donor corneas and a downregulation of COL4A1 in keratoconus corneas. Immunohistochemical studies have found collagen type IV α1/α2 chains in keratoconus corneas in large defect sites. In light of these results, we recognize COL4A1 and COL4A2 as candidate genes for KTCN.
The purpose of this study was to screen COL4A1 and COL4A2 genes and determine whether sequence variants in these genes are involved in the causation of KTCN in Ecuadorian families.
Twenty-three individuals from family KTCN-014, 25 affected individuals from other Ecuadorian families with KTCN, and 64 Ecuadorian control subjects were included in the study. The pedigrees of these families have been described elsewhere. All individuals were examined in the Hospital Metropolitano in Quito, Ecuador, undergoing a complete ophthalmic evaluation as previously described. The possible consequences of the study were explained and informed consent was obtained from all family members, according to the Declaration of Helsinki. The study protocol was approved by the Institutional Review Boards at both Washington State University Spokane, Spokane, WA and Poznan University of Medical Sciences (Poland).
Oligonucleotide primers were designed to amplify all coding sequences and intron-exon junctions, promoter, and UTRs of both COL4A1 and COL4A2 (Table 1). PCR amplifications were performed using Taq DNA Polymerase (Fermentas Inc., Glen Burnie, MD). PCR products were purified with ExoSAP-IT® (USB Corporation, Cleveland, OH) or Montage® PCR Filter Units (Millipore, Jaffrey, NH) and sequenced using the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Inc. [ABI], Foster City, CA). Sequencing was visualized on an ABI PRISM® 3100 Genetic Analyzer (ABI) and a 3730xl DNA Analyzer (ABI). The DNA sequences of study subjects were compared with the reference sequences of COL4A1 and COL4A2 (GRCh37/hg19, GenBank accession numbers for the mRNA NM_001845.4 and NM_001846.2, respectively) using Sequencher® 4.1.4 software (Gene Codes Corporation, Ann Arbor, MI).
PEDSTATS was used to verify the structure of the KTCN-014 family and identify potential Mendelian inconsistencies in the inheritance of single nucleotide polymorphisms (SNPs) in COL4A1 and COL4A2. For that region, to determine the full haplotypes inherited along with the substitutions occurring in affected individuals, a reconstruction of observed sequence variants was prepared using SimWalk2 [32,33]. Allele frequencies were set as equal. The location of genetic markers was determined on the basis of the Rutgers combined linkage-physical map of the human genome, either directly or by interpolation. The haplotype was generated with HaploPainter.
The difference in distribution of the Gln1334His substitution between affected and unaffected individuals in family KTCN-014 was analyzed by Fisher's Exact Test for Count Data. Similarly, 25 affected individuals from the remaining KTCN families versus 64 Ecuadorian control individuals were compared using Fisher's Exact Test. The difference between the examined groups was considered significant if the value of probability (p) did not exceed 0.05.
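The comparison described above can be illustrated with a minimal sketch of a two-sided Fisher's exact test on a 2×2 table of carriers and non-carriers. The function name and the counts below are hypothetical and chosen only for illustration; they are not the study's data. The two-sided rule used here (summing all tables no more probable than the observed one) is a common convention and is assumed, not confirmed, to match the software used in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts (NOT the study's data):
# rows = affected / unaffected, columns = carriers / non-carriers.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_value:.4f}")  # p = 0.4857
print("significant" if p_value <= 0.05 else "not significant")
```

With the significance rule stated above, a result is called significant only when p does not exceed 0.05, so the hypothetical table here would not be.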
The PolyPhen tool predicts which missense substitutions affect the structure and function of a protein, and uses Position-Specific Independent Counts (PSIC) software to assign profile scores. These scores compare the likelihood of a given amino acid occurring at a specific position with the likelihood of this amino acid occurring at any position (background frequency).
The SIFT analytic tool evaluates conserved positions on the basis of sequence homology and calculates a score for the amino acid change at a particular position. A substitution with a score of <0.05 is considered pathogenic and is predicted to affect the protein.
PMUT calculates the pathological significance of a non-synonymous amino acid substitution using neural networks (NN). An NN output >0.5 is considered deleterious. PANTHER estimates the likelihood that a particular amino acid change affects protein function. On the basis of an alignment of evolutionarily related proteins, it generates the substitution Position-Specific Evolutionary Conservation (subPSEC) score. The subPSEC score can take values from 0 (neutral) to about −10 (most likely to be deleterious). The value −3 is the cutoff point for functional significance, and corresponds to a Pdeleterious of 0.5. If the substitution occurs at a position not appearing in the multiple sequence alignment, a subPSEC score cannot be calculated and the change is considered unlikely to be pathogenic [39,40].
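The cutoffs quoted above for SIFT, PMUT, and PANTHER can be sketched as simple decision rules. The function names are hypothetical, and the logistic mapping from subPSEC to Pdeleterious is an assumption: it is the form commonly given for PANTHER and it reproduces the stated correspondence between a subPSEC of −3 and a Pdeleterious of 0.5, but it is not taken from this article. How a score exactly at a cutoff is handled is also an assumption.

```python
from math import exp


def panther_p_deleterious(sub_psec):
    """Assumed logistic mapping from a subPSEC score to Pdeleterious."""
    # Equals exactly 0.5 at the -3 cutoff quoted in the text.
    return 1.0 - 1.0 / (1.0 + exp(-sub_psec - 3.0))


def call_sift(score):
    # Scores below 0.05 are treated as pathogenic.
    return "deleterious" if score < 0.05 else "tolerated"


def call_pmut(nn_output):
    # Neural-network outputs above 0.5 are treated as deleterious.
    return "deleterious" if nn_output > 0.5 else "neutral"


def call_panther(sub_psec):
    # None models a position absent from the alignment: no score is available.
    if sub_psec is None:
        return "no score"
    return "deleterious" if sub_psec < -3.0 else "neutral"


# Hypothetical scores for illustration only (not values from Table 3).
print(call_sift(0.21), call_pmut(0.18), call_panther(-1.9))
print(round(panther_p_deleterious(-3.0), 3))  # 0.5
```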
The SNAP tool predicts the functional consequences of amino acid exchanges using evolutionary conservation and structure/function relationships. The SNAP output reports a prediction (neutral or non-neutral) and its expected accuracy.
Forty-eight members of 15 Ecuadorian families and 64 Ecuadorian control subjects were included in the study. Twenty-three individuals from family KTCN-014, two affected individuals from each of the families KTCN-011, 015, 019, 020, 021, 024, 025, 030, 031, 034, and 035, and one patient from each of KTCN-05, 013, and 017 were examined.
Screening of COL4A1 (NM_001845.4) coding regions revealed 12 sequence variants, three of which were amino acid substitutions: c.19G>C (Val7Leu), c.1663A>C (Thr555Pro), and c.4002A>C (Gln1334His). We identified one novel synonymous change, c.3693G>A (Thr1231Thr), and eight previously reported sequence variants: c.432T>A (Ala144Ala), c.1257T>C (Pro419Pro), c.1815T>C (Pro605Pro), c.2130G>A (Pro710Pro), c.3183G>A (Gly1061Gly), c.3189A>T (Arg1063Arg), c.4470C>T (Ala1490Ala), and c.4800C>T (Ser1600Ser). In the 5′ untranslated region (5′ UTR), one novel sequence variant, c.84+124T>A, was identified. In the 3′ untranslated region (3′ UTR), two previously reported variants, c.*587C>A and c.*975A>C, were detected.
Sequencing analyses of COL4A2 (NM_001846.2) coding regions revealed 13 previously reported sequence variants, including five non-synonymous substitutions: c.574G>T (Val192Phe), c.1550G>A (Arg517Lys), c.2048G>C (Gly683Ala), c.2102A>G (Lys701Arg), and c.2152C>T (Pro718Ser), and eight synonymous substitutions: c.297G>A (Thr99Thr), c.1008C>T (Pro336Pro), c.1095G>A (Pro365Pro), c.1179C>T (Ile393Ile), c.1488G>A (Pro496Pro), c.4089G>A (Ala1363Ala), c.4290T>C (Phe1430Phe), and c.4515A>G (Pro1505Pro). In the 5′ UTR, five known nucleotide changes, c.-277A>C, c.-232C>G, c.-215C>T, c.-203T>C, and c.-133A>G, were identified. In the 3′ UTR, eight previously reported sequence variants, c.*76T>C, c.*101_*102del2, c.*417C>G, c.*541C>T, c.*557A>G, c.*650T>C, c.*663T>C, and c.*727G>C were detected.
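The coding variants listed above pair a cDNA position with a residue number, and the two can be cross-checked: with c.1 taken as the first base of the start codon, codon k spans c.(3k−2) to c.3k. The following is a minimal sketch of that check; the regular expression and function names are hypothetical helpers written for this illustration, and only simple coding substitutions in the notation used in the text are handled.

```python
import re

# Coding substitutions as reported in the text for COL4A1 and COL4A2.
COL4A1 = [
    "c.19G>C (Val7Leu)", "c.1663A>C (Thr555Pro)", "c.4002A>C (Gln1334His)",
    "c.3693G>A (Thr1231Thr)", "c.432T>A (Ala144Ala)", "c.1257T>C (Pro419Pro)",
    "c.1815T>C (Pro605Pro)", "c.2130G>A (Pro710Pro)", "c.3183G>A (Gly1061Gly)",
    "c.3189A>T (Arg1063Arg)", "c.4470C>T (Ala1490Ala)", "c.4800C>T (Ser1600Ser)",
]
COL4A2 = [
    "c.574G>T (Val192Phe)", "c.1550G>A (Arg517Lys)", "c.2048G>C (Gly683Ala)",
    "c.2102A>G (Lys701Arg)", "c.2152C>T (Pro718Ser)", "c.297G>A (Thr99Thr)",
    "c.1008C>T (Pro336Pro)", "c.1095G>A (Pro365Pro)", "c.1179C>T (Ile393Ile)",
    "c.1488G>A (Pro496Pro)", "c.4089G>A (Ala1363Ala)", "c.4290T>C (Phe1430Phe)",
    "c.4515A>G (Pro1505Pro)",
]

PATTERN = re.compile(
    r"c\.(\d+)([ACGT])>([ACGT]) \(([A-Za-z]{3})(\d+)([A-Za-z]{3})\)"
)


def parse_variant(text):
    """Split 'c.19G>C (Val7Leu)' into its cDNA and protein parts."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized variant: {text!r}")
    pos, ref, alt, aa_ref, codon, aa_alt = match.groups()
    return {"cdna_pos": int(pos), "ref": ref, "alt": alt,
            "aa_ref": aa_ref, "codon": int(codon), "aa_alt": aa_alt}


def expected_codon(cdna_pos):
    # Codon k covers cDNA positions 3k-2 .. 3k, so round up pos / 3.
    return (cdna_pos + 2) // 3


def is_consistent(text):
    variant = parse_variant(text)
    return expected_codon(variant["cdna_pos"]) == variant["codon"]


for gene, variants in (("COL4A1", COL4A1), ("COL4A2", COL4A2)):
    mismatches = [v for v in variants if not is_consistent(v)]
    print(gene, len(variants), "variants,", len(mismatches), "mismatches")
```

A check like this catches transcription errors in variant tables; it does not validate the reference or alternate bases, which would require the transcript sequence.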
Screening of exon/intron junctions in COL4A1 and COL4A2 revealed numerous sequence variants in the surrounding non-coding sequences (71 and 86, respectively), including single nucleotide changes, insertions, and deletions. All screening results are summarized in Table 2.
The sequencing of the genomic region containing the common promoter of COL4A1 and COL4A2 revealed no sequence changes.
PolyPhen analyses of non-synonymous changes in COL4A1 and COL4A2 predicted that only the Gln1334His variant in COL4A1 was possibly damaging for protein function and structure (Table 3). The multiple sequence alignment of COL4A1 orthologs shows that the amino acid glutamine at position 1,334 is conserved throughout the analyzed species (Figure 1). The Gln1334His substitution was observed more frequently in patients than in healthy individuals in family KTCN-014, although the difference did not reach statistical significance (p=0.056). There was no difference in the c.4002A>C allele distribution between the analyzed affected individuals from the remaining KTCN families and the Ecuadorian control subjects (p=0.17).
The SIFT, PMUT, PANTHER, and SNAP analyses defined all missense amino acid substitutions in COL4A1 and COL4A2 as neutral/tolerated and lacking any effect on protein function. All prediction results are summarized in Table 3.
Haplotypes of sequence variants observed in family KTCN-014 are shown in Figure 2. The coding sequence variants in COL4A1 are surrounded by markers rs13260 and col4a1_snp2. Exons of COL4A2 are localized between rs35466678 and rs422733.
KTCN-014 consists of two family branches. Distinct haplotypes in the branches were identified (Figure 2). In the first one, initiated by parents KTCN-93 and KTCN-01, six subjects with KTCN had the same haplotype in the COL4A1 region, extending from rs13260 to col4a1_snp1. Three unaffected individuals, KTCN-13, KTCN-14, and KTCN-22, share that part of the haplotype with their affected relatives. One of four variants in this region, rs3742207, causes a change in the protein sequence, replacing Gln in position 1334 with His (Gln1334His). That haplotype region, from rs13260 to col4a1_snp1, represents a short fragment of the haplotype which covers the whole COL4A1 and COL4A2 sequence in KTCN-03, KTCN-05, KTCN-06, and KTCN-14. In addition, individuals KTCN-07, KTCN-09, KTCN-13, KTCN-22, and KTCN-23 share the rs874203-rs422733 region (Figure 2 – pink bars). For markers rs13260-col4a1_snp1, a different haplotype was observed in the second family branch, initiated by parents KTCN-92 and KTCN-16. This haplotype covered the entire length of the analyzed region, and was identified in all affected individuals and KTCN-21, whose phenotype was unknown. Subject KTCN-17 had the same allele pattern for markers rs13260-col4a1_snp1 as individuals from the first branch of the family. However, in this case, analysis indicated that these markers are inherited from KTCN-92, who is unrelated to KTCN-93 and KTCN-01.
To our knowledge, this is the first report describing complete sequence analysis of the coding regions and the exon-intron boundaries of COL4A1 and COL4A2 in families with KTCN. Previous studies have revealed a correlation between KTCN development and histopathological alterations in the structure of the corneal stroma and basement membrane, including a loss of collagen concentration and rearrangement of collagen fibers. Moreover, several types of collagen, including collagen type IV, have been identified in the cornea, and COL4A1 and COL4A2 expression has been detected in the human cornea. Finally, we had mapped a locus for KTCN to 13q32, in close proximity to which COL4A1 and COL4A2 are localized. Given that information, we hypothesized that COL4A1 and COL4A2 genes are good candidates for causing KTCN in families with linkage to that locus.
Different studies have revealed several loci and a few candidate genes for familial KTCN. The first gene proposed as playing a significant role in KTCN pathogenesis was the VSX1 (visual system homeobox 1, OMIM 605020) gene. It was suggested that a few disease-causing mutations were present in this gene [43,44], but recent studies have not confirmed these findings [21,45-47]. Next, a heterozygous genomic 7-bp deletion in intron 2 of SOD1 (superoxide dismutase 1; OMIM 147450) was identified in two families with KTCN [48,49]. In contrast, other studies have shown that mutations in this gene are not associated with KTCN pathogenesis [21,47]. Genetic analyses of COL4A3, COL4A4, COL8A1, and COL8A2 genes have revealed no pathogenic mutations in patients with KTCN, indicating that other genetic factors cause the disease [50-52].
We identified several single base pair substitutions in the coding regions of COL4A1 and COL4A2, including one novel heterozygous change, c.3693G>A in exon 42 of COL4A1. None of the detected alterations segregated fully with the affected phenotype in the analyzed members of the Ecuadorian KTCN families. Among the identified missense substitutions in COL4A1, one change, c.4002A>C (p.Gln1334His), was observed more frequently in KTCN patients than in healthy individuals in family KTCN-014. However, no significant statistical association of this change with familial disease could be proven (p=0.056), and no difference in the c.4002A>C allele distribution between the analyzed affected individuals from the remaining KTCN families and the Ecuadorian control subjects was discovered (p=0.17). To predict the impact of the substitutions on the structure and function of the protein, we used different tools. All identified missense substitutions in COL4A1 and COL4A2 were predicted by the SIFT, PMUT, PANTHER, and SNAP tools to have no effect, but PolyPhen defined the Gln1334His change in COL4A1 as possibly damaging. Glutamine at this position is highly conserved in different species. Moreover, this change is present in the collagenous domain of the α1(IV) chain with Gly-X-Y repeats, which plays a role in the assembly of the protein into a triple-helical structure. Replacement of the uncharged polar residue (Gln) with the basic amino acid (His) at the Y position may affect the protein structure. Nevertheless, further studies should be performed to determine the functional significance of this substitution.
To the best of our knowledge, no mutations in COL4A1 have been associated with corneal disease. The spectrum of COL4A1-related disorders includes porencephaly (OMIM 175780) [53-55], Hereditary Angiopathy with Nephropathy, Aneurysm and Muscle Cramps (HANAC; OMIM 611773), and brain small vessel disease with hemorrhage (OMIM 607595). Recent studies have also revealed an association between mutations in exon 29 of COL4A1 and Axenfeld-Rieger anomaly with leukoencephalopathy and stroke. In our study, none of the previously reported COL4A1 mutations were identified. The absence of these changes in patients with KTCN suggests that they are specific to the above-mentioned disorders only, and are not associated with KTCN in the tested families. To date, no mutations responsible for COL4A2-related human diseases have been reported.
Besides changes identified in the coding regions of COL4A1 and COL4A2, our study revealed numerous alterations in introns and UTRs of both genes, including single base pair substitutions, deletions, and insertions. Fourteen of these were novel and their clinical significance is not known. Each of the changes was observed in both affected and healthy individuals in the tested families. Because important functional elements are located in non-coding regions of genes and intronic alterations can have a deleterious effect on pre-mRNA splicing, the identification of these sequence variants could be non-accidental. Further research is needed to delineate the role of these sequence variants.
Recent studies have shown that a mouse with a mutation in a splice acceptor site of Col4a1 has ocular dysgenesis. The mutation results in a lack of exon 40 in the mouse transcripts and leads to the accumulation of misfolded protein in the lens epithelial cells. Col4a1∆ex40 mice show optic nerve hypoplasia and anterior segment dysgenesis (ASD) including pigment dispersion, cataracts, and corneal opacifications. Splice acceptor sites are highly conserved regions in different species. We detected no alterations in the splice acceptor site in intron 39 of human COL4A1.
Extended genetic studies performed in families with KTCN have shown a high level of genetic heterogeneity. The presence of many putative loci supports the hypothesis that KTCN is an oligogenic disease in which the accumulation of sequence variants at several loci causes a specific KTCN haplotype and may trigger the phenotypic effect. The absence of mutations in COL4A1 and COL4A2 genes indicates that other genes are involved in KTCN pathogenesis in Ecuadorian families.
Supported by the Polish Ministry of Science and Higher Education, Grant NN 402097837. The authors thank Genomed Company (Warsaw, Poland) for support in sequencing service.