Molecular Vision 2025; 31:486-500
<http://www.molvis.org/molvis/v31/486>
Received 24 May 2025 | Accepted 28 November 2025 | Published 01 December 2025
Bushra Alayed,1,2 Danah Albuainain,1,3 Salina Siddiqui,1,4 Weijia Li,1 Ummey Hany,1 Seema Anand,4 Chris F. Inglehearn,1 Christopher M. Watson,1,5 Manir Ali1
1Division of Molecular Medicine, Leeds Institute of Medical Research, St. James's University Hospital, University of Leeds, Leeds, UK; 2Department of Medical Laboratories, College of Applied Medical Sciences, Qassim University, Buraydah, Saudi Arabia; 3Department of Anatomy, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia; 4The Eye Department, St. James's University Hospital, Leeds, UK; 5North East and Yorkshire Genomic Laboratory Hub, Central Lab, St James's University Hospital, Leeds Teaching Hospitals NHS Trust, Leeds, UK
Correspondence to: Manir Ali, Division of Molecular Medicine, Leeds Institute of Medical Research, St. James's University Hospital, University of Leeds, Leeds, UK; email: m.ali@leeds.ac.uk
Purpose: A trinucleotide repeat expansion in TCF4 is thought to cause Fuchs endothelial corneal dystrophy (FECD) in ~70% of European patients. In addition, strong evidence exists for the involvement of rare variants in COL8A2 and SLC4A11 in a small number of FECD cases, and more controversially, it has been suggested that variants in ZEB1, AGBL1, and LOXHD1 may also be involved. We screened patients without a TCF4 repeat expansion for causative variants in the other candidate FECD genes.
Methods: Genomic DNA from blood was genotyped for expansion of the CTG18.1 repeat in intron 2 of TCF4 using short-tandem repeat PCR, followed by triplet-repeat primed PCR (STR/TP-PCR). Single-molecule molecular inversion probes (smMIPs) were designed to amplify the coding exons and splice recognition sites of FECD candidate genes COL8A2, SLC4A11, ZEB1, AGBL1, and LOXHD1 using MIPGEN, and the libraries generated were sequenced on a NextSeq 2000. FASTQ-formatted sequence reads were aligned to the human reference genome with MIPVAR, and identified variants were annotated using Annovar. Rare potentially pathogenic variants (minor allele frequency ≤0.01 for assumed dominant inheritance, combined annotation-dependent depletion ≥15) were confirmed by Sanger sequencing and interpreted according to American College of Medical Genetics and Genomics criteria using the Franklin by Genoox interface.
Results: Analysis of 114 FECD cases by STR/TP-PCR stratified the patients into FECD expansion-negative cases that had <50 trinucleotide repeats on both alleles at the CTG18.1 locus (n = 33 probands and three additional family members) and FECD expansion-positive (n = 78) cases with at least one allele harboring ≥50 repeats in size. All 36 expansion-negative cases were then analyzed by smMIP targeted capture and short-read sequencing of the five other genes implicated in FECD causation. For comparison, two control groups were similarly analyzed: a subset of 29 of the expansion-positive cases whose FECD was assumed to be caused by the repeat expansion and 29 expansion-negative unaffected individuals. Across all groups, 13 variants passed filtration criteria: 1 in COL8A2, 2 in SLC4A11, 4 in AGBL1, and 6 in LOXHD1. No variants were identified in ZEB1. Eight of the variants identified were found in seven FECD expansion-negative cases, five in four FECD expansion-positive cases, and three in two non-FECD controls, with three variants appearing in both expansion-positive and expansion-negative patient groups. Statistical analysis indicated no significant enrichment of variants in expansion-negative cases compared to the other two groups (p = 0.7612 and 0.3275). Only one variant (LOXHD1 NM_144612.7, c.5545G>A, p.(Gly1849Arg)), found in an expansion-negative case, was classified as likely pathogenic, while others were classed as variant of unknown significance (n = 4), likely benign (n = 4), or benign (n = 4).
Conclusions: smMIPs were used for targeted screening of candidate genes implicated in FECD and proved a versatile, economic approach for prescreening before whole-exome or genome sequencing. This study confirmed the well-documented enrichment of the TCF4 repeat expansion in cases over controls but found a paucity of evidence for the involvement of variants in COL8A2, SLC4A11, ZEB1, AGBL1, or LOXHD1 in this set of expansion-negative cases, implying knowledge of the causes of FECD remains incomplete.
Fuchs endothelial corneal dystrophy (FECD) is a progressive, bilateral eye disease that primarily affects the corneal endothelium, the monolayer of cells in the innermost part of the cornea, and occurs in 1 in 25 individuals over 40 years of age [1,2]. The condition is marked by the presence of “guttae,” which are drop-like excrescences that protrude posteriorly from the Descemet membrane, and by gradual loss of endothelial cells that affects the barrier function of the endothelium, causing hydration of the cornea due to fluid entering from the anterior chamber [3]. This leads to edema, corneal clouding, blurred vision, glare, and, if left untreated, blindness. Endothelial transplantation is the preferred method of treatment, with FECD the main cause of corneal transplantation worldwide [4].
A trinucleotide repeat expansion of ≥50 copies in intron 2 of the gene encoding transcription factor 4 (TCF4, OMIM *602272, #613267) accounts for over 70% of Caucasian cases with FECD [5–7]. The repeat expansion causes disease in corneal endothelial cells by forming RNA nuclear foci that sequester RNA-binding proteins, leading to global missplicing [8–10]. Although much less common than the TCF4 expanded repeat sequence, heterozygous variants in collagen type VIII alpha 2 chain (COL8A2, OMIM *120252, #136800) cause earlier-onset FECD in the 20s [11,12]. COL8A2 is expressed by corneal endothelial cells and is one subunit of a heterotrimer that makes up short-chain collagen fibrils and forms part of the hexagonal lattice structure in the Descemet membrane [13]. A knock-in mutant mouse model that has similar features to the human disease showed that the predominant effect of the mutation on corneal endothelium was activation of the unfolded protein response, leading to endoplasmic reticulum stress and apoptosis [14]. Classical FECD with onset in the 40s can also be caused by heterozygous mutations in solute carrier family 4 member 11 (SLC4A11, OMIM *610206, #613268) [15,16]. SLC4A11, which is expressed in corneal endothelial cells, is a transmembrane-bound sodium-coupled borate transporter that is required for intracellular boron homeostasis, cell growth, and proliferation [17]. Biallelic variants in SLC4A11 cause the corneal condition, congenital hereditary endothelial dystrophy type 2 (CHED2) [18]. Coexpression studies have shown that SLC4A11 is dimeric, and variants that cause disease are retained intracellularly and can be partially rescued by wild-type SLC4A11 for CHED2-causing recessive variants, but not for FECD-causing dominant variants [19].
There have also been studies suggesting a causal role for rare heterozygous variants in zinc finger E-box binding homeobox 1 (ZEB1, OMIM *189909, #613270) [20,21], ATP/GTP binding protein like 1 (AGBL1, OMIM *615496, #615523) [22,23], and lipoxygenase homology polycystin/lipoxygenase/alpha-toxin domain 1 (LOXHD1, OMIM *613072) [24] in FECD. ZEB1, also called transcription factor 8 (TCF8), is a transcriptional repressor of the epithelial cell marker E-cadherin, thereby contributing to the epithelial-mesenchymal cell transition [25]. ZEB1 is expressed in the cornea, and dominant variants have previously been shown to cause the related eye condition posterior polymorphous corneal dystrophy, characterized by abnormal changes in the corneal endothelium and Descemet membrane, which result in cell differentiation and the presence of multiple layers of epithelial-like cells [26,27]. AGBL1, also called cytosolic carboxypeptidase 4, encodes a carboxypeptidase that catalyzes the deglutamylation of polyglutamylated proteins [28]. Functional analysis suggested that AGBL1 interacts with TCF4, and that disease-causing variants diminish that interaction [22]. LOXHD1 is predominantly expressed in the hair cells of the inner ear, and biallelic variants cause progressive hearing loss in humans [29]. Immunohistochemical analysis of mouse sections suggested that there is expression, particularly in the corneal epithelium [22]. However, transcriptome analyses of normal [30–32] and FECD-derived corneal endothelium [33] noted the absence of AGBL1 and LOXHD1 transcripts. To date, there remains significant doubt as to the involvement of variants in ZEB1, AGBL1, and LOXHD1 in causing FECD, and further independent familial studies, which are currently lacking, and functional analyses are required [34].
Single-molecule molecular inversion probes (smMIPs), also sometimes called padlock probes, are linear oligonucleotides that contain a common DNA backbone with extension and ligation arms at each end (see Figure 1A). The two arms provide dual recognition of specific sequences to target the region of interest. The gap between the two arms is filled by the action of DNA polymerase, and the hybridized sequences are then circularized by DNA ligase. This circular DNA is PCR amplified using the universal primers attached to the smMIP backbone. Thousands of probes can be pooled together without interfering with each other due to their combined intrinsic specificity. Molecular inversion probes were first used for high-throughput genotyping with microarrays [35], while the smMIP adaptation was developed for the detection of low-frequency variants in mixed cell populations [36]. smMIPs have unique barcoded indices added by PCR to permit pooling and then subsequent deconvolution of multiple patient samples in a single sequencing experiment [37]. smMIPs are an efficient and cost-effective technique for variant scanning; numerous genes can be sequenced concurrently, they require a low starting mass of DNA (200 ng), and generated libraries can be multiplexed [37–39].
Here, we report results from a custom smMIP panel designed to capture the coding sequence and splice recognition sites of five candidate genes, COL8A2, SLC4A11, ZEB1, AGBL1, and LOXHD1, with the last three being large multiexon genes, in which variants have been suggested to be associated with FECD. The reagent was used to screen cases that did not have a repeat expansion in intron 2 of TCF4, in an attempt to identify the genetic causes of FECD in these cases.
Patients with FECD, along with their affected family members and age-, sex-, and ethnically matched non-FECD controls, were recruited in the Eye Department at St. James’s University Hospital in Leeds, UK. All patients with FECD were ascertained from outpatient or operating theater lists and identified as having endothelial dystrophies in the electronic records. They had all undergone detailed slit-lamp examination and specular microscopy at the time of listing and satisfied the criteria of FECD grade 1 or above, according to the modified Krachmer grading scale. Informed consent was obtained using a process approved by the Leeds East Research Ethics Committee (Reference Number: 17/YH/0032) and following the principles of the Declaration of Helsinki. The cohort consisted of Caucasians aged 50 or older at recruitment; age at recruitment was not indicative of age at disease onset. Non-FECD controls were recruited from cataract clinics and, before surgery, had their corneal endothelium inspected using a specular microscope to exclude FECD. A total of 3 ml of peripheral blood was sampled by venipuncture into standard EDTA blood collection tubes (BD Biosciences, Wokingham, UK) for genomic DNA extraction using established methods.
Genotyping assays for measuring the size of the CTG18.1 repeat sequence have been described elsewhere [6]. Briefly, short tandem repeat (STR) genotyping was initially performed on genomic DNA, followed by triplet-repeat PCR (TP-PCR) on those patients in whom both alleles could not be identified using the STR assay, to confirm the presence or absence of an allele expansion. The oligonucleotide primer sequences used were P1 (FAM-AAT CCA AAC CGC CTT CCA AGT) and P2 (CAA AAC TTC CGA AAG CCA TTT CT) for the STR assay and primers P1, P3 (TAC GCA TCC CAG TTT GAG ACG), and P4 (TAC GCA TCC CAG TTT GAG ACG CAG CAG CAG CAG CAG), in a ratio of 1:1:0.3, for the TP-PCR assay. For the STR assay, 95 °C for 2 min was followed by 30 PCR cycles of 94 °C denaturation for 30 s, 61 °C annealing for 30 s, and 72 °C extension for 30 s, then a final extension step of 72 °C for 5 min. For TP-PCR, conditions were 40 cycles of 94 °C for 30 s, 60 °C for 45 s, and 72 °C for 2 min. PCR products were resolved on an ABI 3130xl Genetic Analyzer and analyzed with GeneMapper version 4.0 (Thermo Fisher Scientific, Waltham, MA).
To identify candidate genes in which protein-coding nucleotide variants have been implicated in causing FECD, a keyword search for “FECD” was performed in the Online Mendelian Inheritance in Man (OMIM) database (accessed July 22, 2021). The “Phenotypic Series” tab was selected to identify six candidate genes: COL8A2, ZEB1, AGBL1, LOXHD1, TCF4, and SLC4A11. Of these, the TCF4 variant is an intronic, trinucleotide repeat expansion that does not alter the protein-coding sequence. Hence, for smMIP design, only the protein-coding exons and splice recognition sites of the other five genes were selected by downloading the hg19 version of human reference sequences as a bed (browser extensible data) file from the UCSC genome browser.
Oligonucleotide probes were designed from genomic coordinates described in the bed file to amplify a specific target sequence between 95 and 115 nucleotides in length, using the MIPGEN program [40]. All the probes were visualized on the Integrative Genomics Viewer (IGV; v.2.16.2) [41], and a single probe was selected to target each sequence on either DNA strand. Each probe overlapped with the adjacent probe with a minimum of 10 nucleotides to ensure that no gaps were present in the target sequence. Probe selection was based on high logistic scores from MIPGEN, with probes scoring ≥0.7 preferred for the panel design. However, where that was not possible, probes with lower logistic scores were chosen.
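The overlap rule can be illustrated with a small tiling sketch. This is not the MIPGEN algorithm, which also scores and selects probe arms; it only shows how fixed-length capture targets with a 10-nucleotide minimum overlap leave no gaps across a region. The function name `tile_targets`, the half-open coordinates, and the fixed target length are assumptions made for the illustration.

```python
def tile_targets(start, end, target_len=110, min_overlap=10):
    """Tile the half-open interval [start, end) with fixed-length targets.

    Consecutive targets share exactly `min_overlap` bases, so every base in
    the interval is covered. The last target may run past `end` into
    flanking sequence. Hypothetical helper, not part of MIPGEN.
    """
    if not 95 <= target_len <= 115:
        raise ValueError("target length outside the 95-115 nt range in the text")
    if not 0 <= min_overlap < target_len:
        raise ValueError("overlap must be non-negative and shorter than the target")
    step = target_len - min_overlap  # distance between consecutive target starts
    tiles = []
    pos = start
    while True:
        tiles.append((pos, pos + target_len))
        if pos + target_len >= end:  # region fully covered
            break
        pos += step
    return tiles
```

In practice each tile would still need a scored probe design, so a lower-scoring region may require shifting or shortening individual targets rather than using a fixed step.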
Probes, which were around 80 nucleotides in length, were synthesized at a 100-nmol scale in 96-well formatted plates (Integrated DNA Technologies, Leuven, Belgium). Probes were initially pooled at an equimolar concentration by combining 5 μl of each probe to generate a smMIP megapool. Then, 25 μl of the megapool was phosphorylated with 3 μl 10x T4 ligase buffer containing 10 mM ATP, 1 μl T4 polynucleotide kinase (New England Biolabs, Hitchin, UK), and 1 μl nuclease-free water. The smMIP megapool was then incubated at 37 °C for 45 min, followed by 65 °C for 20 min, in a thermocycler before subsequent use.
The smMIP-based targeted sequencing workflow is shown in Figure 1. Following each optimization run using control genomic DNA, the volume of each probe in the smMIP megapool was adjusted according to average read depth, so that probes with a lower read depth between 10 and 99 were increased in volume, and higher read depth probes over 600 were decreased for subsequent runs. Any probes with a read depth below 10 were increased in volume if they had a favorable logistic score of ≥0.7 or redesigned if they had a lower logistic score.
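The rebalancing rules above amount to a small decision table, sketched here. The function name and the returned labels are hypothetical, and the text does not say by how much volumes were changed, so only the direction of the adjustment is returned. Mean depths between 99 and 100 are treated as low, which is an assumption for non-integer averages.

```python
def rebalance_action(mean_depth, logistic_score):
    """Suggest an adjustment for one probe after an optimization run.

    Thresholds follow the text: depth below 10, 10-99, and over 600;
    a logistic score of at least 0.7 counts as favorable.
    """
    if mean_depth < 10:
        # Very low depth: keep a well-scoring probe but add more of it,
        # otherwise send the probe back for redesign.
        return "increase" if logistic_score >= 0.7 else "redesign"
    if mean_depth < 100:
        return "increase"
    if mean_depth > 600:
        return "decrease"
    return "keep"
```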
For each subject, 200 ng genomic DNA was hybridized against the phosphorylated smMIP megapool so that the final reaction contained a ratio of 800 smMIP copies for each DNA molecule. Briefly, 25 μl capturing reaction was set up containing genomic DNA in 10 μl volume, 4 μl diluted smMIP megapool (10⁻⁴ dilution), 2.5 μl Ampligase 10x reaction buffer (Cambio Limited, Cambridge, UK), 0.32 μl dNTP (0.25 mM stock; New England Biolabs), 0.32 μl Hemo KlenTaq (10 U/μl; New England Biolabs), 0.2 μl Ampligase (5 U/μl; Cambio Limited), and 7.66 μl nuclease-free water. This mixture was incubated at 95 °C for 3 min and 65 °C for 22 h. This was followed by chilling on ice and adding 2 μl exonuclease master mix consisting of 0.5 μl Exonuclease I (20,000 U/ml; New England Biolabs), 0.5 μl Exonuclease III (100,000 U/ml; New England Biolabs), 0.2 μl Ampligase 10x reaction buffer, and 0.8 μl nuclease-free water. The reaction was mixed and incubated at 37 °C for 45 min and 95 °C for 2 min. Following exonuclease treatment, 10 μl of the exonuclease-treated capturing reaction was mixed with 12.5 μl Q5 Hot Start High Fidelity 2x master mix (New England Biolabs) and 1.25 μl of each of the forward (dAAT GAT ACG GCG ACC ACC GAG ATC TAC ACA TAC GAG ATC CGT A
To generate a single, multiplexed smMIP captured library, 5 ng of each individual library was combined before sequencing. Depending on the number of samples in the pooled library and the depth of coverage required, the library was sequenced using either a MiSeq (Illumina) or NextSeq 2000 (Illumina) instrument, generating paired-end 151-bp reads according to the manufacturer’s instructions.
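The component volumes quoted earlier in this section for the capture reaction and the exonuclease master mix can be tallied as a quick consistency check. The sketch below is only that arithmetic; the dictionary and function names are hypothetical, and `Decimal` is used so that values such as 0.32 and 7.66 sum exactly.

```python
from decimal import Decimal

# Volumes in microliters, as listed in the text.
CAPTURE_MIX_UL = {
    "genomic DNA": "10",
    "diluted smMIP megapool": "4",
    "Ampligase 10x reaction buffer": "2.5",
    "dNTP": "0.32",
    "Hemo KlenTaq": "0.32",
    "Ampligase": "0.2",
    "nuclease-free water": "7.66",
}
EXONUCLEASE_MIX_UL = {
    "Exonuclease I": "0.5",
    "Exonuclease III": "0.5",
    "Ampligase 10x reaction buffer": "0.2",
    "nuclease-free water": "0.8",
}

def total_volume(mix):
    """Sum the component volumes of a mix exactly, in microliters."""
    return sum((Decimal(v) for v in mix.values()), Decimal("0"))

capture_total = total_volume(CAPTURE_MIX_UL)          # stated as 25 μl
exonuclease_total = total_volume(EXONUCLEASE_MIX_UL)  # stated as 2 μl
```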
Raw data were demultiplexed and converted to FASTQ format using BCL Convert (v.3.9.3), then processed using MIPVAR (v.0.1.0); MIP arms were removed and sequence reads aligned to human reference genome build hg19 using the Burrows-Wheeler Aligner (v.0.7.12). Picard (v.1.119) was used to identify and remove PCR duplicates (read pairs with the same mapping coordinates). The Genome Analysis Toolkit HaplotypeCaller (v.3.7.0) was used to identify and record nonreference bases in a variant call format file [42]. Each patient’s variant call format file was annotated with population allele frequency data and information relevant to pathogenicity using Annovar [43]. To enrich for rare, potentially pathogenic variants, only those with a minor allele frequency below 1% in the Exome Aggregation Consortium (v.0.3) and a combined annotation-dependent depletion (v.1.3) score greater than 15 were retained.
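The final filtering step can be sketched as a simple predicate over annotated variant records. This is an illustration rather than the authors' pipeline: the field names `exac_maf` and `cadd_phred` are hypothetical stand-ins for the corresponding Annovar columns, and treating a variant absent from the Exome Aggregation Consortium as rare is an assumption.

```python
def passes_filter(variant, max_maf=0.01, min_cadd=15.0):
    """Return True for a rare, potentially pathogenic variant.

    `variant` is a dict with hypothetical keys:
      exac_maf   - minor allele frequency, or None if the variant is absent
      cadd_phred - scaled CADD score, or None if no score is available
    """
    cadd = variant.get("cadd_phred")
    if cadd is None:
        return False  # no score available, so the threshold cannot be met
    maf = variant.get("exac_maf")
    if maf is None:
        maf = 0.0  # assumption: unseen in the population data means rare
    return maf < max_maf and cadd > min_cadd

def filter_variants(variants):
    """Keep only the records that pass both thresholds."""
    return [v for v in variants if passes_filter(v)]
```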
Variants were interpreted according to the American College of Medical Genetics and Genomics (ACMG) criteria using the Franklin by Genoox website [44]. The Genome Aggregation Database (gnomAD; v.4.1) was used to identify variant allele frequencies. Splicing predictions were obtained using SpliceAI. Variant pathogenicity data were obtained from ClinVar. Evolutionary conservation across multiple vertebrate species was assessed using the multiple-protein-sequence alignment resource, the constraint-based alignment tool (COBALT), available on the NCBI website. Copy number variant (CNV) analysis was performed using ExomeDepth [45]. Briefly, samples to be compared were sequenced at the same time on the same machine to prevent confounding batch effects. For each sample, within the FECD expansion-negative, FECD expansion-positive, and non-FECD control categories, the total number of sequencing reads in the binary alignment map file was recorded and ranked in ascending order. Ten samples from each category, with similar total sequencing reads, were compared in batches of 30, using the ExomeDepth commands. The program compared the observed reads for each exon in the test sample, one at a time, against the expected reads from the remaining samples in that batch. The output results were documented in an Excel (Microsoft, Redmond, WA) file. The Bayes factor score indicated the likelihood of the CNV being present in the tested sample, and any with a score greater than 5.0 were prioritized. Read alignments supporting identified variants were visualized using IGV.
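The batching of samples for the copy number comparison can be sketched as follows. This covers only the grouping step, not ExomeDepth itself (an R package). The function name and input layout are hypothetical, and the handling of categories whose size is not a multiple of ten is an assumption, since the text does not describe it: here any remainder simply forms a smaller final batch.

```python
def make_batches(total_reads, per_category=10):
    """Group samples with similar read totals into comparison batches.

    `total_reads` maps category -> {sample_id: total read count}.
    Samples are ranked by read count within each category, and the i-th
    group of `per_category` samples from every category forms batch i.
    """
    ranked = {
        category: sorted(samples, key=samples.get)
        for category, samples in total_reads.items()
    }
    # Ceiling division gives the number of groups in the largest category.
    n_batches = max(
        (-(-len(ids) // per_category) for ids in ranked.values()), default=0
    )
    batches = []
    for i in range(n_batches):
        batch = []
        for ids in ranked.values():
            batch.extend(ids[i * per_category:(i + 1) * per_category])
        batches.append(batch)
    return batches
```

With three categories of at least ten samples each, the first batch contains 30 samples, matching the batch size described in the text.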
Fisher’s exact test was performed in GraphPad (GraphPad Software, La Jolla, CA). The test was used to compare categorical data in a 2 × 2 contingency table at the 5% significance level (95% confidence).
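For readers without access to GraphPad, a two-sided Fisher's exact test on a 2 × 2 table can be sketched with the Python standard library. The function below uses one common two-sided definition, summing the probabilities of all tables that are no more likely than the observed one; whether this matches the GraphPad calculation exactly is an assumption, as is the function name.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Row and column totals are held fixed; the p-value is the summed
    hypergeometric probability of every table at most as probable as
    the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)
```

As a usage example, `fisher_exact_two_sided(7, 29, 4, 25)` would compare seven variant carriers among 36 expansion-negative cases with four among 29 expansion-positive cases. That table layout is an assumption, so the result need not reproduce the p-values reported in this study.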
Primers were designed to generate an amplification product spanning the variant of interest, using Primer3 (v.4.1.0). Each PCR was performed in a 10-μl reaction volume using 40 ng genomic DNA with the primer pair and HotShot Diamond master mix (Clent Life Science, Stourbridge, UK) according to the supplier’s instructions. An aliquot of the PCR was treated with ExoSAP-IT (Thermo Fisher Scientific), followed by incubation at 37 °C for 30 min and 80 °C for 15 min. The treated PCR was Sanger sequenced by an external service provider (Azenta Life Sciences, Manchester, UK). Chromatograms were visualized using Sequence Analysis software (v.5.2; Thermo Fisher Scientific).
The final pool of the target capture reagent consisted of 380 smMIPs that covered the coding sequences and splice recognition signals of five genes, COL8A2, SLC4A11, ZEB1, AGBL1, and LOXHD1 (Table 1), in which variants had previously been implicated in causing FECD. This rebalanced pool was derived following three optimization runs on the MiSeq, using five non-FECD, genomic DNA samples to evaluate the average read depth of each probe (Figure 2). Following the third optimization run, an average read depth of over 50 reads was achieved by 95.7% (360/376) of probes and over 10 reads by 99.5% (374/376) of probes when covering the targeted regions for the five control samples. Regions with an average read depth below 50 are shown in Appendix 1.
Genomic DNA from 114 FECD cases and 29 non-FECD controls was genotyped for the CTG18.1 repeat expansion in intron 2 of TCF4 using the STR/TP-PCR assay. Alleles with fewer than 50 repeats at the CTG18.1 locus were measured using the STR assay. Any patients in whom both alleles could not be identified this way were then tested using the TP-PCR assay to confirm the presence or absence of expansions of 50 or more repeats. FECD cases with at least one allele of 50 or more repeats were considered expansion positive, and those without an expanded allele were classed as expansion negative. This analysis stratified the patients into FECD expansion-negative (n = 33 probands and three additional affected family members) and FECD expansion-positive (n = 78) cases and confirmed the absence of TCF4 repeat expansions in all controls. The cohort demographics following STR/TP-PCR genotyping are summarized in Table 2, and the individual patients’ genotyping results are presented in Appendix 2, Appendix 3, and Appendix 4. Briefly, the age range and gender ratios between the FECD cases and non-FECD controls were similar and well matched.
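The stratification rule reduces to a threshold on the larger allele, using the cutoff of ≥50 repeats given in the abstract. The sketch below assumes both allele sizes are available as repeat counts, which is a simplification: TP-PCR confirms that an expansion is present but does not size it precisely. The function and constant names are hypothetical.

```python
EXPANSION_THRESHOLD = 50  # repeats; alleles at or above this count as expanded

def classify_ctg181(allele1, allele2, threshold=EXPANSION_THRESHOLD):
    """Classify a CTG18.1 genotype from the repeat counts of both alleles."""
    if allele1 < 0 or allele2 < 0:
        raise ValueError("repeat counts must be non-negative")
    if max(allele1, allele2) >= threshold:
        return "expansion-positive"
    return "expansion-negative"

def stratify(genotypes):
    """Count samples per class; `genotypes` maps sample ID -> (allele1, allele2)."""
    counts = {"expansion-positive": 0, "expansion-negative": 0}
    for allele1, allele2 in genotypes.values():
        counts[classify_ctg181(allele1, allele2)] += 1
    return counts
```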
All 36 expansion-negative cases were then analyzed by smMIP targeted capture and short-read sequencing of the five other genes implicated in FECD causation. For comparison, two control groups were also screened, one composed of a subset of 29 of the expansion-positive cases and the other composed of the 29 expansion-negative unaffected individuals sampled via cataract clinics. These 94 samples were screened for variants in the FECD candidate genes using the smMIP capture reagent. Following preparation of libraries and pooling of each sample, the mixture was sequenced on a NextSeq 2000. After demultiplexing and bioinformatics processing, the data were analyzed and filtered to identify single-nucleotide variants with a minor allele frequency <1% in the Exome Aggregation Consortium and a combined annotation-dependent depletion score >15.
Eight variants, confirmed by Sanger sequencing (Appendix 5), were identified in seven FECD expansion-negative cases (Table 3A). Variant NM_005202.4: c.7G>A, p.(Gly3Arg) in COL8A2 was classed as benign by ACMG criteria. Two variants were identified in AGBL1, NM_152336.4: c.806C>T, p.(Pro269Leu) and c.3323+2dup, predicted to be benign and a variant of unknown significance (VUS), respectively. The c.3323+2dup variant, which was present in patient 589, was homozygous in six individuals in the European (non-Finnish) general population in gnomAD. There were also five variants identified in LOXHD1, NM_001145472.3: c.3340G>A, p.(Gly1114Arg); NM_144612.7: c.5545G>A, p.(Gly1849Arg); c.4504C>T, p.(Arg1502Trp); c.3269G>A, p.(Arg1090Gln); and c.1570C>T, p.(Arg524Cys). Of these, the c.4504C>T variant found in patient 817, classed as a VUS by ACMG criteria, affects an arginine residue that is not evolutionarily conserved among vertebrates (data not shown). The c.5545G>A variant in patient 596, which is classed as likely pathogenic according to ACMG criteria, is predicted to create a new splice acceptor site (SpliceAI = 0.79). This patient also harbored a second LOXHD1 variant, c.1570C>T, p.(Arg524Cys), considered likely benign, although the phase could not be established. As recessive variants in LOXHD1 cause nonsyndromic, progressive hearing loss [29], patient notes were examined, revealing that the patient developed a hearing deficit in their 70s and required hearing aids. However, this is not uncommon at that age and may be unrelated. The remaining two LOXHD1 variants, c.3340G>A (NM_001145472.3) and c.3269G>A (NM_144612.7), were classed as benign. No rare variants in ZEB1 or SLC4A11 were identified in the FECD expansion-negative cohort.
Five rare variants were identified in four FECD expansion-positive cases (Table 3B). The two variants identified in AGBL1, NM_152336.4: c.806C>T, p.(Pro269Leu) and c.3323+2dup, which were classified as benign and VUS, respectively, according to ACMG criteria, were identified in expansion-positive patients 558 and 948, as well as in expansion-negative cases 758 and 589. Two variants were also identified in LOXHD1 in expansion-positive cases. One of these, a benign variant, NM_144612.7: c.3269G>A, p.(Arg1090Gln), was identified in expansion-positive patient 778 and also in expansion-negative case 765. The other variant, c.2998C>T, p.(Arg1000Trp), which was VUS by ACMG criteria, affects an arginine residue that is not evolutionarily conserved among vertebrates (data not shown). One variant in SLC4A11 was identified in expansion-positive case 958, NM_032034.4: c.2041G>A, p.(Ala681Thr), which was VUS by ACMG criteria. This variant affects an alanine residue that is not evolutionarily conserved among vertebrates (data not shown). Three rare variants were identified in two non-FECD controls (Table 3C) and were likely benign according to ACMG criteria. Two variants, both identified in AGBL1, NM_152336.4: c.1352C>A, p.(Ser451Tyr) and c.2457G>T, p.(Glu819Asp), were found in the same individual (patient 1149). The other variant in SLC4A11, NM_032034.4: c.1039C>T, p.(Arg347Trp) was identified in patient 1159.
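The variants described in the text can be tallied to check that the per-gene, per-class, and per-group counts agree with the totals stated in the abstract. The sketch below encodes only what is written above; the group labels `neg`, `pos`, and `ctrl` are shorthand introduced for the illustration.

```python
from collections import Counter

# (gene, cDNA change, ACMG class, groups in which the variant was seen)
VARIANTS = [
    ("COL8A2", "c.7G>A", "benign", {"neg"}),
    ("AGBL1", "c.806C>T", "benign", {"neg", "pos"}),
    ("AGBL1", "c.3323+2dup", "VUS", {"neg", "pos"}),
    ("LOXHD1", "c.3340G>A", "benign", {"neg"}),
    ("LOXHD1", "c.5545G>A", "likely pathogenic", {"neg"}),
    ("LOXHD1", "c.4504C>T", "VUS", {"neg"}),
    ("LOXHD1", "c.3269G>A", "benign", {"neg", "pos"}),
    ("LOXHD1", "c.1570C>T", "likely benign", {"neg"}),
    ("LOXHD1", "c.2998C>T", "VUS", {"pos"}),
    ("SLC4A11", "c.2041G>A", "VUS", {"pos"}),
    ("AGBL1", "c.1352C>A", "likely benign", {"ctrl"}),
    ("AGBL1", "c.2457G>T", "likely benign", {"ctrl"}),
    ("SLC4A11", "c.1039C>T", "likely benign", {"ctrl"}),
]

by_gene = Counter(gene for gene, _, _, _ in VARIANTS)
by_class = Counter(acmg for _, _, acmg, _ in VARIANTS)
by_group = Counter(group for _, _, _, groups in VARIANTS for group in groups)
# Variants seen in both FECD groups.
shared = [change for _, change, _, groups in VARIANTS if {"neg", "pos"} <= groups]
```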
A statistical comparison between the number of variants identified after filtration in each cohort highlighted the absence of variant enrichment in the FECD expansion-negative cases compared to the FECD expansion-positive cases (p = 0.7612) or when compared to the non-FECD controls (p = 0.3275; Table 4).
To look for copy number changes, the smMIP-generated targeted sequences were compared using ExomeDepth between FECD expansion-negative and FECD expansion-positive cases and non-FECD controls. The putative CNVs that were identified are presented in Appendix 6. All Bayes factor scores were below 10, and none of the putative copy number changes could be distinguished from samples without the CNV following manual scrutiny in IGV, suggesting the putative CNVs are likely to represent false-positive variant calls.
The 36 FECD expansion-negative cases included two families with multiple affected members. One consisted of three affected siblings (patients 572, 573, and 580), while the other was a sibling pair of affected cases (patients 807 and 818). All affected members of both families were tested and confirmed to be expansion negative. No rare variants passing filtration criteria or any shared haplotypes were found within these families in the genes screened.
This study describes the use of a smMIP reagent targeting the exons of five candidate genes, COL8A2, SLC4A11, ZEB1, AGBL1, and LOXHD1, potentially implicated in causing FECD, to screen a cohort of patients who lack a repeat expansion in TCF4, together with suitable control groups. Targeting the coding exons of TCF4 in the smMIP reagent was not considered, as dominantly inherited Mendelian variants in TCF4 result in haploinsufficiency of the protein and cause Pitt-Hopkins syndrome, a severe, congenital neurologic condition with characteristic facial features [46,47] that is distinct from late-onset FECD. Targeting the intronic CTG18.1 trinucleotide repeat sequence in TCF4 was not attempted by smMIPs, since the oligonucleotide probes used in the current study were designed to amplify up to 115 nucleotides of the target sequence. In theory, if an appropriate probe is designed to span the repeat sequence, it would only allow for the detection of up to 38 copies of the trinucleotide repeat, assuming suitable software is used to analyze the output. However, this detection would only be similar in size to the STR assay results presented in the current study. This limitation in size detection makes the smMIP method unsuitable for detecting repeat expansions.
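The figure of 38 repeats follows from dividing the longest capture target by the repeat unit length, as this short sketch shows. The `flank_nt` parameter is hypothetical: it stands for any unique flanking sequence that would have to sit inside the captured target, which would lower the ceiling further.

```python
def max_detectable_repeats(max_target_nt=115, unit_nt=3, flank_nt=0):
    """Upper bound on whole repeat units that fit in one capture target."""
    if flank_nt > max_target_nt:
        raise ValueError("flanking sequence is longer than the target")
    return (max_target_nt - flank_nt) // unit_nt

upper_bound = max_detectable_repeats()  # 115 // 3 = 38 trinucleotide repeats
```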
The smMIP reagent described here provided a comparatively low-cost, rapid screening strategy, with cost per sample approximately a fifth of that for whole-exome sequencing (WES). The method required only 200 ng of starting genomic DNA for library preparation, as it is based on PCR enrichment and scalable through the addition of a unique indexed barcode to allow pooling of multiple samples in a single sequencing run. The data generated required less data storage, were simpler to analyze than the exome sequence, and detected variants that were verified by Sanger sequencing using an aliquot of the original genomic DNA sample. Other studies have used smMIPs to screen candidate genes, including testing 10 genes in focal epilepsies [48], looking for ABCA4 variants in Stargardt’s disease [37,49], screening 18 genes in patients with congenital glycosylation disorders [50], testing 113 genes in patients with retinitis pigmentosa and Leber congenital amaurosis [51], and testing 19 genes in the molecular diagnosis of patients with amelogenesis imperfecta [52].
This approach aimed to assess the relative contributions of rare, predicted pathogenic, dominant variants in COL8A2, SLC4A11, ZEB1, AGBL1, and LOXHD1 to FECD causation in cases that do not appear to result from a TCF4 repeat expansion, as well as compare variant frequencies in the expansion-negative FECD group with those in suitable control groups. Previous studies have supported the involvement of COL8A2 variants in early-onset FECD [11,12] and SLC4A11 variants in classical FECD [15,16]. In addition, a role has been suggested for dominant variants in ZEB1 [20,21], AGBL1 [22,23], and LOXHD1 [24] in causing FECD. However, the lack of multiple, independent genetic studies confirming findings in ZEB1, AGBL1, and LOXHD1 means the link between FECD and these genes remains unproven [34].
Screening COL8A2, SLC4A11, ZEB1, AGBL1, and LOXHD1 in 36 expansion-negative FECD cases revealed eight rare variants that passed filtering criteria in seven of these patients. These consisted of one variant in COL8A2, two in AGBL1, and five in LOXHD1. Screening of the two control groups revealed five variants in four cases in the FECD expansion-positive group, as well as three variants in two cases in cataract patients, with three variants appearing in both the expansion-positive and expansion-negative FECD patient groups. Comparing the three groups by Fisher’s exact test indicated no significant enrichment of variants that passed the filtration criteria in expansion-negative cases over either of the other groups (p = 0.7612 and 0.3275). Furthermore, only one variant (LOXHD1 NM_144612.7, c.5545G>A, p.(Gly1849Arg)), found in an expansion-negative case, was classified as likely pathogenic according to ACMG criteria. This variant is predicted to create a new splice acceptor site (SpliceAI = 0.79) and is classified in the ClinVar database as being of uncertain significance. Other variants were classed as VUS (n = 4), likely benign (n = 4), or benign (n = 4), with no obvious clustering of VUS within any group.
The relatively small size of the cohort used in this study means that these findings should be interpreted with caution and cannot be taken to disprove the involvement of variants in any one gene. Nevertheless, these data are consistent with previous studies (reviewed in [34]) suggesting that pathogenic variants in ZEB1, AGBL1, and LOXHD1 are rarely a cause of nonfamilial FECD. These data also show that repeat expansions in TCF4, together with rare variants in COL8A2, SLC4A11, ZEB1, AGBL1, and LOXHD1, do not account for all cases of FECD, implying a gap in our knowledge of the genetic basis of FECD. Of note, the FECD expansion-negative cases included two small families with multiple affected members. The fact that expansion-negative FECD is consistent within families provides further support for the hypothesis that another genetic cause or causes exist for FECD, although environmental effects cannot be ruled out. Similar findings were reported in a recent study [53] that used WES in 128 expansion-negative cases to show that only approximately 10% (13/128) carried rare, potentially deleterious variants in known FECD candidate genes.
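To make the caution about cohort size concrete, the following sketch computes a 95% Wilson score interval for the two carrier proportions mentioned in the text: 7 of 36 expansion-negative patients in this study and 13 of 128 in the WES study [53]. The choice of the Wilson interval and the function name are assumptions made for illustration and are not part of the original analysis. The two studies may also have applied different filtering criteria, so the comparison is indicative only.

```python
from math import sqrt


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Proportions taken from the text; the interval method is an illustrative choice.
for label, k, n in [("this study", 7, 36), ("WES study [53]", 13, 128)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The interval for the smaller cohort is markedly wider, which is the quantitative counterpart of the statement that these findings cannot exclude a contribution from any one gene.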
Further contributions to FECD susceptibility could be genetic or environmental. Functional validation of the VUSs identified in this study may be an option to clarify their significance, although additional Mendelian variants in coding or noncoding regions may yet be discovered, particularly by studying familial cases, which are more likely to highlight highly penetrant alleles in novel genes. However, a polygenic form of FECD is also possible. Variants with an allele frequency greater than 1% in the general population were excluded during filtering in this study, but genome-wide association studies (GWASs) of FECD have proved revealing. Since the original discovery through GWASs of the single-nucleotide polymorphism (SNP) rs613872 on chromosome 18 in an intron of TCF4 and its connection with FECD [54], two further GWASs [55,56] have highlighted SNPs at 11 novel loci: KANK4, LAMC1, LINC00970/ATP1B1, SSBP2, THSD7A, LAMB1, PIDD1, RORA, HS3ST3B1, LAMA5, and COL18A1. WES or whole-genome sequencing of expansion-negative cases could identify rare Mendelian variants in these genes, while a GWAS using only expansion-negative FECD may help to clarify the relative contributions of these loci more fully and also offer the possibility of polygenic risk assessment. Alternatively, the versatility of the smMIP reagent allows the addition of specifically designed oligonucleotide probes spanning the GWAS SNPs as targets in the megapool. Likewise, as novel gene findings are reported, tiled oligonucleotides corresponding to the new targets could be included in the smMIP reagent.
A recent study identified several environmental risk factors contributing to FECD severity or age at onset of symptoms, including obesity, diabetes, and smoking [57]. Also, studies using mouse models have shown that increased corneal endothelial cell loss in females in response to ultraviolet light was driven in part by increased production of estrogen metabolites that caused DNA damage, which may explain the increased incidence of FECD in females [58,59]. Further studies are required to investigate whether any of these modulating factors could contribute to FECD onset in expansion-negative cases or protect against disease in individuals who have the expansion but do not manifest disease symptoms.
In summary, we have designed a smMIP-targeted capture reagent for variant screening of known gene candidates implicated in FECD causation. Although some DNA sequence variants were identified by this reagent, they were not significantly enriched in expansion-negative FECD cases compared with controls, suggesting that FECD in this group may have another cause or causes. Further work is therefore required to identify the cause of disease in the expansion-negative cases to account for the missing heritability in FECD.
Appendix 1. Regions with an average read depth below 50 following optimisation run 3.
Appendix 4. STR/TP-PCR genotyping of the CTG18.1 locus in 29 non-FECD controls.
Appendix 5. Sanger sequence verification of variant candidates.
The authors wish to thank the patients who were involved in this study.
Author contributions: BA contributed to study design, acquired, analyzed and interpreted data and wrote the first draft of the manuscript. DA contributed to study design, interpreted data and commented on the manuscript draft. SS acquired and analyzed data and commented on the manuscript draft. WL analyzed data and commented on the manuscript draft. UH contributed to study design and commented on the manuscript draft. SA acquired data and commented on the manuscript draft. CFI contributed to study design, interpreted data and critically revised the manuscript draft. CW contributed to study design, interpreted data and critically revised the manuscript draft. MA contributed to study conception and design, analyzed and interpreted data and critically revised the initial manuscript draft for important intellectual content. All authors read and approved the final manuscript.
Funding: This study was funded by a Saudi Arabian Government Scholarship to BA and an MRC Clinical Research Training Fellowship to SS (grant number G1002002/1). DA was funded by a Saudi Arabian Government Scholarship and UH was funded by a Leeds Doctoral Scholarship during this work.
Conflict of interest statement: All other authors declare no competing financial interests of relevance to the contents of this manuscript.