Molecular Vision 2021; 27:95-106
Received 21 October 2020 | Accepted 16 March 2021 | Published 18 March 2021
Gang Zou,1,2 Tao Zhang,2 Xuesen Cheng,2 Austin D. Igelman,3 Jun Wang,2 Xinye Qian,2 Shangyi Fu,2 Keqing Wang,2 Robert K. Koenekoop,4 Gerald A. Fishman,5 Paul Yang,3 Yumei Li,2 Mark E. Pennesi,3 Rui Chen2
1Department of Ophthalmology, Ningxia Eye Hospital, People’s Hospital of Ningxia Hui Autonomous Region, First Affiliated Hospital of Northwest University for Nationalities, Ningxia Clinical Research Center on Diseases of Blindness in Eye, Yinchuan, China; 2Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas; 3Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon; 4Department of Paediatric Surgery, Human Genetics and Adult Ophthalmology, MUHC, Montréal, Quebec, Canada; 5Pangere Center for Inherited Retinal Diseases, The Chicago Lighthouse, Chicago, IL
Correspondence to: Rui Chen, Human Genome Sequencing Center Baylor College of Medicine, Room N1519, One Baylor Plaza, Houston, Texas, 77030; Phone: (713) 798-5194; FAX: (713) 798-5741; email: firstname.lastname@example.org
Purpose: Despite the extensive use of next-generation sequencing (NGS) technology to identify disease-causing genomic variations, a major gap in our understanding of Mendelian diseases is the unidentified molecular lesion in a significant portion of patients. For inherited retinal degenerations (IRDs), although currently close to 300 disease-associated genes have been identified, the mutations in approximately one-third of patients remain unknown. With mounting evidence that noncoding mutations might contribute significantly to disease burden, we aimed to systematically investigate the contributions of noncoding regions in the genome to IRDs.
Methods: In this study, we focused on RPGRIP1, which has been linked to various IRD phenotypes, including Leber congenital amaurosis (LCA), retinitis pigmentosa (RP), and macular dystrophy (MD). As several noncoding mutant alleles have been reported in RPGRIP1, and we observed that the mutation carrier frequency of RPGRIP1 is higher in patient cohorts with unsolved IRDs, we hypothesized that mutations in the noncoding regions of RPGRIP1 might be a significant contributor to pathogenicity. To test this hypothesis, we performed whole-genome sequencing (WGS) for 25 patients with unassigned IRD who carry a single mutation in RPGRIP1.
Results: Three noncoding variants in RPGRIP1, including a 2,890 bp deletion and two deep-intronic variants (c.2710+233G>A and c.1468–263G>C), were identified as putative second hits of RPGRIP1 in three patients with LCA. The mutant alleles were validated with direct sequencing or in vitro assays.
Conclusions: The results highlight the significance of the contribution of noncoding pathogenic variants to unsolved IRD cases.
The most common cause of hereditary blindness around the world is inherited retinal degenerations (IRDs), a group of Mendelian disorders that are clinically and genetically heterogeneous. In the past decade, advancements in next-generation sequencing technologies have greatly improved the molecular diagnostic rate for patients with IRDs [1-4]. However, currently about one-third of the underlying pathogenic mutations in patients with IRDs remain unassigned, representing one of the major gaps in the field. Recent studies have shown that mutations outside coding exons could be a significant contributor to the disease. For example, one of the most frequently observed mutations in patients with Leber congenital amaurosis (LCA) is an intronic mutation (c.2991+1655A→G) in CEP290 (Gene ID: 80184; OMIM: 610142), which creates a common splice-donor site in an intron and leads to the inclusion of a cryptic exon [5-8]. Similarly, multiple deep-intronic mutations that can lead to cryptic mRNA splicing have been identified in ABCA4 (Gene ID: 24; OMIM: 601691) [9-17]. Several of these near- or deep-intronic variants in ABCA4 were shown to lead to a frameshift that results in the formation of a premature stop codon, leading to a subsequent, predicted protein change or disruption of mRNA splicing. In addition to deep-intronic mutations, chromosomal structural mutations have been observed in patients with IRDs. For example, deletions and duplications are frequently observed in USH2A (Gene ID: 7399; OMIM: 608400) [19,20]. Therefore, systematically screening mutations across the genomic loci might reveal pathogenic mutations for a significant portion of patients who remain unassigned after exon sequencing.
A commonly mutated gene in patients with IRDs is retinitis pigmentosa GTPase regulator interaction protein 1 (RPGRIP1; Gene ID: 57096; OMIM: 605446). Mutations in RPGRIP1 have been associated with a range of inherited retinal diseases, such as retinitis pigmentosa (RP) and cone rod dystrophy (CRD) [22-26]. RPGRIP1, which is located on the long arm of chromosome 14 (14q11.2), is a large gene that spans 63 kb and contains 24 exons, which encode a 1,286 amino acid protein (nucleotide accession number: NM_020366.3). RPGRIP1 plays an important role in the connecting cilium of photoreceptor cells, which is critical for controlling protein trafficking between the inner segment and the outer segment of the photoreceptors. Directly binding to RPGR and SPATA7, RPGRIP1 functions in the RPGR complex, which is important for proper localization of other cilia transition zone complexes, such as the transport of the nephronophthisis (NPHP) protein complex to the connecting cilium in photoreceptor cells [27-31].
Currently, various types of likely pathogenic alleles in RPGRIP1 have been recorded in the Human Gene Mutation Database (HGMD), including missense, splicing, deletion, duplication, and frameshift alterations. Rare, noncoding, and complex mutations in RPGRIP1 have also been reported, including a homozygous deletion in exon 17 of the gene. Additionally, structural variations and deep-intronic mutations have been reported in RPGRIP1, suggesting that these complex and noncoding mutations may contribute significantly to the mutation load.
To assess the contribution of mutations in RPGRIP1 that are missed by coding exon capture sequencing, we examined mutations in RPGRIP1 in a cohort of 762 patients with RP and 171 patients with LCA whose mutations have not been found. Among them, we identified 15 patients with RP and ten patients with LCA carrying one likely pathogenic mutation in the coding exons of RPGRIP1 (i.e., patients with one hit in RPGRIP1). The mutation carrier frequency of RPGRIP1 in the general population was calculated based on the Genome Aggregation Database (gnomAD) for LCA and RP as described previously. Compared to the control population, the number of patients with one hit in RPGRIP1 was significantly higher in the patient cohort with LCA than expected (expected 1.35%, observed 5.85%, p=1.26E-04), while the number of patients in the patient cohort with RP with one hit in RPGRIP1 was not significantly higher (expected 1.95%, observed 1.97%, p=0.52). To identify putative noncoding mutations in these patients with one hit in RPGRIP1, a combination of short-read and 10x Genomics linked-read whole-genome sequencing (lrWGS) was performed. As a result, three noncoding variants in RPGRIP1, including one large deletion and two deep-intronic variants, were identified as putative mutations. The deletion allele spans 2,890 bp, encompassing exon 21 and resulting in a frameshift mutation and a premature stop codon. In vitro minigene splicing assays of the two deep-intronic variants (c.2710+233G>A and c.1468–263G>C) supported that these two variants affect proper splicing and lead to the inclusion of cryptic exons.
The study was approved by the Department of Molecular and Human Genetics, Baylor College of Medicine, and adhered to the Declaration of Helsinki and to the ARVO Statement on Human Subjects. Written informed consent was obtained from all individuals on whom genetic testing and further molecular evaluations were performed. The pedigree information of all individuals was obtained from Casey Eye Institute Oregon Health & Science University for genetic analysis and further molecular evaluation. Research Ethics Board (REB) approval was obtained by the McGill University Health Centre Research Institute (MUHC RI) ethics board. All patients in this study underwent clinical assessment by experienced ophthalmologists.
After informed consent was obtained, venous blood was collected from each proband and, when available, their family members. Genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), and all DNA samples were stored in a -80°C freezer. All patient DNA underwent whole-exome sequencing (WES) and WGS and was further examined at the Human Genome Sequencing Center, Baylor College of Medicine. The Sanger sequencing primers were as follows: MEP-305: RPGRIP1_112_F: 5′-GTG CCT TTA CTG CCT CTT GC-3′, R: 5′-CAG CAT TAC AGA GCT TGA AAA A-3′; MEP-318: RPGRIP1_113_F: 5′-GTG CAC AGG GAA AAT CCA CT-3′, R: 5′-GCT AAG GTA CTG GAG AAA AAT GC-3′; RKK-665: RKK665_F: 5′-TCC TCC TGG TAT CCC TGA TG-3′, R: 5′-CCT GTG GGT CCA GGT CTA TT-3′.
The WGS data were processed using a pipeline modified from our previous WES data analysis pipeline. Briefly, next-generation sequencing (NGS) reads were aligned to the human genome assembly (hg19) with Burrows-Wheeler aligner (BWA). Single nucleotide variants and small insertion-deletion variants (SNVs and indels) were identified using the genome analysis toolkit (GATK), and structural variants (SVs) and copy number variants (CNVs) were identified using a set of bioinformatics tools, including CNVnator, DELLY, LUMPY, and MANTA. A population allele frequency threshold of 0.5% was applied to filter out common variants based on the allele frequency in the gnomAD database and the center’s internal whole genome sequencing databases of 50,000 people without eye diseases. For SNVs, variants that were mapped to the coding region were annotated with ANNOVAR and searched against the dbNSFP. The conservation of the remaining variants was calculated based on the genomic evolutionary rate profiling (GERP) score. The effect of the variants was predicted using combined annotation dependent depletion (CADD) [36,37]. The deleteriousness of SNVs was predicted using the latest CADD score (score cutoff = 20), which is derived based on more than 60 annotations, including Ensembl Variant Effect Predictor (VEP), The Encyclopedia of DNA Elements (ENCODE), multiple conservation and protein predictions, splicing prediction, and database for nonsynonymous SNPs' functional predictions (dbNSFP). For SVs and CNVs, variants were annotated to the RefSeq gene database and filtered with the SV and CNV QC tool svtyper (score cutoff = 100) [39,40]. Raw bam files that contained candidate SVs and CNVs were further viewed manually through integrative genomics viewer (IGV) to rule out potential false positive calls from mapping errors or sequencing errors before experimental validation.
To predict whether the variants might affect splicing, SpliceAI was applied to all the WGS variants of the cases with one hit in RPGRIP1 (cutoff = 0.2).
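The thresholds described above can be illustrated with a minimal sketch. It assumes that each variant has already been annotated with a population allele frequency, a CADD score, and a SpliceAI score. The `Variant` record, the function names, and the example variants are hypothetical, and the way the CADD and SpliceAI criteria are combined (either one is sufficient) is an assumption, not a description of the pipeline actually used in this study.

```python
from dataclasses import dataclass
from typing import Optional

AF_MAX = 0.005       # 0.5% population allele frequency threshold (from the text)
CADD_MIN = 20.0      # CADD score cutoff (from the text)
SPLICEAI_MIN = 0.2   # SpliceAI cutoff (from the text)


@dataclass
class Variant:
    """Hypothetical annotated variant record used only for this illustration."""
    name: str
    allele_freq: Optional[float]      # None means absent from the population databases
    cadd: Optional[float] = None
    spliceai: Optional[float] = None


def is_rare(v: Variant) -> bool:
    """Keep variants that are absent from, or rare in, the population databases."""
    return v.allele_freq is None or v.allele_freq <= AF_MAX


def is_candidate(v: Variant) -> bool:
    """Assumed rule: rare AND (predicted deleterious OR predicted splice-altering)."""
    if not is_rare(v):
        return False
    deleterious = v.cadd is not None and v.cadd >= CADD_MIN
    splice_altering = v.spliceai is not None and v.spliceai >= SPLICEAI_MIN
    return deleterious or splice_altering


# Made-up scores, chosen only to exercise each branch of the filter.
variants = [
    Variant("hypothetical_rare_splice", allele_freq=None, cadd=3.1, spliceai=0.45),
    Variant("hypothetical_rare_deleterious", allele_freq=0.0002, cadd=24.0),
    Variant("hypothetical_common", allele_freq=0.03, cadd=26.0, spliceai=0.6),
    Variant("hypothetical_rare_benign", allele_freq=0.001, cadd=4.0, spliceai=0.01),
]
kept = [v.name for v in variants if is_candidate(v)]
print(kept)
```

In this sketch a common variant is discarded regardless of its prediction scores, which mirrors the order of filtering described in the text.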
To assess whether the prioritized variants have an effect on splicing, we used an established minigene reporter assay, the RHCglo minigene. DNA fragments, including 500 bp flanking each side of the putative intronic mutation, were PCR amplified using genomic DNA from the corresponding patient as the template and cloned into the RHCglo minigene vector. PCR amplification consisted of a denaturation step at 95°C for 15 s followed by 40 cycles of 95°C for 30 s, 58°C for 30 s, and 72°C for 1 min/kb, and a final extension step at 72°C for 5 min. The impact of the variant on mRNA splicing was examined by transfecting the plasmid into the human embryonic kidney 293 (HEK293) cell line followed by reverse transcriptase (RT)–PCR as described previously. The HEK293 cell line was validated with short tandem repeat (STR; Appendix 1) profiling.
To identify patients with likely pathogenic mutations in RPGRIP1, we first analyzed WES data of 933 IRD cases, including 762 patients with RP and 171 patients with LCA, whose causal mutations were unknown. The analytic procedure for the WES data is shown in a flowchart in Figure 1. As a result, we identified 15 patients with RP and ten patients with LCA who carry a single pathogenic or likely pathogenic variant in RPGRIP1 (Table 1). Using the same criteria, we screened deleterious variants in RPGRIP1 in gnomAD, which was used as the control population, to infer the background carrier frequency of RPGRIP1. Based on the inferred background carrier frequency of RPGRIP1, we deduced that the expected number of carriers with one hit of RPGRIP1 in the patient cohort with unsolved LCA was about two or three. However, we observed ten patients with LCA carrying a single likely pathogenic variant in RPGRIP1 in the cohort, indicating that the observed number of patients with one hit of RPGRIP1 in the patient cohort with unsolved LCA is significantly higher than expected (Table 2, binomial test, one sided, p=1.26E-04). Similarly, based on the background carrier frequency of RPGRIP1, the expected number of carriers with one hit of RPGRIP1 in the patient cohort with unsolved RP was about 15. Consistent with this expectation, there were 15 patients with RP carrying a single likely pathogenic variant in RPGRIP1 in the cohort, suggesting that the observed number of carriers of mutations in RPGRIP1 in the patient cohort with unsolved RP was not significantly higher than expected (Table 2, binomial test, one sided, p=0.52). Overall, these results suggest that some of these ten patients with LCA with one hit of RPGRIP1 might have a second pathogenic allele in RPGRIP1. To test this idea, WGS was performed to identify potential mutations in RPGRIP1 that were missed by previous WES.
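The one-sided binomial comparison can be reproduced approximately with the sketch below. It uses only the cohort sizes, observed carrier counts, and rounded expected carrier frequencies quoted in the text (1.35% for LCA, 1.95% for RP). Because those frequencies are rounded, the computed p values may differ slightly from the reported ones. The function name is an illustrative choice, and the upper tail is computed as one minus the lower tail to keep the arithmetic in a numerically safe range.

```python
from math import comb


def binomial_upper_tail(observed: int, n: int, p: float) -> float:
    """One-sided P(X >= observed) for X ~ Binomial(n, p)."""
    # Sum the probability mass below `observed`, then take the complement.
    lower = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed))
    return 1.0 - lower


# Cohort sizes, observed carriers, and expected frequencies as quoted in the text.
lca_p = binomial_upper_tail(observed=10, n=171, p=0.0135)
rp_p = binomial_upper_tail(observed=15, n=762, p=0.0195)

print(f"LCA: expected carriers = {171 * 0.0135:.2f}, observed = 10, p = {lca_p:.2e}")
print(f"RP:  expected carriers = {762 * 0.0195:.2f}, observed = 15, p = {rp_p:.2f}")
```

The expected counts printed by this sketch (about 2.3 for LCA and about 14.9 for RP) correspond to the "about two or three" and "about 15" carriers stated above.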
Candidate structural variants and deep-intronic cryptic splicing mutations were identified by analyzing the WGS data as described in the methods section (Table 3).
In two patients with unsolved LCA, we identified two deep-intronic variants that are predicted to affect splicing. Each deep-intronic splicing variant was validated with Sanger sequencing, and the corresponding minigene gel band, which could be cleanly excised, was sequenced to confirm its composition. The RNA extracted from the HEK293 cells was used for RT–PCR. For both variants, the mutant constructs produced new bands that were absent from the wild-type controls (Figure 2).
The first deep-intronic variant was c.2710+233G>A, which is located at chromosome 14 and base position 21,794,565 (hg19) and was predicted to result in the creation of a new splicing donor site. This variant has not been previously described and was found in a patient with LCA, MEP_305, who also carries a c.3793_3794insGAAA (p.(Val1265GlyfsTer19)) frameshift mutation (Table 3). To confirm this prediction, intronic DNA fragments containing the variant or the wild-type sequence were cloned into the minigene vector. Both constructs were transfected into the HEK293 cell line and subjected to mRNA splicing assay. As shown in Figure 2, the variant construct produced a larger RT–PCR band than the wild-type control, indicating the inclusion of a cryptic exon. Sequencing of the RT–PCR product indicated that the cryptic exon is 134 bp in length, spanning chromosome 14 from base positions 21,794,477 to 21,794,610 (Figure 2).
The second deep-intronic variant was c.1468–263G>C, identified in a patient with LCA, MEP_318, who carries the frameshift insertion c.934dupC (p.(Gln312ProfsTer9)). This intronic variant was described in a previous report, in which the variant was predicted to generate a novel splicing donor site. As shown in Figure 2, an extra, larger RT–PCR band was observed for the variant construct compared to the wild-type control in the minigene splicing test. Sequencing of the large RT–PCR band revealed that the cryptic exon is 120 bp in length, spanning the base positions 21,789,216 to 21,789,335 on chromosome 14.
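As a small consistency check, the reported cryptic exon lengths can be recomputed from the hg19 coordinates given above, assuming that both the start and end positions are included in the exon. The dictionary and function names are illustrative only. The remainder after division by three is reported for orientation; whether an inserted exon introduces a premature stop codon depends on its sequence, which is not given here.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a genomic interval with both end positions included."""
    return end - start + 1


# Cryptic exon coordinates on chromosome 14 (hg19), as reported in the text.
cryptic_exons = {
    "c.2710+233G>A": (21_794_477, 21_794_610),
    "c.1468–263G>C": (21_789_216, 21_789_335),
}

lengths = {name: interval_length(s, e) for name, (s, e) in cryptic_exons.items()}
for name, length in lengths.items():
    print(f"{name}: {length} bp, length mod 3 = {length % 3}")
```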
Consistent with the molecular mutation in RPGRIP1, both affected individuals showed the typical LCA phenotype (Figure 3A,B). MEP_305 is a female patient who first presented at age 4 years with congenital nystagmus. At the age of 4 years, the best-corrected visual acuity (BCVA) of her right eye was 20/125, while the BCVA of her left eye was 20/200. At 7 years, her BCVA decreased to 20/400 for both eyes. Fundus examination disclosed moderate waxy pallor of the optic nerves, pigmentary mottling in the macula, as well as moderate vascular attenuation. The pigmentary changes inferiorly are secondary to laser for a Coats-like reaction that the patient developed (Figure 3A, top row). Fundus autofluorescence (FAF) showed peripheral hypo-AF and hyper-AF rings of the parafovea and midperiphery bilaterally (Figure 3A, second row). MEP_318 is a male patient who presented with congenital nystagmus, photophobia, and poor visual acuity since the age of 2 years. At the age of 2 years, his BCVA was estimated to be 20/1,000 for both eyes. At 8 years, his BCVA decreased to 20/1,600 for the right eye, whereas the left eye had light perception (LP) visual acuity. Fundus examination disclosed vascular attenuation, RPE atrophy with increased visibility of the choroidal vessels, fine granular pigmentation just outside the vessels, and yellow deposits in the periphery (Figure 3B, top row). FAF depicted a perimacular hyper-AF ring bilaterally (Figure 3B, second row). Full-field electroretinography (ffERG) for both patients revealed severe cone and rod dysfunction (Figure 3A,B, bottom), and the two patients retained foveal structure on optical coherence tomography (OCT) imaging (Figure 3A,B, third row).
In addition, one deletion was identified in a patient with LCA (RKK_665) who also carries a c.2627A>G (p.(Asp876Gly)) missense mutation (Table 3). The deletion has not been reported previously. As shown in Figure 4A, reduced read coverage and discordant read mate pairs were observed. To confirm the deletion and determine the breakpoint, PCR was performed to amplify the genomic region of the mutant chromosome, and the PCR product was Sanger sequenced. As shown in Figure 4C, a PCR product was obtained using genomic DNA from the patient as the template. Sequencing of the PCR product indicated that the breakpoints are mapped at chromosome 14 and base positions 21,809,977 and 21,812,868, resulting in a deletion of 2,890 bp in length. As a result, the entire exon 21 of RPGRIP1 was deleted. As exon 21 is 193 bp, deletion of the exon would lead to a reading frameshift and likely trigger nonsense-mediated mRNA decay (NMD), resulting in a complete loss of function mutation.
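The arithmetic behind the deletion size and the predicted frameshift can be sketched as follows. The sketch assumes that the two reported breakpoint coordinates are the retained bases immediately flanking the deleted segment, which is the reading under which the coordinates give 2,890 bp; this convention is an assumption, not something stated in the text. The function names are illustrative.

```python
def deleted_length(left_breakpoint: int, right_breakpoint: int) -> int:
    """Bases removed between two retained flanking positions (assumed convention)."""
    return right_breakpoint - left_breakpoint - 1


def shifts_reading_frame(n_bases: int) -> bool:
    """Removing a number of coding bases not divisible by 3 shifts the reading frame."""
    return n_bases % 3 != 0


# Breakpoints on chromosome 14 and the exon 21 length, as reported in the text.
deletion_bp = deleted_length(21_809_977, 21_812_868)
exon21_frameshift = shifts_reading_frame(193)

print(f"deleted segment: {deletion_bp} bp")
print(f"exon 21 loss shifts the frame: {exon21_frameshift}")
```

Because 193 leaves a remainder of 1 when divided by 3, loss of exon 21 alone is expected to shift the reading frame, in line with the interpretation given above.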
The proband (RKK_665) is a 44-year-old female with LCA. She first presented at age 4 years with congenital nystagmus, visual defects including poor night vision, and marked light sensitivity. Her BCVA decreased to 20/240 for both eyes. As shown in Figure 3C, fundus examination of both eyes showed optic disc pallor (mild) and diffuse retinal pigmentation and atrophy with arteriolar narrowing. OCT examination showed thinning of the retina and a small remaining subfoveal ellipsoid zone (EZ) in the right eye (Figure 3D). She did not complain of visual field defects, although her superior peripheral visual field showed defects (Figure 3E).
A significant proportion of patients with IRDs currently remain unexplained upon exon capture sequencing in the known IRD-associated genes. Noncoding mutations and structural variation of the disease genes have been shown to contribute to the disease burden. The carrier frequency in coding regions of some IRD genes is higher in patients with unsolved IRDs than expected. These observations suggest that these carriers are enriched with noncoding variants and structural mutations that are missed by current WES or panel sequencing. To test this hypothesis, we systematically investigated noncoding variants and structural variations in RPGRIP1, a gene with a high number of carriers in the unsolved patient cohort. WGS identified three patients with a second mutant allele in RPGRIP1, including two deep-intronic splicing mutations and a large deletion. One of the two deep-intronic splicing mutations identified in this study, c.1468–263G>C, was reported previously, while the other, c.2710+233G>A, is novel. Although the identification of deep-intronic splicing variants has been challenging, the recently published SpliceAI appears to be effective in predicting such variants. In this study, two deep-intronic variants were predicted to result in donor splice-site gain and aberrant splicing with new exon gain. Both predictions were experimentally confirmed with the minigene test.
Despite improved detection of pathogenic variants with WGS, SVs remain difficult to identify due to limitations of short-read sequencing in identifying breakpoints. By combining multiple software predictions for SVs and CNVs, we identified a structural variation that causes an aberrant reading frame as the second pathogenic allele in RKK_665. By sequencing the PCR product spanning the large deletion and aligning it with the reference sequence, we validated the deletion with a breakpoint at c.3340_c.3533del2890. This finding indicates that deletions in RPGRIP1 could explain some unsolved one-hit RPGRIP1 cases and suggests that screening for SVs may be necessary to explain the patients with unsolved IRDs.
After we identified pathogenic alleles in noncoding regions and SVs, one hit of RPGRIP1 remained enriched in the unsolved LCA cohort, implying that there may be second hits that remain undetected. In contrast, one hit of RPGRIP1 was not significantly more frequent in this RP cohort. This might be because mutations in RPGRIP1 account for fewer than 2% of patients with RP [43-49]. Overall, the study results indicate that it is important to thoroughly investigate SVs and noncoding variations to identify the missing mutations in unsolved cases with a higher priority for IRD genes with higher carrier frequency in coding regions than expected in patients with unsolved IRDs.
Appendix 1. STR analysis
We thank the patients and families for their enthusiastic participation. We thank Iris Chen for reading and editing the manuscript. This work is supported by the key research and development program Ningxia Hui Autonomous Region of China (2020BEG03044) to GZ, CIHR, Fighting Blindness Canada, MCH Foundation and Reseau de Vision to RKK, Retinal Research Foundation to RC. Dr. GAF acknowledges funding from the Pangere family. NIH K08EY026650 to PY. Supported by grant P30EY010572 from the National Institutes of Health (Bethesda, MD), and by unrestricted departmental funding from Research to Prevent Blindness (New York, NY).