Molecular Vision 2019; 25:174-182
<http://www.molvis.org/molvis/v25/174>
Received 22 October 2018 | Accepted 14 March 2019 | Published 16 March 2019
Minh Thuan Nguyen Tran,1 Mohd Khairul Nizam Mohd Khalid,1 Alice Pébay,2,3,6 Anthony L. Cook,4 Helena H. Liang,2 Raymond C.B. Wong,2,3 Jamie E. Craig,5 Guei-Sheung Liu,1,3 Sandy S. Hung,2,3 Alex W. Hewitt1,2,3
The last two authors jointly supervised and contributed equally to this work.
1Menzies Institute for Medical Research, University of Tasmania, Tasmania, Australia; 2Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Victoria, Australia; 3Ophthalmology, Department of Surgery, University of Melbourne, Victoria, Australia; 4Wicking Dementia Research and Education Centre, University of Tasmania, Hobart, TAS 7000, Australia; 5Department of Ophthalmology, Flinders University, Flinders Medical Centre, Bedford Park, Australia; 6Department of Anatomy and Neuroscience, University of Melbourne, Victoria, Australia
Correspondence to: Alex Hewitt, Menzies Institute for Medical Research, University of Tasmania, Hobart 7000, Australia; Phone: +61 407359824; FAX: +61 3 6226 7704; email: hewitt.alex@gmail.com
Purpose: To evaluate the efficacy of using a CRISPR/Cas-mediated strategy to correct a common high-risk allele that is associated with age-related macular degeneration (AMD; rs1061170; NM_000186.3:c.1204T>C; NP_000177.2:p.His402Tyr) in the complement factor H (CFH) gene.
Methods: A human embryonic kidney cell line (HEK293A) was engineered to contain the pathogenic risk variant for AMD (HEK293A-CFH). Several different base editor constructs (BE3, SaBE3, SaKKH-BE3, VQR-BE3, and Target-AID) and their respective single-guide RNA (sgRNA) expression cassettes targeting either the pathogenic risk variant allele in the CFH locus or the LacZ gene, as a negative control, were evaluated head-to-head for the incidence of a cytosine-to-thymine nucleotide correction. The base editor construct that showed appreciable editing activity was selected for further assessment in which the base-edited region was subjected to next-generation deep sequencing to quantify on-target and off-target editing efficacy.
Results: The Target-AID base editor, used in tandem with its respective sgRNA, produced the cytosine-to-thymine nucleotide correction in 21.5% of the total sequencing reads. Indels were detected in only 0.15% of the sequencing reads, with virtually no off-target effects evident across the top 11 predicted off-target sites containing at least one cytosine in the activity window (n = 3, pooled amplicons).
Conclusions: CRISPR-mediated base editing can be used to facilitate a permanent and stably inherited cytosine-to-thymine nucleotide correction of the rs1061170 SNP in the CFH gene with minimal off-target effects.
Age-related macular degeneration (AMD) is a leading cause of late-onset central vision loss affecting individuals over the age of 50 years [1]. The condition has a substantial global burden and is expected to affect up to 288 million people by the year 2040 [2]. It involves environmental, genetic, and physiologic determinants [3] that cause damage to the macula, which is an area within the eye required for sharp, central vision. Here, the retinal pigment epithelium (RPE) is important for maintaining retinal homeostasis by providing nutrients and ionic support to the apical photoreceptors while facilitating the phagocytic removal of waste products and the exchange of biomolecules with the fenestrated choroidal capillaries [4]. The basal deposition of drusen, which can occur at either the RPE or Bruch’s membrane, can significantly affect photoreceptor health [5]. This has been previously thought to occur due to the dysregulation of the alternative complement pathway, which causes inflammation of the RPE. The dysregulation of the alternative complement pathway has been shown to be associated with the extensively characterized single nucleotide polymorphism (SNP), rs1061170, in the Complement Factor H (CFH) gene [6].
The rs1061170 variant is characterized by a cytosine nucleotide (C) in place of a thymine nucleotide (T), which results in a tyrosine to histidine amino acid residue substitution at position 402 (Y402H) in the CFH protein. The incidence of the disease associated variant at this locus results in a 3.3- to 7.7-fold increase in AMD progression for Caucasians homozygous for the risk variant allele, and a 2.2- to 4.6-fold increase for heterozygous individuals [6]. The amino acid residue substitution principally affects the CFH protein, which is a serum glycoprotein responsible for attenuating the response of the alternative complement pathway by cleaving pro-inflammatory mediators [7, 8]. The Y402H allelic variant affects the hydrogen bonding capacity of the CFH protein and impairs its ability to dock to the appropriate receptor on the RPE or Bruch’s membrane [9, 10]. This stereochemical impairment results in an increase in the fraction of uncleaved pro-inflammatory cytokines and causes inflammation of the RPE.
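The relationship between the coding position c.1204 and residue 402 can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the third base of the codon is an assumption (either histidine codon, CAT or CAC, gives the same outcome when its first base is changed from C to T).

```python
from typing import Tuple

# Only the four codons needed here; a full codon table is deliberately omitted.
CODON_TO_RESIDUE = {"CAT": "His", "CAC": "His", "TAT": "Tyr", "TAC": "Tyr"}


def codon_position(cds_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position to (codon number, 1-based position in the codon)."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, new_base: str) -> str:
    """Return the codon with the base at the 1-based position replaced."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, pos_in_codon = codon_position(1204)
risk_codon = "CAT"  # assumption: third base is illustrative; CAC behaves the same
corrected_codon = substitute(risk_codon, pos_in_codon, "T")

print(codon_number, pos_in_codon)  # 402 1
print(CODON_TO_RESIDUE[risk_codon], "->", CODON_TO_RESIDUE[corrected_codon])  # His -> Tyr
```

Position 1204 falls on the first base of codon 402, so a single cytosine-to-thymine change at that base is sufficient to convert the histidine codon into a tyrosine codon.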
The discovery of an adaptive prokaryotic immune system that requires the use of only a single endonuclease (Cas9) to mediate a DNA double-strand break (DSB) [11] allowed for the development of a programmable nuclease that could target DNA in a site-specific manner [12]. Here, the single-guide RNA (sgRNA) component of the CRISPR/Cas9 system facilitates targeted DNA editing at a user-defined nucleotide sequence [12, 13]. The binding between the Cas9 protein and the target DNA strand is dictated by its protospacer-adjacent motif (PAM) site, which is a short consensus sequence of nucleotides that occur adjacent to the target DNA sequence [12, 14].
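As an illustration of how the PAM constrains target selection, the following sketch lists every 20-nucleotide protospacer that is immediately followed by an NGG PAM on one strand of a sequence. It is a simplified example rather than the design procedure used in this study: the function name and the demonstration sequence are hypothetical, and only the given strand is scanned.

```python
import re
from typing import List, Tuple


def find_ngg_targets(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each NGG PAM preceded by a full protospacer."""
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping PAMs (for example AGG and GGG) are all found.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start >= spacer_len:
            start = pam_start - spacer_len
            hits.append((start, seq[start:pam_start], match.group(1)))
    return hits


# Made-up sequence ending in two overlapping PAMs (AGG and GGG).
demo = "ACGT" * 5 + "AGGG"
for start, protospacer, pam in find_ngg_targets(demo):
    print(start, protospacer, pam)
```

Two neighboring PAMs that overlap by two nucleotides yield protospacers shifted by a single nucleotide, which changes the position of any target base relative to the PAM.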
Herein, we describe the application of a modified Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR associated protein 9 (CRISPR/Cas9) system, known as ‘base editing’ [15], to mediate a precise single nucleotide change targeting the AMD-associated risk allele at the rs1061170 locus. Base editing mitigates the previously characterized risks associated with introducing DSBs into the genome, which can lead to imprecise insertions and deletions (indels) [16, 17]. Further, base editing has a dramatically higher editing efficiency compared to homology-directed repair (HDR), a process that has a notoriously low efficiency in somatic cells due to the competing DNA repair pathway of non-homologous end joining (NHEJ), where nucleotide insertions and deletions are typically generated [18, 19].
The CRISPR-mediated base editor fuses a cytidine deaminase, such as APOBEC3 [20], APOBEC1 [15], or human-AID (activation-induced cytidine deaminase) [21, 22], to a modified Cas9 protein to allow for site-directed cytosine-to-thymine nucleotide corrections [23]. Its nickase mechanism and the fusion of a uracil DNA glycosylase inhibitor (UGI) allow the introduction of a nucleotide correction in a stably inherited manner in both dividing and non-dividing cells [12, 24]. Although there are several variants of base editors currently available [20, 25], we selected and screened the best-characterized variants. CRISPR/Cas base editors targeting the disease-relevant locus were screened due to their varying PAM requirements, sgRNA target sequence, and anticipated activity window. As such, base editing variants with the same mutagenicity profiles were selected to observe if the sequence-specific context of the locus has any bearing on the efficacy of the targeted correction (BE3, SaKKH-BE3, SaBE3, and VQR-BE3) [25]. Here, the PAM variant base editors (SaKKH-BE3, SaBE3, and VQR-BE3) share the same structural arrangement of BE3 in terms of the choice of the deaminating protein (APOBEC1), linker length, and terminal protein fusions, but markedly differ in the range of PAM targetable DNA substrates given their variations in the Cas9 protein component. Further, to profile the effects of the activity window on correcting the rs1061170 SNP, a base editor with a different architecture and activity window (Target-AID) was evaluated [26]. While the respective PmCDA1 and APOBEC1 proteins of Target-AID and BE3 both fulfill a deaminating role, each protein possesses a differing mutagenicity profile and processive capability [27-30]. These differences in protein function require careful consideration for use in the mostly cytosine-depleted sequence space surrounding the rs1061170 SNP.
After the initial screening of base editors at the Y402H CFH locus, we selected the most active base editor for further off-target assays to explore its therapeutic potential.
To evaluate the efficacy of base editing to correct the AMD-associated rs1061170 variant, a commercial human embryonic kidney cell line (HEK293A; Invitrogen, CA) was engineered to contain the pathogenic risk variant by using a lentiviral method to knock-in a gene fragment containing exon 9 of the CFH gene. Briefly, the pLenti.AS2.Luci.puro vector (RNAiCore; Academia Sinica, Taipei, Taiwan) was digested using NheI and EcoRI (New England Biolabs, MA) to insert the synthetic CFH gene fragment (gBlock gene fragment, Integrated DNA Technologies, IA). Cell line authentication was performed using short tandem repeat DNA profiling with the following markers: AMEL, CSF1PO, D5S818, D7S820, D13S317, D16S539, D21S11, TH01, TPOX, and vWA through the Australian Genome Research Facility LTD, VIC, Australia (Appendix 1, Appendix 2, and Appendix 3). Cell line transduction and ongoing cell culture conditions were followed as previously described [31].
CHOPCHOP was used to evaluate the incidence of internal RNA interactions between the protospacer portion of the sgRNA and the sgRNA scaffold backbone [32]. The webtool RNAfold [33] was used to investigate the folding spontaneity of the unmodified and modified SpCas9 sgRNA scaffolds. The following scaffold modifications were considered: a canonical A54:U60 substitution in the critical stem loop one region and a modified stem loop two and three linker region, and an A:U nucleobase pair flip at U25 in the repeat:anti-repeat duplex of the sgRNA scaffold [34] and an extension of the repeat:anti-repeat duplex (F-E modified scaffold; Appendix 2). Each modified scaffold contained a 5′ spacer portion specific for rs1061170 and a GGG PAM site, and a 3′ scaffold backbone (sgRNA2 and sgRNA2G, Appendix 2). The modified scaffolds were prepared according to the gBlocks® Gene Fragments (Integrated DNA Technologies, IA) protocols as described above.
The Target-AID construct was assembled using the HiFi DNA Assembly method (New England Biolabs, MA) where the Target-AID was subcloned from the pcDNA3.1_pCMV-nCas-PmCDA1-ugi pH1-gRNA (HPRT) plasmid (5,365 base pairs; Addgene plasmid #79620); the P2A-fragment and Blasticidin resistance gene were amplified from the LentiCas9-BLAST plasmid (Addgene Plasmid #52962); the vector backbone was amplified from pCMV-BE3 (3,361 base pairs; Addgene plasmid #73021). The individual PCR fragments of the construct were amplified using the Q5 High-Fidelity DNA polymerase (New England Biolabs) and purified using the QIAquick® PCR Purification Kit (QIAGEN, Hilden, Germany) according to the manufacturer’s instructions. NEB® Stable Competent E. coli (New England Biolabs) was used for transformation following standard transformation protocol. Plasmid DNA was harvested using the Wizard® Plus SV Minipreps DNA Purification Systems (Promega Corporation, WI) according to the manufacturer’s instructions.
For the U6-promoter driven expression of sgRNAs for the SaCas9 PAM variants, SaBE3 (Addgene Plasmid #85169) and SaKKH-BE3 (Addgene Plasmid #85170), CFH-locus specific sgRNAs (Appendix 2) were annealed and cloned into the digested px552-CMV-U6-SaCas9 (Addgene Plasmid #107053) construct at the SapI site. For the SpCas9 PAM variant, VQR-BE3 (Addgene Plasmid #85171), the SpCas9-specific sgRNA oligos (Appendix 2) were annealed and cloned into the px552-CMV-mCherry-U6-SpCas9 sgRNA scaffold (Addgene Plasmid #60958). Constructs were transformed into NEB® Stable Competent E. coli (New England Biolabs) and plasmid DNA was harvested using the Wizard® Plus SV Minipreps DNA Purification Systems (Promega).
HEK293A-CFH(p.His402) fragment cells were plated to a final density of 75,000 cells per well in a 12-well plate. After one day of incubation at 37 °C and 5% CO2, the confluency of the cells was checked to ensure it was between 40% and 70%. Then, the medium was replaced with 500 µl of medium containing 10% fetal bovine serum (FBS; Thermo Fisher Scientific™, MA), and the DNA:nanoparticle complex was prepared following the manufacturer's protocol. For each condition, an aliquot containing 450 ng of sgRNA scaffold (gBlocks® Gene Fragments) or 1 µg of px552-CMV-mCherry-U6-SaCas9 or SpCas9 sgRNA scaffold (Appendix 2), and 1 µg of the CRISPR/Cas9 plasmid was prepared in 46 µl of Opti-MEM (Thermo Fisher Scientific™). Then, 3.5 µl of FuGENE® HD (Promega) was added, followed by pulse spin and vortex. The reaction aliquot was then incubated for 10 min at room temperature before adding it in a dropwise manner to each well of the 12-well plate. The HEK293A-CFH(p.His402) fragment cells were incubated at 37 °C for three days.
Following transfection of the HEK293A-CFH(p.His402) fragment cells with the CRISPR and sgRNA constructs, 20 µg/ml of blasticidin was added one day after transfection for a five-day selection of the cells. Genomic DNA was harvested from the enriched cells using the QIAamp DNA minikit (QIAGEN) six days after transfection. Amplification of the exogenous CFH fragment tag was performed using KOD DNA polymerase (Merck Millipore, Darmstadt, Germany) with the Myc and FLAG tag primers (Appendix 2) and PCR products were purified using the Wizard® SV Gel and PCR Clean-Up System (Promega, WI). PCR samples were sent for Sanger sequencing with the Myc tag primer.
Nested PCR amplification reactions were performed on the extracted gel-purified PCR amplicons for the on-target base edited cells following transfection using custom primers annealing to the CFH gene fragment and a compatible overhang for Illumina forward and reverse adapters (Appendix 2). For the off-target assays, the top 20 scoring sgRNAs were selected using CHOPCHOP [32]. The sgRNAs were then further evaluated using the following criteria: a cytosine occurs within a −1 to +5 nucleotide position within the 20 nucleotide protospacer, whereby the PAM consensus sequence is in position +21 to +23; a compatible NGG-style PAM site must flank the 20-nucleotide protospacer; the sgRNA protospacer for the off-target site must contain no more than three mismatches compared to the on-target sgRNA sequence. Briefly, using standard conditions for Q5 High-Fidelity DNA polymerase (New England Biolabs), the on-target PCR amplicons were amplified using the following thermocycling conditions: initial denaturation at 98 °C for 30 s, cycling at 98 °C for 10 s, 60 °C for 10 s, 72 °C for 15 s, and a final extension of 72 °C for 2 min. Off-target sites were PCR-amplified using the extracted gDNA sample directly after transfection on both the LacZ-treated negative control and on the base editor-treated gDNA samples using the previously described thermocycling conditions. A secondary PCR reaction was performed on the resultant PCR amplicons following validation of the correct size using the gel purification method as previously described. Briefly, the Nextera™ DNA CD Indexes (24 Indexes, 24 Samples; Illumina, Inc., San Diego, CA) were used to barcode the previously tagged PCR amplicons containing the Illumina adaptor overhangs. The LacZ-negative control treated and base-edited samples each had one on-target PCR amplicon and 11 off-target PCR amplicons.
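The three off-target selection criteria can be expressed as a simple filter. The sketch below is a hypothetical illustration rather than the procedure actually run: the class and function names are invented, the sequences in the example are made up, and position −1 is assumed to be the base immediately 5′ of protospacer position +1.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    upstream_base: str  # base immediately 5' of the protospacer (assumed position -1)
    protospacer: str    # 20 nt, positions +1 to +20
    pam: str            # 3 nt, positions +21 to +23


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def passes_filter(site: CandidateSite, on_target: str, max_mismatches: int = 3) -> bool:
    """Apply the three criteria: NGG PAM, a cytosine at -1 to +5, and <= 3 mismatches."""
    if len(site.protospacer) != 20 or len(on_target) != 20:
        return False
    has_ngg_pam = len(site.pam) == 3 and site.pam[1:] == "GG"
    window = site.upstream_base + site.protospacer[:5]  # positions -1, +1 .. +5
    has_window_cytosine = "C" in window
    few_mismatches = count_mismatches(site.protospacer, on_target) <= max_mismatches
    return has_ngg_pam and has_window_cytosine and few_mismatches


# Made-up on-target protospacer and candidate site, for illustration only.
on_target = "ACGTACGTACGTACGTACGT"
candidate = CandidateSite("G", "ACGTACGTACGTACGTACGA", "TGG")
print(passes_filter(candidate, on_target))  # True
```

A site is retained only when all three conditions hold, which is why the 20 candidate sites returned by the scoring tool can reduce to a smaller set for sequencing.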
The base-edited or LacZ-negative control PCR amplicons were each barcoded individually according to a secondary PCR reaction (Appendix 3) [35]. The resultant PCR amplicon library was then quantified using Qubit measurement and normalized to 4 nM before being pooled into a single tube for next generation deep sequencing using the MiSeq Reagent Kits v2 (500-cycles; Illumina, Inc. CA). A MATLAB script adapted from Gaudelli and colleagues [35] was used to determine the incidence of cytosine-to-thymine nucleotide changes in either the on-target treated sample or the off-target sites, relative to the LacZ negative control (Appendix 3).
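Normalizing a library to 4 nM requires converting a mass concentration into a molar one. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the exact conversion used for these libraries is not stated, and the amplicon length, concentration, and final volume in the example are hypothetical.

```python
from typing import Tuple


def library_nM(conc_ng_per_ul: float, amplicon_bp: int, bp_mass: float = 660.0) -> float:
    """Convert a dsDNA concentration in ng/µl to nM (assumes ~660 g/mol per base pair)."""
    return conc_ng_per_ul / (bp_mass * amplicon_bp) * 1e6


def dilution_volumes(stock_nM: float, target_nM: float = 4.0,
                     final_ul: float = 20.0) -> Tuple[float, float]:
    """Return (stock µl, diluent µl) needed to reach target_nM in final_ul (C1V1 = C2V2)."""
    if stock_nM < target_nM:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_nM * final_ul / stock_nM
    return stock_ul, final_ul - stock_ul


# Hypothetical example: a 400 bp amplicon library measured at 2.64 ng/µl.
stock = library_nM(2.64, 400)
print(round(stock, 2))          # 10.0
print(dilution_volumes(10.0))   # (8.0, 12.0)
```

Because the conversion depends on amplicon length, libraries of different sizes with the same mass concentration require different dilutions before pooling.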
Base editor constructs (BE3, Target-AID, SaBE3, SaKKH-BE3, and VQR-BE3) were co-transfected with their respective sgRNA expressing cassettes targeting the rs1061170 locus in the HEK293A-CFH(p.His402) fragment cell line (Figure 1). No base editing was observed on direct Sanger sequencing with the BE3, SaBE3, SaKKH-BE3, or VQR-BE3 constructs, whereas appreciable cytosine-to-thymine editing was observed with the Target-AID construct (Figure 2A; n = 3).
Next-generation deep sequencing was performed on the Target-AID base-edited cells. The median number of reads was 53,631 across the on-target and putative off-target loci (ranging from 569 to 430,308 reads). The overall editing efficiency showed that a cytosine-to-thymine nucleotide conversion occurred in 21.5% of the total sequencing reads at the target risk allele (pooled amplicons, n = 3). Nucleotide product purity for the base edited locus also revealed no unexpected nucleotide transitions or transversions beyond that of the intended C:G to T:A nucleotide transition (Figure 2B). The target protospacer revealed a minimal incidence of a G:C to A:T nucleotide transition at the G8 nucleotide. Further, the incidence of indel formation at the target region was also evaluated and revealed an indel formation frequency of only 0.15%.
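The two headline quantities, the fraction of reads carrying the cytosine-to-thymine change and the fraction carrying indels, are both expressed relative to the total number of reads. The following sketch shows one simplified way such fractions could be computed. It is not the MATLAB analysis used in this study: it assumes reads already trimmed to the amplicon window, treats any read whose length differs from the reference as an indel read, and uses made-up sequences.

```python
from collections import Counter
from typing import Dict, List


def summarize_reads(reads: List[str], reference: str, target_index: int) -> Dict[str, object]:
    """Summarize base calls at one reference position and a crude indel fraction."""
    total = len(reads)
    same_length = [r for r in reads if len(r) == len(reference)]
    indel_reads = total - len(same_length)  # crude proxy: a length change implies an indel
    base_counts = Counter(r[target_index] for r in same_length)
    return {
        "c_to_t_fraction": base_counts["T"] / total if total else 0.0,
        "indel_fraction": indel_reads / total if total else 0.0,
        "base_counts": dict(base_counts),
    }


# Made-up reads: 7 unedited, 2 with C-to-T at index 2, 1 with a one-base deletion.
reference = "AACGT"
reads = ["AACGT"] * 7 + ["AATGT"] * 2 + ["AAGT"]
summary = summarize_reads(reads, reference, target_index=2)
print(summary["c_to_t_fraction"], summary["indel_fraction"])  # 0.2 0.1
```

Using the total read count as the denominator for both fractions keeps them directly comparable, at the cost of slightly understating editing among reads without indels.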
To investigate the incidence of unintended off-target effects, the top 11 putative off-target sites were profiled. Off-target sites were selected based on the criteria of one-to-three nucleotide mismatches occurring within the sgRNA protospacer relative to the target locus sgRNA, and the incidence of at least one cytosine nucleotide occurring within the expected −1 to 5 nucleotide window of the sequence space flanked by an NGG-compatible PAM consensus sequence. Next-generation deep sequencing showed no detectable off-target effects at any of the nucleotide positions within the profiled protospacer sequence and their surrounding regions (Figure 2C).
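Off-target editing is judged relative to the LacZ negative control, so that sequencing error present in both samples is not counted as editing. A minimal sketch of that comparison is given below; the function names and reads are hypothetical, only reads of full reference length are considered, and only cytosines on the sequenced strand are examined (editing on the opposite strand would appear as guanine-to-adenine changes and is not handled here).

```python
from typing import Dict, List


def c_to_t_rates(reads: List[str], reference: str) -> Dict[int, float]:
    """Per-position fraction of full-length reads with T where the reference has C."""
    usable = [r for r in reads if len(r) == len(reference)]
    rates = {}
    for i, ref_base in enumerate(reference):
        if ref_base == "C" and usable:
            rates[i] = sum(r[i] == "T" for r in usable) / len(usable)
    return rates


def excess_over_control(treated: List[str], control: List[str],
                        reference: str) -> Dict[int, float]:
    """Subtract the control C-to-T rate from the treated rate at each reference cytosine."""
    treated_rates = c_to_t_rates(treated, reference)
    control_rates = c_to_t_rates(control, reference)
    return {i: treated_rates[i] - control_rates.get(i, 0.0) for i in treated_rates}


# Made-up reads for a four-base reference with cytosines at indices 1 and 2.
reference = "ACCA"
treated = ["ACCA"] * 8 + ["ATCA"] * 2
control = ["ACCA"] * 10
print(excess_over_control(treated, control, reference))  # {1: 0.2, 2: 0.0}
```

A site would be regarded as free of detectable off-target editing when the excess at every cytosine is indistinguishable from zero.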
Using the HEK293A-CFH(p.His402) fragment cells, we also investigated the utility of a SpCas9 scaffold enhancement. Cotransfection of a plasmid expressing SpCas9 and a sgRNA targeting the rs1061170 SNP demonstrated clear evidence of indel formation occurring three nucleotides upstream from the GGG PAM site (n = 3). Targeting with the neighboring AGG PAM motif (n = 3, data not shown) showed poor indel formation at this nucleotide sequence, suggesting poor Cas9:sgRNA localization. In an attempt to further optimize the editing efficiency of the CRISPR/Cas9 platform, the sgRNA component was altered with key modifications aimed at increasing the stability of Cas9:sgRNA complexation. We evaluated the effects of reducing the linker length between critical stem loop regions of the sgRNA scaffold (stem loops one and two), extending the repeat:anti-repeat duplex of the scaffold, introducing key nucleotide substitutions in the sequence space following the sgRNA protospacer to remove a putative transcription stop signal from within the sgRNA scaffold, and introducing a stabilizing nucleotide substitution into stem loop one to reduce the incidence of strong interactions adversely affecting sgRNA folding. Despite these modifications, the overall effects of modifying the sgRNA scaffold were negligible when compared to either the original sgRNA scaffold or the modified variant with the extended repeat:anti-repeat duplex. Overall cutting efficiency, taken as an indicator of Cas9:sgRNA localization, did not improve beyond what had already been observed (Appendix 2).
This proof-of-concept work demonstrates the feasibility of targeting the cytosine of the AMD-associated high-risk Y402H CFH variant by facilitating a cytosine-to-thymine (or guanine-to-adenine) nucleotide transition. We observed that base editing occurs in a remarkably precise manner in which only a single nucleotide correction is seen at the target cytosine, with no significant off-target deamination event occurring outside the sequence space of rs1061170. Given the characteristically low incidence of cytosines occurring within the proximal sequence space, the resultant deamination event is highly localized. Additionally, there were virtually no detectable off-target deamination events observed at the top eleven off-target sites and no appreciable indel formation was observed despite the use of a nickase variant of SpCas9.
Our work extensively explored the immediate sequence space of the rs1061170 SNP by evaluating the incidence of base editing with a plethora of PAM-diverse base editor variants and their respective sgRNAs. It is tempting to consider the four-to-five nucleotide activity window of the base editor as generous when the target is only a single nucleotide. Nonetheless, the task of selecting the most appropriate base editor can be frustrated by the spatially exhaustive exercise of re-orientating the sgRNA, the sense strand, and the antisense strand. For example, additional parameters require careful consideration, such as the restrictive PAM site requirements, the apparent activity bias of some sgRNA sequences, and the incidence of unintended amino acid residue substitutions in the surrounding sequence space. Here, base editing was only observed with the pairing between the Target-AID base editor construct and its respective sgRNA. No base editing was observed with other constructs despite the target cytosine occurring in several putative base editor activity windows (Figure 1). This finding is likely attributable to the fact that the optimal activity window varies across the putative window and is specific to each protospacer sequence and base editor construct [36]. For example, in a 20-nucleotide spacer, no run-off base editing at the C1 position was observed from the activity of BE3, which has a maximal activity window positioned between four and eight nucleotides wherein the PAM consensus sequence is considered to be positions 21 to 23 nucleotides. This observation highlights the importance of overlapping the suggested, putative activity window of the base editor construct with its target cytosine.
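The overlap between a target cytosine and an editor's activity window can be checked programmatically before any construct is built. The sketch below uses the window positions mentioned in this article (positions 4 to 8 for BE3, and −1 to +5 for Target-AID, with the PAM at positions 21 to 23) as nominal values. The function names and the example protospacer are hypothetical, and, as noted above, real activity within a nominal window varies with sequence context.

```python
from typing import Dict, List, Tuple

# Nominal windows taken from the text; positions are counted from the PAM-distal end.
ACTIVITY_WINDOWS: Dict[str, Tuple[int, int]] = {
    "BE3": (4, 8),
    "Target-AID": (-1, 5),
}


def in_window(editor: str, target_pos: int) -> bool:
    """Return True if the 1-based protospacer position lies in the editor's nominal window."""
    low, high = ACTIVITY_WINDOWS[editor]
    return low <= target_pos <= high


def cytosines_in_window(protospacer: str, editor: str) -> List[int]:
    """List 1-based protospacer positions of cytosines inside the nominal window."""
    return [pos for pos, base in enumerate(protospacer.upper(), start=1)
            if base == "C" and in_window(editor, pos)]


# A cytosine at position 1 (C1) sits outside the BE3 window but inside Target-AID's.
print(in_window("BE3", 1), in_window("Target-AID", 1))  # False True

# Made-up protospacer with cytosines at positions 1 and 6.
demo = "CATTACGTTTTTTTTTTTTT"
print(cytosines_in_window(demo, "BE3"), cytosines_in_window(demo, "Target-AID"))  # [6] [1]
```

Listing every cytosine in the window, rather than only the intended one, also flags potential bystander edits that could alter neighboring codons.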
Although the use of other PAM-relaxed Cas9 orthologs, such as the recently developed xCas9 or SpCas9-NG, may appear attractive, it was noted that a significant proportion of the introduced mutations, whether through random mutagenesis in the case of xCas9 or the rational design for SpCas9-NG, appeared to affect the on-target base editing efficiency of these variants at the NGG PAM sites [37, 38]. Nonetheless, other factors such as the sequence-specific nucleotide context of each sgRNA may also be a prevailing factor in determining base editing efficiency.
We evaluated two neighboring SpCas9 sgRNAs corresponding to a GGG and an AGG PAM sequence (Figure 1) and found poorer indel formation with the AGG (n = 3, data not shown) PAM consensus sequence relative to the GGG PAM motif, which was an observation that mirrored the apparent partiality of some nucleotide sequences for Cas9:sgRNA localization and activation [39, 40]. Chadwick and colleagues tested over 30 different sgRNAs but found that only 12 were amenable to base editing at their assayed locus [41]. Although we attempted to address the potential incidence of misfolded sgRNA scaffolds by introducing key mutations aimed at reducing internal RNA interactions and interruptions to structurally significant motifs within the scaffold, we observed no further increase in Cas9 activity than what was already evident with either the original scaffold or the F-E modified variant [42]. Prior to evaluating the incidence of base editing, each sgRNA sequence was ranked algorithmically and their scores suggested appreciable activity at the suggested sites, which we initially confirmed by observing the frequency of indel formation at the loci [40, 43].
Future directions for this work would be to evaluate the changes in the serum levels of oxidative stress biomarkers in an in vivo system with a risk-variant encoded CFH protein profile against that of the corrected variant. Promising human-chimeric mouse models have shown that some functional photoreceptor rescue was possible when the risk variant allele was replaced with its non-risk variant form [44]. Other mouse models have further elucidated the in vivo mechanistic consequences of the Y402H mutation and its role in increasing the serum biomarkers for oxidative stress [44, 45]. Therefore, it would be reasonable to suggest that the use of the Target-AID base editor with its respective sgRNA could provide scope for measuring the in vivo recovery of photoreceptor activity.
In conclusion, we demonstrated the feasibility of using a cutting-free and potentially indel-free approach toward facilitating gene-based anticipatory therapy for a common high-risk variant that is found to be strongly associated with AMD. Appreciable base editing efficiencies were noted at the target loci in a highly precise manner with virtually no observed off-target events or indel formation. Here, the feasibility of base editing on one of the most significant risk factors for AMD progression has significant implications for the development of a potential, single-dose injectable therapy targeting dysregulated RPE cells and reducing the systemic contribution of cleavage-incompetent CFH proteins produced by the liver [49], if administered to the liver and the RPE using DNA-free approaches [50].
This work was supported by the Ophthalmic Research Institute of Australia and the Macular Disease Foundation of Australia. Financial support was also obtained from an Australian National Health and Medical Research Council (NHMRC) Centres of Research Excellence (CRE) #1023911 and Project Grant (APP1123329). JEC and AWH are supported by NHMRC Fellowships, while AP is supported by an ARC Future Fellowship. The Centre for Eye Research Australia (CERA) receives Operational Infrastructure Support from the Victorian Government. The contents of the published material are solely the responsibility of the Administering Institution, a Participating Institution or individual authors and do not reflect the views of the NHMRC. We gratefully thank Vikrant Singh for helping with the MATLAB scripts for base calling and indel formation.