Molecular Vision 2016; 22:1062-1076
Received 31 March 2016 | Accepted 27 August 2016 | Published 29 August 2016
Rebecca J. Sardell,1 Jessica N Cooke Bailey,2 Monique D. Courtenay,1 Patrice Whitehead,1 Reneé A. Laux,2 Larry D. Adams,1 Jorge A. Fortun,3 Milam A. Brantley Jr.,4 Jaclyn L. Kovach,3 Stephen G. Schwartz,3 Anita Agarwal,4 William K. Scott,1 Jonathan L. Haines,2 Margaret A. Pericak-Vance1
1John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL; 2Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH; 3Bascom Palmer Eye Institute, University of Miami Miller School of Medicine, Miami, FL; 4Department of Ophthalmology and Visual Sciences, Vanderbilt University School of Medicine, Nashville, TN
Correspondence to: Margaret A. Pericak-Vance, John P. Hussman Institute for Human Genomics University of Miami Miller School of Medicine, 1501 NW 10th Avenue, BRB 319, Miami, FL 33136; Phone: (305) 243- 2308: FAX: (305) 243-2396; MPericak@med.miami.edu
Purpose: Demographic, environmental, and genetic risk factors for age-related macular degeneration (AMD) have been identified; however, a substantial portion of the variance in AMD disease risk and heritability remains unexplained. To identify AMD risk variants and generate hypotheses for future studies, we performed whole exome sequencing for 75 individuals whose phenotype was not well predicted by their genotype at known risk loci. We hypothesized that these phenotypically extreme individuals were more likely to carry rare risk or protective variants with large effect sizes.
Methods: A genetic risk score was calculated in a case–control set of 864 individuals (467 AMD cases, 397 controls) based on 19 common (≥1% minor allele frequency, MAF) single nucleotide variants previously associated with the risk of advanced AMD in a large meta-analysis of advanced cases and controls. We then selected for sequencing 39 cases with bilateral choroidal neovascularization with the lowest genetic risk scores to detect risk variants and 36 unaffected controls with the highest genetic risk score to detect protective variants. After minimizing the influence of 19 common genetic risk loci on case-control status, we targeted single variants of large effect and the aggregate effect of weaker variants within genes and pathways. Single variant tests were conducted on all variants, while gene-based and pathway analyses were conducted on three subsets of data: 1) rare (≤1% MAF in the European population) stop, splice, or damaging missense variants, 2) all rare variants, and 3) all variants. All analyses controlled for the effects of age and sex.
Results: No variant, gene, or pathway outside regions known to be associated with risk for advanced AMD reached genome-wide significance. However, we identified several variants with substantial differences in allele frequency between cases and controls with strong additive effects on affection status after controlling for age and sex. Protective effects trending toward significance were detected at two loci identified in single-variant analyses: an intronic variant in FBLN7 (the gene encoding fibulin 7) and at three variants near pyridoxal (pyridoxine, vitamin B6) kinase (PDXK). Aggregate rare-variant analyses suggested evidence for association at ASRGL1, a gene previously linked to photoreceptor cell death, and at BSDC1. In known AMD loci we also identified 29 novel or rare damaging missense or stop/splice variants in our sample of cases and controls.
Conclusions: Identified variants and genes may highlight regions important in the pathogenesis of AMD and are key targets for replication.
The neurodegenerative eye disease age-related macular degeneration (AMD) is a leading cause of loss of vision in the United States  and the third largest cause of blindness worldwide . The disease is characterized by progressive damage to the RPE, which results in photoreceptor cell death and ultimately leads to the largely irreversible loss of central vision. In early stages, pigmentary changes occur, and extracellular deposits (drusen) in Bruch’s membrane accumulate, increasing in size and number during intermediate stages, until in advanced AMD there is RPE cell death (geographic atrophy, GA) and/or abnormal blood vessel growth under the retina (choroidal neovascularization, CNV). Although these disease types are the common categorizations of AMD, there is wide phenotypic variation across these disease types. Current treatment options for AMD are limited. CNV is typically treated with regular intraocular injections of antiangiogenic agents targeting the blood vessel growth factor vascular endothelial growth factor A (VEGFA); despite this treatment option, vision improvement is not typical . There is no effective treatment for GA.
Risk of advanced AMD strongly increases with age, with population-wide prevalence reaching more than 10% in individuals of European ancestry older than 80 years [4,5]. As the aging population continues to grow, so, too, does the number of individuals with AMD, thus presenting a large population of individuals with a disease for which treatment options are limited. In addition to advanced age, environmental risk factors for AMD include smoking history, sex, body mass index, diet, and European ancestry [6-8]. There is also a large genetic component to AMD disease risk; a twin study estimated the heritability of advanced AMD to be 71% .
Identified common genetic variants that alter the risk of developing advanced AMD consist of several loci of large effect, including complement factor H (CFH; OMIM 134370) [10-14], complement component 2/complement factor B (C2/CFB, OMIM 613927, 138470) [15-17], age-related maculopathy susceptibility 2/HtrA serine peptidase 1 (ARMS2/HTRA1; OMIM 611313, 602194) [18-21], complement component 3 (C3, OMIM 120700) [22-24], and apolipoprotein E (ApoE, OMIM 107741) [25-28]. Common variants of smaller effect [29-32] and rare variants of large effect [33,34] have also been identified. A genome-wide association study (GWAS) of >17,000 advanced AMD (GA and/or CNV) cases and >60,000 controls by the International AMD Genomics Consortium (IAMDGC) identified 19 common (minor allele frequency, MAF >0.01) single nucleotide risk variants (SNVs) . Together, these 19 SNVs define a genetic risk score that explains 73% of the area under a receiver-operating characteristic curve, therefore distinguishing well between advanced AMD cases and controls . More recently, a genome-wide analysis that focused on exonic variation from >16,000 advanced cases and >17,000 controls by the IAMDGC identified 16 additional variants that reached genome-wide significance, with a total of 52 common and rare variants in 34 genes that were independently associated with the risk of advanced AMD .
Despite known variants explaining a high proportion of the heritability of AMD disease risk (40–60%) compared with many other complex human diseases, a substantial portion remains unexplained [36,37]. Although the genetic architecture of complex human diseases remains largely uncertain, the remaining heritability in AMD disease risk may reflect a combination of rare variants, structural variation, non-additive genetic variance, and gene–environment interactions . We specifically focus herein on rare variation.
Recent developments in whole exome sequencing technology provide the opportunity to specifically test the effects of rare variants. Such variants are rare at the individual level but can occur within and/or affect the same gene and thus are actually a somewhat common occurrence; rare variants may explain a large portion of the remaining heritability in disease risk [38,39]. Exome sequencing may also enable the causal variants to be identified, instead of implicating common non-coding markers in strong linkage disequilibrium with causative variants as often happens in GWAS [38,40]. Despite these advantages, exome sequencing remains expensive, limiting sample size and thus power. The power of rare variant studies is further limited by the low frequency of rare SNVs and the potential for rare variants to exhibit allelic heterogeneity requiring analytical approaches that combine information from multiple variants into gene-based or pathway-based tests [40-43]. One approach that attempts to maximize the power to detect rare risk or protective variants of complex diseases given limited sample size is to select for analysis individuals with extreme liability scores, that is, individuals at the phenotypic extremes that are not well predicted by their genotype (at known risk loci) or known environmental risk factors [41,44-46].
In the discovery phase, we performed whole exome sequencing for 75 phenotypically extreme individuals: 36 individuals who had not developed macular changes associated with early AMD despite being well over the age of onset (60 years of age) and having a high genetic risk score based on 19 common risk SNVs and 39 individuals who developed CNV in both eyes despite a low genetic risk score. By minimizing the effect of 19 known AMD risk variants and selecting individuals at the phenotypic extremes, our approach targeted the detection of rare variants of large effect and the aggregate effect of weaker variants within genes or pathways to gain insight into the underlying genetics of AMD and generate hypotheses for future studies of AMD risk.
Subjects of European ancestry (1738 females and 1099 males of mean age=76±SD 8.8) were recruited from the Duke University Eye Center (DUEC, n=1081), the Vanderbilt Eye Institute (VEI, n=264) and the University of Miami Bascom Palmer Eye Institute (BPEI, n=1492). Written consent was obtained from all participants, all procedures conformed to the Declaration of Helsinki, and research was performed under protocols approved by the Institutional Review Board at each participating institution. Participants were examined by a board-certified retina specialist using slit-lamp biomicroscopy and dilated fundoscopy. Genetic analysis was conducted after physical examination; physicians were therefore blinded to genetic results. Fundus photos were graded using a modified Age-Related Disease Severity (AREDS) scale (the Clinical Age-Related Maculopathy Staging, CARMS, system) that ranges from Grade 1 for controls to Grade 5 for CNV cases [6,47].
DNA was extracted from whole blood using standardized protocols. Blood was drawn via venipuncture and DNA extraction was extracted from whole blood using the Pure Gene method and standard protocols  and frozen at –80 ºC until use. Genotyping was performed at Vanderbilt University’s Center for Human Genetics Research. The Sequenom MassARRAY genotyping platform (Sequenom, San Diego, CA) was used to genotype samples at 19 common (MAF>1%) SNVs significantly associated with the risk of advanced AMD in a GWAS meta-analysis (Appendix 1) .
Genotypes at the 19 risk variants were used to calculate a cumulative genetic risk score for each individual. For each of the 19 loci, the number of risk alleles (0, 1, or 2) was multiplied by the variant-specific weight calculated as the odds ratio from the GWAS meta-analysis divided by the sum of odds ratios across all 19 loci . The risk score for each of the 19 variants was then summed to give a cumulative risk score for each individual :
Genetic risk score = w1 × genotype1 + w2 × genotype2 + … w19 × genotype19.
Extreme cases: we sequenced 39 cases with a genetic risk score at least 1 standard deviation (SD) lower than the mean risk score of all bilateral CNV cases (n=467, mean score =1.31±SD 0.21; Figure 1A; Table 1); i.e. with a risk score ≤1.10. Extreme controls: we sequenced 36 controls with a genetic risk score at least 1 SD higher than the mean risk score of all bilateral grade 1 controls with no (or very few small) drusen that were older than 66 years of age (n=397, mean score=1.05±0.20; Figure 1B; Table 1); i.e. with a risk score ≥1.25.
Exome sequencing of all 75 phenotypically extreme individuals was performed at the Center for Genome Technology at the Hussman Institute for Human Genomics. Sequencing was performed using the Agilent SureSelect Human All Exon V4 (n = 56) or V5 (n = 19) 50 Mb capture that included untranslated regions (UTRs; Agilent, Santa Clara, CA). The two versions overlapped at 102,556,040 positions. Base calling used the Illumina CASAVA 1.6 pipeline, and calls were aligned to the human reference genome (hg19) using Burrows-Wheeler Aligner (BWA) software . SNVs and insertion-deletion variants (indels) were called by GATK UnifiedGenotyper with variant quality score recalibration (VQSR).
VCFtools  was used to filter variants based on quality. Variants with genotype quality (GQ) <30, depth (DP) <8, or Phred-scaled likelihood of reference genotypes (PL) <99 were excluded. VQSLOD recalibration was run using an additional 25 exome sequenced samples, and positions with scores < −3 were removed. PLINK v1.07  was used to remove any loci for which genotypes were missing in >10% of individuals (cases and controls) and any common variants (MAF >0.01 in the European population) that were not in Hardy–Weinberg equilibrium (HWE) in the controls (p<0.001; n = 231). After these quality control steps, 295,814 biallelic autosomal SNVs were available for analysis.
The sequencing data were checked for concordance with GWAS data from an Affymetrix 6.0 chip (Santa Clara, CA) where available (for details see ); all samples (n = 63) had >0.97 concordance across 15,850 SNVs that were available across both data sets. Relatedness was checked using PLINK by calculating pairwise identity by descent to confirm that all samples were unrelated at π ̂̂<0.1. Ethnicity was checked using EIGENSTRAT  by selecting a subset of SNVs with MAF>20% and pruning based on their genotypic correlation to give a final sample of 9,206 SNVs using PLINK. Principal component loadings were then compared with those generated from 1,000 HapMap samples of known ethnicity to confirm that all sequenced individuals clustered with European samples. Wilcoxon tests showed that none of the top ten principal components varied significantly with case–control status. Self-reported sex was confirmed using that estimated from heterozygous X-linked variants in PLINK.
Variants were annotated with their frequency in the European population using the NHLBI Exome Sequencing Project’s Exome Variant Server (ESP) , Exome Aggregation Consortium (ExAC), and 1000 Genomes databases using ANNOVAR . Variants that were not present in the 1000 Genomes (Oct2014 version) database, the 6,500 ESP SI-V2 individuals, ExAC version 0.3, or dbSNP v138 were considered novel. SeattleSeq v.138 was used to identify the gene(s) in which variants were located. Variants were annotated for region (exonic, intronic, UTR, splicing, etc.) and exonic function (synonymous, nonsynonymous, stopgain, stoploss) by reference to refSeq and annotated for predicted impact by reference to PolyPhen-2 (benign, possibly damaging, probably damaging) and SIFT (tolerated or damaging) using ANNOVAR.
A single variant test of association was run on all autosomal variants using Firth logistic regression controlling for age and sex and implemented in EPACTS . Firth regression is recommended for association tests with small sample sizes and rare variants. Because single variant tests have low power to detect rare variants, especially for small sample sizes, we also combined exonic variants for gene-based tests of association  and combined genes for pathway analyses. Gene and pathway analyses were run on three subsets of data: 1) rare (MAF ≤1%) functional SNVs that were predicted to be damaging by Polyphen2 and SIFT or specified as stop or splice variants by RefSeq, 2) all rare (MAF ≤1%) SNVs, and 3) all autosomal SNVs. For 1) and 2), variants were excluded if their MAF in the EVS, 1000 Genomes, or ExAC database for Europeans was >1%, thus including all novel variants and those with MAF ≤1% in the broader population.
Gene-based tests were conducted using sequence kernel association tests, specifying the optimal adjust option (SKAT-O) and small sample size adjustment . SKAT-O computes either a burden test, which assumes all alleles influence the association in the same direction, or a SKAT (variance-component) test that accounts for risk and protective alleles to maximize power. We controlled for the known risk factors of age and sex by including these variables as main effects in the SKAT null model. Finally, pathway analyses were run using WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) [59,60] that compares a list of gene sets that reached nominal significance (p<0.05) in the gene-based SKAT test with a complete list of those analyzed to identify enriched pathways. The Kyoto Encyclopedia of Genes and Genomes (KEGG) was used to identify pathways associated with the genes of interest. Bonferroni correction was used to define genome-wide significance levels for single variant and SKAT analyses, and the Benjamini & Hochberg correction (false discovery rate, p<0.05) was used to define significance for the pathway analyses. Quantile-quantile plots of p values from single variant and gene-based tests were generated to identify any inflation of type I error due to an undetected batch effect or population substructure.
After quality control, we identified 295,814 autosomal positions that were variant in at least one of the 75 sequenced individuals, of which 110,925 were rare (MAF≤0.01) or unreported in the wider European population. A total of 8,278 of these rare or novel positions were predicted to be damaging missense or stop or splice variant sites.
Considering all 24,531 novel autosomal SNVs (those not present in any population in the 1000 Genomes, ESP 6500, ExAC, or dbSNP database), 13,045 were found only in cases and 10,930 only in controls. Of the 841 novel variants that occurred in more than one individual, 157 variants occurred only in cases and 128 variants only in controls (Appendix 2), of which one was predicted to be a damaging nonsynonymous exonic variant: a variant at position 22:29885637 in the neurofilament heavy polypeptide (NEFH, OMIM 162230) gene that occurred in two cases (Appendix 2). The number of observed p values of less than 0.05 in the single variant and gene-based tests did not exceed that expected by chance, suggesting the absence of a population sub-structure related to case–control status (i.e., λ<1; Appendix 3).
Numerous variants differed substantially in frequency between the phenotypically extreme cases and controls suggesting either strong risk or protective effects on AMD, although none reached genome-wide significance outside regions known to be associated with the risk of advanced AMD (p<1.7 × 10−7; Table 2 and Table 3). The most strongly associated single variants outside known AMD loci were rs4849050 in fibulin-7 (FBLN7, OMIM 611551; p = 1.74 × 10−6) and three variants nearby pyridoxal (pyridoxine, vitamin B6) kinase (PDXK, OMIM 179020; p<4.65 × 10−5; Table 2); all were found more frequently in controls suggesting potentially protective effects on AMD risk. Ranking variants by effect size (β coefficient from logistic regression) also highlighted several variants that had strong risk or protective effects (Appendix 4), including rs141133909 in Mab-21 domain containing 1 (MB21D1, OMIM 613973; β = 3.93) and rs117615621 in MYB binding protein (P160) 1a (MYBBP1A, OMIM 604885; β = –3.78), which were predicted to be damaging. Also of note was rs113889337 in pleckstrin homology domain containing, family B (evectins) member 1 (PLEKHB1, OMIM 607651; β = 4.21), a gene that is expressed in photoreceptor cells (Appendix 4).
Considering only exonic variants that were predicted to be damaging, the most strongly associated variants were rs2294066 in myomesin 2 (MYOM2, OMIM 603509; p = 1.32 × 10−4), rs440160 in tenascin XB (TNXB, OMIM 600985; p = 1.36 × 10−3), and rs61739424 in collagen type XXIII alpha 1 (COL23A1, OMIM 610043; p = 2.28 × 10–3;Table 3) that were all more frequent in the extreme AMD cases than in the controls. Across the 35 loci associated with the risk of advanced AMD in two meta-analyses [35,36], we also identified 29 rare or novel damaging missense or stop or splice variants in the sample of phenotypically extreme cases and controls, of which three were not found in the dbSNP, 1000 Genomes, ExAC, or EVS6500 database (Appendix 5).
Although SKAT did not identify any gene that reached genome-wide significance (p≤2.5 × 10−6; Table 4) in any of the three subsets, we identified several genes that may be targets for follow-up analysis. In the analysis of all variants, the most significant gene regions (p<1 × 10−4) were asparaginase like 1 (ASRGL1, OMIM 609212; p = 7.52 × 10−5) and BSD domain containing 1 (BSDC1; p = 9.54 × 10−5). In the analyses of all rare variants and rare damaging missense and stop or splice variants, no gene regions had p values of less than 1 × 10−4 (Table 4).
In the analysis of rare stop or splice and damaging missense variants, the top pathways were the sphingolipid metabolism (empirical p, pe = 0.002; adjusted p, pa = 0.15), gap junction (pe = 0.033, pa = 0.70), caffeine metabolism (pe = 0.038, pa = 0.70), and biosynthesis of unsaturated fatty acids pathways (pe = 0.038, pa = 0.70). In the analysis of all rare variants, the aminoacyl-tRNA biosynthesis (pe = 0.01, pa = 0.84), glycerophospholipid metabolism (pe = 0.016, pa = 0.84), ether lipid metabolism (pe = 0.024, pa = 0.84), and alpha-linolenic acid metabolism (pe = 0.026, pa = 0.84) pathways were the most significantly enriched (Appendix 6). In the analysis of all variants, the most significantly enriched pathways were the steroid hormone (pe = 0.007, pa = 0.44), proteasome (pe = 0.008, pa = 0.44), ether lipid metabolism (pe = 0.008, pa = 0.44), and linoleic metabolism (pe = 0.012, pa = 0.44) pathways. None of these pathways were significantly enriched in any of the three subsets of data after correction for multiple tests; however, some genes overlapped between these pathways (Appendix 6), including genes in the phospholipase A2 and cytochrome P450 families.
We hypothesized that phenotypically extreme individuals carry rare risk variants of large effect or rare variants that when clustered together in genes or pathways account for a large proportion of the variance in AMD disease risk. Generally, although sequencing studies of extreme phenotypes with small to moderate sample sizes produce high false negative rates as there is power to detect rare variation but not to detect association , these studies are a valuable discovery tool. In this data set of 75 cases and controls with extreme liability scores, we identified numerous novel variants that had not previously been reported in four large SNV databases. One of these novel variants, in the neurofilament gene NEFH on chromosome 22, occurred in two cases and was predicted to be nonsynonymous and damaging. A known variant in NEFH (rs59371099) was also one of the most strongly associated damaging exonic SNVs, occurring in nine cases but in only two controls (Table 3). Neurofilament genes maintain neuronal caliber, have been implicated in neurodegenerative diseases such as amyotrophic lateral sclerosis (ALS), and act as a biomarker for neuronal damage . The expression of NEFH is also highly enriched in retinal ganglion  and Müller cells . Therefore, variants in NEFH could be relevant to AMD. Other notable novel variants included a novel intronic variant found in four cases but not in the controls that is upstream of MMP16 (OMIM 602262), an endopeptidase in the same gene family as MMP9 (OMIM 120361), which was previously linked to AMD and is associated with the breakdown of the extracellular matrix . We also identified three novel nonsynonymous damaging exonic variants in known AMD risk genes (Appendix 5): one variant in complement factor I (CFI, OMIM 217030) that was found in one case, one variant in ATP-binding cassette transporter (ABCA1, OMIM 600046), and one in acyl-CoA dehydrogenase (ACAD10, OMIM 611181) that were each found in one control. The impairment of CFI is associated with AMD risk via inflammation through the protein’s role in regulating the complement pathways [34,64], while ABCA1 is associated with removal of cellular lipids and ACAD10 with fatty-acid oxidation.
Although we did not identify a single variant of large enough effect to attain genome-wide significance outside the known risk loci, we identified many variants with substantial differences in frequency between cases and controls and strong additive effects on affection status after controlling for the risk factors age and sex. Most of these variants occurred in regions not previously associated with AMD and may highlight regions that have risk or protective effects that could be prioritized for follow-up.
To assess whether any of the most strongly associated variants were associated with AMD case or control status in a larger sample, we looked up the top variants in the study (Table 2, Table 3, and Appendix 4) in the IAMDGC’s exome chip results from >30,000 advanced AMD cases and controls . One of our top variants reached genome-wide significance in the exome chip data set: rs440160 in TNXB (p = 4.28 × 10−29; Table 3), a member of the tenascin extracellular matrix glycoprotein family that has antiadhesive effects. TNXB also contributed to two pathways that were significantly enriched : the extracellular matrix and focal adhesion pathways. An association between variants in TNXB and AMD was reported previously, where the association remained statistically significant after conditioning on the adjacent C2/CFB locus . However, strong linkage disequilibrium in the C2/CFB/SKIV2L region has been reported [66,67], and rs440160 was in strong linkage disequilibrium with the most significant variant in this region in our dataset (rs635348, p=0.001; R2=0.9, D′=1). The role of C2/CFB, activators of an alternative complement pathway in AMD, has been extensively reviewed due to their role in the complement system , and SKIV2L has been hypothesized to be involved in AMD through RPE autophagy [66,67], which has been linked to AMD via the formation of drusen and the presence of autophagic markers in drusen .
Only one other of the our top variants was statistically significant at a p value of less than 0.05 in the exome chip data set: rs10951942, a nonsynonymous damaging exonic variant in chromosome 7 open reading frame 57 (C7orf57), a gene that has not previously been associated with AMD. However, some of the our top variants were not available in the exome chip analysis, including several rare damaging variants that we identified within known AMD risk loci from Fritsche et al. (2013)  or Fritsche et al. (2016)  (Appendix 5), further validating the use of sequencing technology in the search for novel variants of AMD.
We also identified several variants with strong additive effects that could play a role in regulating AMD risk in phenotypically extreme individuals who do not have a high genetic risk burden at common risk loci or in controls who have a high risk score. The most statistically significant variant in the data set was the intronic variant rs4849050 in FBLN7, an extracellular matrix cell adhesion molecule that is associated with elastic tissue, which was found more frequently in the controls (16 controls versus two cases). The minor allele (C) of this variant also tended toward a protective effect in the exome chip data set (p = 0.13). The extracellular matrix pathways were also significantly enriched , and other genes in the fibulin family (FBLN3, OMIM 601548; FBLN5, OMIM 604580; and FBLN6, OMIM 608548) have been associated with AMD [70-72] and similar retinal diseases  in other studies. Furthermore, FBLN7 is expressed in the retina and recently was found to negatively regulate inflammation, inhibiting angiogenesis . As blood vessel growth is a hallmark of CNV, FBLN7 could play a role in preventing AMD and thus is a candidate key target for future work to determine the role of this gene family in AMD and other ocular disorders.
Additional single variants of potential interest included three variants near PDXK, a gene that aims the conversion of vitamin B6 to pyridoxal-5-phosphate for metabolism, that were found more frequently in our controls. Vitamin B6 intake (together with vitamin B12 and folic acid) has been linked to reduced AMD risk ; genetic variants in PDXK that alter vitamin B6 metabolism could therefore also potentially alter AMD risk. A damaging exonic variant in COL23A1, a member of the transmembrane collagen family, also had a strong additive effect on case and control status (p = 0.002; β = 3.7) and may be of interest because two collagen pathways and variants in two collagen genes (COL4A3, OMIM 120070 and COL8A1, OMIM 120251) were significantly associated with advanced AMD in the exome chip data set . Collagen is integral to membranes in the eye, such as Bruch’s membrane, in which degeneration or modification (for example, the deposition of drusen) is known to lead to macular disease. Finally, rs113889337 in PLEKHB1, a variant found more frequently in the extreme cases (Appendix 4), may also be relevant to AMD because this gene codes for an integral membrane protein highly expressed in photoreceptor cells and is a candidate gene for retinal degeneration .
In addition to targeting single variants of large effect, we assessed the aggregate effect of rare variants clustered within genes or pathways. Although no gene or pathway was significantly enriched after correction for multiple tests, two potential candidate genes were significant at p<1×10−4 before correction for multiple tests: ASRGL1 and BSDC1. A mutation in ASRGL, an enzyme that hydrolyzes L-asparagine and isoaspartyl peptidase, was recently shown to be involved in photoreceptor loss and retinal abnormality in zebrafish , while BSCDI is a region not previously associated with AMD disease risk. Broader analysis of these genes is certainly warranted in larger data sets and in AMD-relevant tissues.
The most enriched pathways in the analyses included several lipid synthesis and metabolism pathways, including the ether lipid, glycerophospholipid, alpha-linolenic acid, unsaturated fatty acid, and sphingolipid pathways. The complex interplay between lipid regulation and eye health is known (e.g., reviewed in ). Dyslipidemia is an important risk factor for eye diseases, and systemic cardiovascular disease is associated with vascular alterations in the retina. Systemic conditions resulting from cardiovascular risk factors impact AMD (e.g., chronic hypoxia upregulates VEGF, the main target of wet AMD treatment strategies). Hypertension and other risk factors for arteriosclerosis are also related to AMD. Glycerophospholipids are important in cell membrane function, while sphingolipid metabolism is important in retinal function . Intake of omega-3 fatty acids has also been linked to reduced risk of AMD, while lipid and cholesterol levels have been linked to increased risk in some [80-83] but not all  studies. Lipid genes and pathways were also among those found to be statistically significantly associated with AMD by  and other studies [29,82] and may play a significant role in AMD risk and prevention . Our most significantly enriched pathways also overlapped for several genes, including the phospholipase A2 and cytochrome P450 gene families, both of which have been linked to macular degeneration [85,86].
The present study, although thoughtful, has limitations. This experiment was performed on peripheral blood-derived cells, not ocular tissue. Evaluating expression differences in ocular and other AMD-relevant tissues would be ideal, if available. Future studies would benefit from analyzing differences between tissues from individuals with genetic risk profiles similar to those described in this report. Additionally, we considered individuals with extreme phenotypes relative to their genetic risk score values and did not include additional non-genetic data in the risk score calculations. Further, age is a significant risk factor for AMD, and although we controlled for age as a main effect in all analyses, it is possible that the individuals selected as high-genetic-risk controls could develop AMD later in life. However, we took every precaution possible to ensure the use of AMD-free controls by including only individuals with no indicators of AMD, including drusen, or those with only a few small drusen, which is a hallmark of normal aging. Finally, variants considered to be novel were those not available in any public variation database; this does not necessarily indicate that they are truly novel but could reflect lack of coverage by prior genotyping and sequencing efforts in these regions.
By sequencing extreme affected and unaffected individuals whose phenotype was not well predicted by their genotype at known risk loci, we identified several novel variants, including one novel damaging nonsynonymous variant that occurred in two cases and three novel nonsynonymous damaging variants that occurred in known AMD risk loci. Numerous reported variants in regions not previously associated with AMD showed strong additive effects on case and control status, and several lipid pathways that may contain candidate genes were identified. Together, the results, although modest, highlight regions that may be important in AMD pathogenesis that could be prioritized for follow-up in single variant, gene–gene, and gene–environment interaction studies with larger samples and in functional studies evaluating expression in AMD-relevant tissues and how they might alter AMD status.
Appendix 1. Details of the 19 risk score variants.
Appendix 2 Table of novel variants.
Appendix 3. Quantile-quantile plots of p-values.
Appendix 4. Most strongly associated variants by beta coefficient.
Appendix 5. Rare or novel damaging variants in AMD loci.
Appendix 6. Pathway analyses results.
We thank all the participants of this study. We thank Drs. Michael Hauser and Eric Postel for their assistance ascertaining participants. JAF has been a consultant for Allergan and Dorc, and received financial support from Alcon and Genentech. This work was presented at ARVO (2015) and ASHG (2014 and 2015) annual meetings. Grants: R01EY012118 (to MAP-V, WKS, JLH, AA), NIH Center Core Grant P30EY014801 (University of Miami) and by an unrestricted grant from Research to Prevent Blindness, New York, NY (University of Miami). RJS was supported by NEI T32 training grant (5T32EY023194). JNCB was supported by a PhRMA Informatics Fellowship..