Molecular Vision 2013; 19:2173-2186
Received 05 March 2013 | Accepted 30 October 2013 | Published 02 November 2013
1Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD; 2Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD; 3Department of Ophthalmology, University of Pennsylvania, Philadelphia, PA
Correspondence to: Joan E. Bailey-Wilson, Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD; Phone: (443) 740-2921, FAX: (443) 740-2165; email: email@example.com
Purpose: Refractive error is a complex trait with multiple genetic and environmental risk factors, and is the most common cause of preventable blindness worldwide. The common nature of the trait suggests the presence of many genetic factors that individually may have modest effects. To achieve an adequate sample size to detect these common variants, large, international collaborations have formed. These consortia typically use meta-analysis to combine multiple studies from many different populations. This approach is robust to differences between populations; however, it does not account for differences in haplotype structure across genetic backgrounds, which mean that different alleles may be in linkage disequilibrium with the causative variant in different populations. We used the Age-Related Eye Disease Study (AREDS) cohort to attempt to replicate published significant associations at two loci on chromosome 15 from two genome-wide association studies (GWASs). The single nucleotide polymorphisms (SNPs) that exhibited association on chromosome 15 in the original studies did not show evidence of association with refractive error in the AREDS cohort. This paper seeks to determine whether this non-replication in the AREDS sample may be due to the limited number of SNPs chosen for replication.
Methods: We selected all SNPs in the region surrounding the SNPs from the Consortium for Refractive Error and Myopia study that were genotyped on the Illumina Omni2.5v1_B array, genotyped with custom TaqMan assays, or imputed from the GWAS data. We analyzed the SNPs for association with refractive error using standard regression methods in PLINK. The effective number of tests was calculated using the Genetic Type I Error Calculator.
Results: Although use of the same SNPs used in the Consortium for Refractive Error and Myopia study did not show any evidence of association with refractive error in this AREDS sample, other SNPs within the candidate regions demonstrated an association with refractive error. Significant evidence of association was found using the hyperopia categorical trait, with the most significant SNPs rs1357179 on 15q14 (p=1.69×10−3) and rs7164400 on 15q25 (p=8.39×10−4), which passed the replication thresholds.
Conclusions: This study adds to the growing body of evidence that the most significant SNPs found in one population may not be significant in another population because of differences in linkage disequilibrium structure and/or allele frequency. This suggests that replication studies should include less significant SNPs in an associated region rather than only a few SNPs selected by a significance threshold.
Refractive error (RE) is the leading cause of preventable blindness, with large societal, economic, and public health implications. Around 25% of U.S. adults are myopic [1,2], and in some parts of Southeast Asia, the prevalence is now in excess of 70% among teens [3,4] and young adults. In addition to the personal impact of the costs of eyeglasses, contact lenses, or refractive surgery, high-grade myopia increases the risk of other ocular problems such as retinal degeneration, cataracts, glaucoma, and choroidal neovascularization.
In international efforts to characterize the risk factors responsible for refractive errors and for the recent increase in their prevalence in many countries and populations, environmental risk factors are receiving needed attention in addition to genetic influences. Twin studies and family aggregation studies estimate the heritability of refractive errors to be on the order of 50%–90% [7-10]. Two recent genome-wide association studies (GWASs) identified strong association with refractive error at two locations on chromosome 15. Solouki et al. reported an association on 15q14 that was subsequently replicated in several other populations [12,13]. Hysi et al. reported a second locus on 15q25 at the same time. The Consortium for Refractive Error and Myopia (CREAM) recently performed a large meta-analysis of both loci in 31 population cohorts and replicated only the 15q14 locus. However, the single nucleotide polymorphisms (SNPs) in this region did not replicate robustly in every cohort, and the Age-Related Eye Disease Study (AREDS) did not contribute significantly to the association signal at either chromosome 15 locus. We hypothesized that the choice of 14 SNPs on 15q14 and five SNPs on 15q25 for replication (henceforth referred to as “CREAM replication SNPs”) was too narrow. Narrowly selecting SNPs to replicate an association signal assumes that every population with a true signal in the region has the same SNPs associated with the trait. Given the heterogeneous nature of refractive error and the different patterns of linkage disequilibrium across populations, this method may not reflect the association strength in each population. Although Verhoeven and colleagues noted that the tested SNPs had similar allele frequencies across the populations in the study, the chosen SNPs may not adequately capture the underlying haplotypes in every population.
Various authors have suggested that regional replication of GWAS signals is more appropriate than simply selecting the most significant SNP from each region. Recently, in a replication study of fasting plasma glucose in African Americans, Ramos et al. showed that local replication of candidate regions (performed by querying a 500 kb window centered on each of the 29 SNPs previously associated with the trait) detected new significantly associated SNPs. This result confirmed Yang et al.’s finding that the variation attributable to a specific locus will be underestimated if only the most significant SNP in the region is selected. Asimit et al. reported that current strategies for choosing SNPs for replication detect only a small proportion of causal variants and are insufficiently powered. In our study, we show that in the AREDS data, additional nearby SNPs have a stronger association than the originally chosen CREAM replication SNPs [16,17,19-21].
The AREDS cohort was initially designed as a long-term, multicenter, prospective study to assess the clinical course of age-related macular degeneration (AMD) and age-related cataract. In addition to collecting natural history data, the AREDS included a randomized clinical trial of high-dose vitamin and mineral supplements for AMD and a clinical trial of high-dose vitamin supplements for cataract [22-24]. Before the study was initiated, the protocol was approved by an independent data and safety monitoring committee and by the institutional review board for each clinical center. Written informed consent was obtained from all participants in accordance with the Declaration of Helsinki. AREDS participants were 55 to 80 years of age at enrollment and had to be free of any illness or condition that would make long-term follow-up or compliance with study medications unlikely or difficult. Visual acuity of all participants was measured with the electronic visual acuity tester (EVA) using the electronic Early Treatment Diabetic Retinopathy Study (E-ETDRS) visual acuity testing protocol. A refraction measurement was performed at the initial visit for participants with visual acuity of fewer than 74 letters in each eye, and at the randomization visit for all participants. For the current analysis, a subset of the control group from the original AREDS cohort was included: 2,000 Caucasian participants aged 60 and older who did not have AMD and who were further screened to exclude individuals with cataracts, retinitis pigmentosa, color blindness, other congenital eye problems, LASIK, artificial lenses, and other eye surgery. Refractive error at baseline enrollment in the AREDS [22-25] was analyzed. The trait of interest was the mean spherical equivalent (MSE) across both eyes, or the spherical equivalent of a single eye when only one eye was measured.
Age, gender, and the first three principal components (to adjust for significant population stratification) were included as covariates. Appendix 1 shows the clinical characteristics, the number of cases with high, moderate, and mild myopia, and the number of cases with high or moderate hyperopia. Because these subcategories contained few individuals, they were not analyzed separately, as power would have been inadequate.
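The trait definition above can be sketched in a few lines of Python. The text does not spell out how the spherical equivalent was computed, so this sketch assumes the standard clinical definition (sphere plus half the cylinder power, in diopters). The function names are hypothetical and this is not the AREDS processing code.

```python
from statistics import mean
from typing import Optional, Tuple

# A refraction for one eye: (sphere, cylinder), both in diopters.
Refraction = Tuple[float, float]


def spherical_equivalent(sphere: float, cylinder: float) -> float:
    """Standard spherical equivalent of one eye (assumed): sphere + cylinder / 2."""
    return sphere + cylinder / 2.0


def mean_spherical_equivalent(right: Optional[Refraction],
                              left: Optional[Refraction]) -> Optional[float]:
    """MSE across both eyes, or the SE of the single measured eye.

    Pass None for an eye that was not measured. Returns None when
    neither eye was measured (no trait value for that individual).
    """
    measured = [eye for eye in (right, left) if eye is not None]
    if not measured:
        return None
    return mean([spherical_equivalent(s, c) for s, c in measured])


# Both eyes measured: SEs of -2.5 D and -1.0 D average to -1.75 D.
print(mean_spherical_equivalent((-2.0, -1.0), (-1.0, 0.0)))
# Only the left eye measured: its SE (+0.75 D) is used directly.
print(mean_spherical_equivalent(None, (1.0, -0.5)))
```

The spherical equivalent does not change when a prescription is transposed between plus- and minus-cylinder notation. The helper therefore does not need to know which convention was recorded.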
DNA was obtained from the Coriell Institute (Camden, NJ) and was genotyped using a genome-wide Illumina SNP array (2.5 million SNPs) and custom TaqMan SNP assays.
All participants were genotyped at the Center for Inherited Disease Research (CIDR) using the Illumina HumanOmni2.5–4v1_B chip array (San Diego, CA). Genotype and phenotype data from the AREDS cohort are publicly available through the database of Genotypes and Phenotypes (dbGaP) under the name of either the “Michigan, Mayo, AREDS, Pennsylvania (MMAP) study” or the AREDS. Initially, all SNPs within 100 kb on either side of the discovery SNP were chosen for analysis. In the 15q25 region, this window was extended to 350 kb to provide better coverage.
Tagging SNPs from the 15q14 and 15q25 regions were selected from HapMap release 28 (Phase II+III; NCBI build 36 assembly, dbSNP b126). The HapMap genotype data were imported into Haploview 4.2 to obtain the r-square (r2) linkage disequilibrium (LD) plot. Selection of tagging SNPs was limited to those with a minor allele frequency of at least 5% in the CEU HapMap sample. The tagging SNPs on 15q14 and 15q25 were centered on the golgin A8 family, member B (GOLGA8B), gap junction protein, delta 2 (GJD2), actin, alpha, cardiac muscle 1 (ACTC1), and Ras protein-specific guanine nucleotide-releasing factor 1 (RASGRF1) genes. They were designed to provide better coverage of these regions than was available from the SNPs on the Illumina HumanOmni2.5–4v1_B chip array alone.
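The tag selection described above was done in Haploview. The text does not state the tagging algorithm or the r² cutoff. The following is a simple greedy pairwise tagger in Python, shown only to illustrate the idea of tagging. It is not Haploview's Tagger. The r² cutoff of 0.8 is a hypothetical default, and r² is approximated by the squared correlation of 0/1/2 genotype counts rather than by haplotype-based LD. Only the 5% minor allele frequency filter comes from the text.

```python
from typing import Dict, List, Sequence


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared correlation of 0/1/2 allele counts: a genotype-based proxy for LD r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF from 0/1/2 allele counts."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def greedy_tags(genotypes: Dict[str, List[int]], r2_min: float = 0.8,
                maf_min: float = 0.05) -> List[str]:
    """Greedily pick tags until every common SNP has r^2 >= r2_min with some tag."""
    common = [s for s, g in genotypes.items() if minor_allele_frequency(g) >= maf_min]
    untagged = set(common)
    tags: List[str] = []
    while untagged:
        # Choose the candidate that covers the most still-untagged SNPs.
        best, best_cover = None, set()
        for cand in common:
            cover = {s for s in untagged
                     if genotype_r2(genotypes[cand], genotypes[s]) >= r2_min}
            if len(cover) > len(best_cover):
                best, best_cover = cand, cover
        if best is None:  # unreachable for r2_min <= 1 (each SNP tags itself)
            break
        tags.append(best)
        untagged -= best_cover
    return tags


genos = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # perfect proxy for snp1
    "snp3": [0, 0, 1, 2, 2, 1],  # uncorrelated with snp1
    "snp4": [0, 0, 0, 0, 0, 0],  # monomorphic, removed by the MAF filter
}
print(greedy_tags(genos))  # ['snp1', 'snp3']
```

Greedy set cover is a common heuristic for choosing tags. It usually yields a small, though not necessarily minimal, tag set.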
All 23 SNPs were genotyped using a customized TaqMan SNP genotyping assay (Applied Biosystems [ABI], Foster City, CA). All PCR amplifications were performed with the following thermal cycling conditions: 95 °C for 10 min followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min. PCR reactions were performed with TaqMan Genotyping Master Mix (ABI) in either a GeneAmp PCR System 9700 (ABI) or a 7900HT Fast Real-Time PCR System (ABI). All pre- and post-PCR plate readings were performed on a 7900HT Fast Real-Time PCR System (ABI), and the allele types were confirmed by the system’s software (7900HT Fast Real-Time PCR System SDS software version 2.3; ABI).
For the Illumina Omni2.5 array, all genotypes were imported into PLINK. Individuals were removed if they had a low genotyping rate (missing genotype rate ≥1%), known cryptic relatedness, chromosomal abnormalities (described elsewhere), or non-Caucasian ancestry according to principal components analysis. The final genotyping rate in the remaining 1,877 individuals was 0.999.
For the custom TaqMan SNPs, all genotypes were imported into PLINK. Individuals were removed for a low genotyping rate (≤20%) and according to the same filtering criteria described for the Illumina SNP panel. The final genotyping rate in the remaining 1,879 individuals was 0.997.
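The per-individual call-rate filter described above can be sketched in Python. This sketch assumes that "low genotyping rate (≥1%)" means a missing-call rate of 1% or more. It covers only the call-rate step, not the relatedness, chromosomal-abnormality, or ancestry filters. Within PLINK itself, the comparable per-sample filter is the `--mind` option, although its boundary convention may differ from the one used here. The sample IDs are hypothetical.

```python
from typing import Dict, List, Optional

# Genotype calls for one individual: 0/1/2 allele counts, None = missing call.
Calls = List[Optional[int]]


def missing_rate(calls: Calls) -> float:
    """Fraction of SNPs with no genotype call for one individual."""
    return sum(c is None for c in calls) / len(calls)


def drop_low_call_rate(samples: Dict[str, Calls], max_missing: float) -> Dict[str, Calls]:
    """Keep individuals whose missing rate is strictly below max_missing."""
    return {sid: calls for sid, calls in samples.items()
            if missing_rate(calls) < max_missing}


def overall_genotyping_rate(samples: Dict[str, Calls]) -> float:
    """Fraction of all individual-by-SNP genotypes that were called."""
    total = sum(len(calls) for calls in samples.values())
    called = sum(c is not None for calls in samples.values() for c in calls)
    return called / total


samples = {
    "id1": [0, 1, 2, 1] * 25,               # 100 calls, none missing
    "id2": [0, 1, None, None] + [1] * 96,   # 2% missing
    "id3": [2, 2, 1, None] + [0] * 96,      # exactly 1% missing
}
kept = drop_low_call_rate(samples, max_missing=0.01)
print(sorted(kept))                   # ['id1']
print(overall_genotyping_rate(kept))  # 1.0
```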
First, RE was analyzed as the MSE of both eyes using linear regression in PLINK, with age, gender, education, and three principal components as covariates. Myopia and hyperopia were analyzed as categorical traits using logistic regression in PLINK with the same covariates. For myopia, cases were defined as individuals with an MSE of −1D or worse, and controls as individuals with an MSE of 0D or greater. Individuals with an MSE between −1D and 0D were coded as unknown. For hyperopia, cases were defined as individuals with an MSE of +1D or greater, and controls as individuals with an MSE of 0D or less. Individuals with an MSE between 0D and +1D were coded as unknown.
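The case/control definitions above translate directly into code. In this Python sketch, the thresholds come from the text and the function names are hypothetical. The mapping to 2 (case), 1 (control), and −9 (missing) follows PLINK 1.x's default binary phenotype coding.

```python
from typing import Optional

PLINK_CASE, PLINK_CONTROL, PLINK_MISSING = 2, 1, -9  # PLINK 1.x default coding


def myopia_status(mse: float) -> Optional[bool]:
    """True = case (MSE <= -1 D), False = control (MSE >= 0 D), None = unknown."""
    if mse <= -1.0:
        return True
    if mse >= 0.0:
        return False
    return None  # -1 D < MSE < 0 D


def hyperopia_status(mse: float) -> Optional[bool]:
    """True = case (MSE >= +1 D), False = control (MSE <= 0 D), None = unknown."""
    if mse >= 1.0:
        return True
    if mse <= 0.0:
        return False
    return None  # 0 D < MSE < +1 D


def to_plink(status: Optional[bool]) -> int:
    """Map a case/control/unknown status to PLINK's default phenotype codes."""
    if status is None:
        return PLINK_MISSING
    return PLINK_CASE if status else PLINK_CONTROL


for mse in (-2.25, -0.5, 0.0, 0.75, 1.5):
    print(mse, to_plink(myopia_status(mse)), to_plink(hyperopia_status(mse)))
```

Under these definitions, an MSE of exactly 0D counts as a control for both traits, and every myope is a hyperopia control and vice versa.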
For the 15q14 and 15q25 regions, we tested the reported discovery SNPs, the replication SNPs from the Verhoeven et al. replication study, our custom SNPs, and all additional SNPs available from our Illumina chip that were within 100 kb of the original discovery SNP for 15q14 and within 350 kb for 15q25. We used Ramos et al.’s method to calculate the effective number of independent tests in each region, from which the region-specific significance threshold for replication was derived.
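Assembling the tested SNP set for a region amounts to a positional window query plus a de-duplicated union of SNP lists. The following Python sketch shows those two steps. The positions and SNP IDs are hypothetical placeholders, not real 15q14 or 15q25 coordinates. Only the 100 kb and 350 kb half-widths come from the text.

```python
from typing import Dict, Iterable, List


def snps_in_window(positions: Dict[str, int], center_bp: int, half_width_bp: int) -> List[str]:
    """IDs of SNPs within +/- half_width_bp of center_bp, ordered by position."""
    hits = sorted((pos, snp) for snp, pos in positions.items()
                  if abs(pos - center_bp) <= half_width_bp)
    return [snp for _, snp in hits]


def tested_set(*snp_lists: Iterable[str]) -> List[str]:
    """Union of several SNP lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys(snp for lst in snp_lists for snp in lst))


# Hypothetical positions (bp) on one chromosome; not real coordinates.
positions = {"snpA": 1_000_000, "snpB": 1_080_000, "snpC": 1_150_000, "snpD": 1_320_000}
discovery = "snpA"
array_100kb = snps_in_window(positions, positions[discovery], 100_000)
array_350kb = snps_in_window(positions, positions[discovery], 350_000)
print(array_100kb)  # ['snpA', 'snpB']
print(array_350kb)  # ['snpA', 'snpB', 'snpC', 'snpD']
# Discovery SNP, replication SNPs, custom SNPs, then array SNPs (hypothetical lists).
print(tested_set([discovery], ["snpC"], array_100kb))  # ['snpA', 'snpC', 'snpB']
```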
For each locus, we had good power in this sample to detect association between the quantitative trait (MSE) and a variant at the p=0.002 level (which corrects for the number of independent tests, as detailed in the Methods and trait results) across a range of minor allele frequencies. Power was between 60% and 78% for allele frequencies between 0.3 and 0.5 on 15q14, and above 80% for allele frequencies of 0.2 to 0.5 on 15q25. For the qualitative trait myopia (≤ −1D), power was above 80% for allele frequencies of 0.05 to 0.5 on 15q14, and between 60% and 77% for allele frequencies between 0.25 and 0.5 on 15q25. For hyperopia (≥ +1D), power was above 70% for allele frequencies above 0.1 on 15q14, but below 30% for allele frequencies above 0.1 on 15q25.

Power calculations assumed:
1) an additive quantitative trait locus (QTL) effect size (β) and discrete trait odds ratio (OR) based on the values reported in each paper: β=–0.27 and OR=1.41 for the 15q14 locus, and β=–0.35 and OR=1.16 for the 15q25 locus;
2) minor allele frequencies ranging from 0.05 to 0.5; and
3) complete LD between the marker SNP and the causal variant (D’=1).

Full details of the power calculations are available in Appendix 1.
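The following Python sketch gives an analytic approximation for the quantitative-trait power calculation described above. It assumes Hardy–Weinberg equilibrium, a marker in complete LD with the causal variant, a noncentrality parameter of n·q²/(1−q²), where q² is the variance explained, and a normal approximation to the test statistic. This is a generic textbook approach, not necessarily the method used for Appendix 1, and it does not cover the OR-based myopia and hyperopia calculations. The sample size (1,877) and the SD of MSE (2.15 D) are taken from the Results, and β from the assumptions listed above.

```python
import math
from statistics import NormalDist


def additive_qtl_power(n: int, maf: float, beta: float, sd: float, alpha: float) -> float:
    """Approximate two-sided power of a 1-df additive test for a quantitative trait."""
    # Proportion of trait variance explained by the additive effect (HWE assumed).
    q2 = 2.0 * maf * (1.0 - maf) * beta ** 2 / sd ** 2
    # Noncentrality parameter of the 1-df association test.
    ncp = n * q2 / (1.0 - q2)
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(ncp)
    # Probability that a statistic centered at sqrt(ncp) falls in either rejection tail.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# 15q14 inputs: n = 1,877, SD of MSE = 2.15 D, beta = -0.27, alpha = 0.002.
for maf in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(maf, round(additive_qtl_power(1877, maf, -0.27, 2.15, 0.002), 2))
```

Because 2p(1−p) increases up to p=0.5, power rises with minor allele frequency over the 0.05 to 0.5 range considered in the text.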
A total of 1,337 and 1,224 SNPs were tested in the 15q14 and 15q25 regions, respectively. These comprised the most significant discovery SNP, the other SNPs used in the CREAM replication study, our custom SNPs, and the additional SNPs available from our Illumina 2.5M chip within 100 kb (15q14) or 350 kb (15q25) of the most significant discovery SNP in the region. The significance thresholds calculated using Ramos et al.’s method were 0.002 for the 15q14 region and 0.0019 for the 15q25 region.
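The effective number of tests can be estimated in several ways. A widely used estimator, which we assume here, is that of Li and Ji (2005). It sums f(|λ|) = I(|λ| ≥ 1) + (|λ| − ⌊|λ|⌋) over the eigenvalues λ of the SNP correlation matrix. Whether the Genetic Type I Error Calculator or Ramos et al. use exactly this formula, and whether the threshold is α/M_eff, are assumptions of this Python sketch rather than facts from the text. To stay dependency-free, the sketch computes eigenvalues with the cyclic Jacobi method. In practice, `numpy.linalg.eigvalsh` would do the same job.

```python
import math
from typing import List

Matrix = List[List[float]]


def symmetric_eigenvalues(a: Matrix, tol: float = 1e-20, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix by the cyclic Jacobi rotation method."""
    n = len(a)
    m = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: M <- M J
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # update rows p and q: M <- J^T M
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return [m[i][i] for i in range(n)]


def li_ji_meff(corr: Matrix) -> float:
    """Li and Ji (2005) effective number of tests from a SNP correlation matrix."""
    meff = 0.0
    for lam in symmetric_eigenvalues(corr):
        x = round(abs(lam), 10)  # suppress round-off just below an integer
        meff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return meff


def region_threshold(corr: Matrix, alpha: float = 0.05) -> float:
    """Per-SNP significance threshold alpha / M_eff (Bonferroni-style, assumed)."""
    return alpha / li_ji_meff(corr)


# Three hypothetical SNPs with pairwise genotype correlation 0.5:
# eigenvalues 2, 0.5, 0.5 give M_eff = 1 + 0.5 + 0.5 = 2.
corr = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(li_ji_meff(corr), region_threshold(corr))
```

Correlated SNPs contribute less than one test each. The resulting threshold is therefore less stringent than a Bonferroni correction over all 1,337 or 1,224 SNPs would be.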
There were 1,877 individuals available for analysis, with a mean MSE of +0.56D (standard deviation=2.15). For myopia, 346 individuals met the criteria for cases, and 1,333 were controls. For hyperopia, 858 individuals met the criteria for cases, and 602 met the criteria for controls.
None of the CREAM replication SNPs in the 15q14 region were even nominally significantly associated with MSE in our data set (Table 1). The most significant discovery SNP, rs634990 (p=0.20), was not replicated (Table 1). None of our additional custom-genotyped SNPs in the 15q14 region achieved nominal significance (Table 2). When all SNPs available from the Illumina array within 100 kb of the discovery SNP rs634990 were included, the most significant SNP had p=7.38×10−3. This SNP lies 7.26 kb from the closest originally associated SNP chosen for the replication study (Figure 1 and Appendix 1).
None of the CREAM replication SNPs or our custom-genotyped SNPs in the 15q25 region were even nominally significantly associated with MSE in our data set (Table 1). The most significant of these SNPs was rs7183668 (p=0.03; Table 2). When all SNPs available from the Illumina array within 350 kb of rs8027411 were included, the most strongly associated SNP was rs2002832 (p=3.49×10−3), which is 17.7 kb from the closest originally associated SNP chosen for the replication study (Figure 2 and Appendix 1).
For myopia at 15q14, as with MSE, none of the CREAM replication SNPs or the custom TaqMan SNPs were even nominally significant. The most significant of these SNPs was rs533021 (p=0.07; Table 2). Including all SNPs from the Illumina array within 100 kb of rs634990 revealed several SNPs with some evidence of association (Appendix 1). The strongest signal came from rs893132 (p=2.8×10−3), which is close to, but not below, our replication threshold of 0.002; several other SNPs had similar p values (Figure 3 and Appendix 1).
At 15q25, none of the CREAM replication SNPs were even nominally significantly associated with myopia. Among the custom-genotyped SNPs, rs7183668 was only slightly more significant for myopia than for MSE (myopia p=0.09 versus MSE p=0.1). The remaining SNPs from the array showed some evidence of association, with several SNPs having p values on the order of 10−3, although none were below 0.002 (Figure 4 and Appendix 1).
A total of 858 individuals met the criteria for inclusion as hyperopes, and 602 were controls.
At 15q14, ten of the 14 CREAM replication SNPs were nominally significantly associated with hyperopia, with the strongest signal coming from rs7176510 (p=0.01; Table 1). The additional custom-genotyped SNPs were not significant (Table 2). A full analysis of all SNPs from the Illumina array within 100 kb of rs634990 showed a more consistent signal: a cluster of SNPs close to the CREAM replication SNPs had much more significant p values (Figure 5). The most significant SNP was rs1357179 (p=1.69×10−3), which passed the replication significance threshold of 0.002 even in this small sample (Appendix 1).
At 15q25, none of the CREAM replication SNPs were even nominally significantly associated with hyperopia. Of the additional custom-genotyped SNPs, one (rs7183668) achieved nominal significance (p=0.03; Table 2), but it did not pass the replication significance threshold of 0.0019. Analysis of the full set of available genotypes in the region (within 350 kb of rs8027411) identified a cluster of more significant SNPs within 0.3 Mb of the CREAM replication SNPs (Figure 6). The most significant SNP was rs7164400 (p=8.39×10−4), which passed the replication threshold (Appendix 1).
Early GWAS designs were frequently underpowered and did not adequately control for type I error. The need to address these issues has led to the formation of large consortia with sample sizes sufficient for both discovery and replication. As these consortia have grown, the diversity of the populations included has also increased. Controlling for population stratification within populations allows us to control for chance associations, as does using meta-analysis to handle between-population differences in sample size and in tagging-marker allele frequencies. However, the problems of allelic heterogeneity and of different patterns of linkage disequilibrium and haplotypes remain. Our approach queries a large number of SNPs within 100 kb (15q14) or 350 kb (15q25) of the most significant SNP from the discovery study. This helps to reduce the loss of power caused by different LD patterns in the associated region across populations.
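The meta-analytic pooling mentioned above is often done with an inverse-variance fixed-effect model. The following Python sketch shows that generic model together with Cochran's Q and I² as heterogeneity measures. It is not necessarily the model used by CREAM, and the cohort estimates in the test are hypothetical. Pooling assumes that every cohort estimates the same marker effect. If a marker tags the causal variant differently in different cohorts, that assumption fails, which is the concern raised in this paragraph.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis of per-cohort estimates.

    Returns (pooled beta, pooled SE, two-sided p, Cochran's Q, I^2).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(beta / se)))
    # Cochran's Q: weighted squared deviations of cohort estimates from the pooled value.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-cohort heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return beta, se, p, q, i2


# Two hypothetical cohorts with equal precision but different effect estimates.
print(fixed_effect_meta([0.1, 0.3], [0.1, 0.1]))
```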
Recent GWASs have identified several loci in European populations that confer susceptibility to RE, and efforts to replicate these loci have met with some success for the 15q14 locus. However, in the AREDS cohort, the SNPs queried in the CREAM replication study were not even nominally significant, except for hyperopia at the 15q14 locus (p=0.03). Genotyping a few additional tagging SNPs in the region yielded only nominally significant results. However, selecting a denser panel of SNPs from an expanded region around the candidates revealed association with several SNPs that were not genotyped in the original Rotterdam study or included in the CREAM replication effort. This panel comprised all candidate-region SNPs available on the Illumina HumanOmni 2.5 array. The newly associated SNPs lie close to the SNPs reported as significant in the CREAM replication study. After accounting for the effective number of tests, several SNPs approached or passed the significance thresholds for replication even in this small sample.
The AREDS cohort is a United States–wide collection of Caucasian individuals who have a wide variety of European backgrounds and possibly low levels of admixture with African and indigenous populations (low enough to be undetectable by analyses for population stratification). Allele frequency clines exist in Europe, and an individual’s European country of origin can be inferred quite accurately from a small number of informative markers. Indeed, our principal components analyses of the entire set of GWAS markers showed significant evidence of population substructure. These analyses of the AREDS data corrected for several principal components, and clear outlier individuals were removed, but some subtle substructure may still exist. However, we suspect a different explanation is most likely for the discrepancy between the AREDS results and those of some other data sets that strongly replicated the “CREAM replication SNPs” association. That explanation is the difference in how the “CREAM replication SNPs” tag the other, unanalyzed SNPs in the region. Thus, the more mixed background of the AREDS cohort compared with the Rotterdam and TwinsUK studies may account for the inability of the AREDS sample to replicate the chosen SNPs at even the 0.05 significance threshold in the large CREAM meta-analysis of the 15q14 region. Our power calculations suggest that we had sufficient power to detect association (at p=0.002) for the CREAM replication SNPs. Indeed, when we queried a much denser panel of SNPs within the candidate region, we found stronger evidence of association with each trait and significant evidence of replication for hyperopia. Of course, the moderate size of this study leaves open the possibility that association with other SNPs in this region was not detected due to lack of power.
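The argument that the same SNP can tag a causal variant differently across data sets can be made concrete with the standard LD measure r² = D²/(p_A(1−p_A)p_B(1−p_B)), where D = p_AB − p_A·p_B. The following Python sketch contrasts two hypothetical populations. Both have the same marker and causal allele frequencies but different haplotype frequencies. It then applies a standard approximation: the noncentrality parameter at a tagging marker is roughly r² times that at the causal variant. All frequencies and the causal-variant noncentrality value are illustrative, not estimates from this study.

```python
import math
from statistics import NormalDist


def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between marker allele A and causal allele B.

    p_ab is the frequency of haplotypes carrying both A and B;
    p_a and p_b are the two allele frequencies.
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def power_from_ncp(ncp: float, alpha: float) -> float:
    """Two-sided power of a 1-df test with noncentrality ncp (normal approximation)."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(ncp)
    return std.cdf(shift - z) + std.cdf(-shift - z)


ncp_causal = 15.0  # hypothetical NCP if the causal variant itself were tested
for label, p_ab in (("population 1", 0.28), ("population 2", 0.15)):
    r2 = ld_r2(p_ab, 0.3, 0.3)  # same allele frequencies, different haplotypes
    # At a tagging marker, the NCP shrinks roughly in proportion to r^2.
    print(label, round(r2, 2), round(power_from_ncp(r2 * ncp_causal, 0.002), 2))
```

A marker that tags the causal variant well in one population can thus have little power in another, even when its allele frequency is identical in both.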
However, this study illustrates what multiple authors have pointed out: regional replication is a more powerful approach than merely picking a few SNPs to replicate. This is particularly true when the few SNPs picked may not tag the underlying haplotypes (and thus the ungenotyped true causal SNP) in the same manner across data sets. Consequently, testing only a small number of significant SNPs from an associated region could lead to non-replication in the replication cohort. Ioannidis et al. pointed out that when only one or a few of the most significant SNPs from an associated region are included in the follow-up set, the selected SNP(s) are not necessarily more informative or closer to the causal variant. They suggest that using only one or a few SNPs for replication provides less robust information for those regions and may result in failure to replicate. They propose that combining complete GWASs in a meta-analysis is a more fruitful approach than attempting to replicate only one or a few significant SNPs from a candidate region. Asimit et al. also supported combining a large number of cohorts in a meta-analysis as a way to improve power. Several analysis approaches that incorporate multiple SNPs have been published [32-39], as have approaches that incorporate linkage information and pathway-based association approaches. Christoforou et al. proposed an LD-based binning strategy to interpret and compare multiple GWASs, an approach that may prove to be the most fruitful. Still, unresolved issues remain. These include how to handle SNPs that map to multiple, sometimes overlapping genes, and how to account for correlations between genes and between derived gene scores.
In the meantime, when attempting to replicate significant associations, studying a denser panel of SNPs from the associated region may be more powerful [16-18]. Imputation to the HapMap and/or 1000 Genomes reference data can also provide genotypes at the same markers across studies, even when the studies used different GWAS genotyping platforms.
Whole-exome and whole-genome sequencing studies will produce data on more variants than ever before, many of them individually quite rare. The temptation to study only the variants that have been genotyped in all the member cohorts of a consortium will be strong. However, in traits where alleles of modest effect are sought, this approach misses many variants of interest. Many techniques are being developed to combine data from multiple SNPs, from the various collapsing methods for rare variants to gene-based and pathway-based approaches. However, no robust method has yet emerged for combining these results across heterogeneous populations. Moving away from cohorts of unrelated individuals to family studies may help address some of these issues. Refocusing efforts on linkage, which is robust to allelic heterogeneity, could assist with detecting rare variants with large effects. For existing consortia, the challenge is to find a method that can adequately account for allelic and haplotypic heterogeneity, while still controlling type I error.
Appendix 1. Supplementary tables S1–S8.
Appendix 2. Linkage disequilibrium structure in the 15q14 region in the AREDS cohort.
Appendix 3. Linkage disequilibrium structure in the 15q25 region in the AREDS cohort.
Appendix 4. Zoomed-in section of linkage disequilibrium in the 15q14 region to show location of original CREAM replication SNPs (green bar).
Appendix 5. Zoomed-in section of linkage disequilibrium in the 15q25 region to show location of original CREAM replication SNPs (green bar).
This work was supported in part by NEI