Molecular Vision: Association of primary open-angle glaucoma with mitochondrial variants and haplogroups common in African Americans

Molecular Vision 2016; 22:454-471 <http://www.molvis.org/molvis/v22/454>
Received 21 December 2015 | Accepted 13 May 2016 | Published 16 May 2016

Navigate by section:

Introduction
Methods
Results AND Discussion

Abstract

Purpose: To estimate the population frequencies of all common mitochondrial variants and ancestral haplogroups among 1,999 subjects recruited for the Primary Open-Angle African American Glaucoma Genetics (POAAGG) Study, including 1,217 primary open-angle glaucoma (POAG) cases and 782 controls, and to identify ancestral subpopulations and mitochondrial mutations as potential risk factors for POAG susceptibility.

Methods: Subject classification by characteristic glaucomatous optic nerve findings and corresponding visual field defects, as defined by enrolling glaucoma specialists, stereo disc photography, phlebotomy, extraction of total DNA from peripheral blood or saliva, DNA quantification and normalization, PCR amplification of whole mitochondrial genomes, Ion Torrent deep semiconductor DNA sequencing on DNA pools (“Pool-seq”), Sanger sequencing of 3,479 individual mitochondrial DNAs, and bioinformatic analysis.

Results: The distribution of common African haplogroups within the POAAGG study population was broadly similar to prior surveys of African Americans. However, the POAG case population was found to be enriched in L1c2 haplogroups, which are defined in part by missense mutations m.6150G>A (Val83Ile, odds ratio [OR] 1.8, p=0.01), m.6253C>T (Met117Thr, rs200165736, OR 1.6, p=0.04), and m.6480G>A (Val193Ile, rs199476128, OR 4.6, p=0.04) in the cytochrome c oxidase subunit 1 (MT-CO1) gene and by a variant, m.2220A>G (OR 2.0, p=0.01), in MT-RNR2, which encodes the mitochondrial ribosomal 16s RNA gene. L2 haplogroups were predicted to be overrepresented in the POAG case population by Pool-seq, and the difference was confirmed to be significant with Sanger sequencing, that targeted the L2-associated variants m.2416T>C (rs28358580, OR 1.2, p=0.02) and m.2332C>T (OR 1.2, p=.02) in MT-RNR2. Another variant within MT-RNR2, m.3010G>A (rs3928306), previously implicated in sensitivity to the optic neuropathy-associated antibiotic linezolid, and arising on D4 and J1 lineages, associated with Leber hereditary optic neuropathy (LHON) severity, was confirmed to be common (>5%) but was not significantly enriched in the POAG cases. Two variants linked to the composition of the gut microbiome, m.15784T>C (rs527236194, haplogroup L2a1) and m.16390G>A (rs41378955, L2 haplogroups), were also enriched in the case DNA pools.

Conclusions: These results implicate African mtDNA haplogroups L1c2, L1c2b, and L2 as risk factors for POAG. Approximately one in four African Americans have these mitochondrial ancestries, which may contribute to their elevated glaucoma risk. These haplogroups are defined in part by ancestral variants in the MT-RNR2 and/or MT-CO1 genes, several of which have prior disease associations, such as MT-CO1 missense variants that have been implicated in prostate cancer.

Introduction

Primary open-angle glaucoma (POAG) is a progressive degeneration of the optic nerve with characteristic clinical findings that correspond to patterns of visual field loss. POAG is the major cause of irreversible blindness worldwide. POAG is especially prevalent in African Americans, who are four times more likely than Caucasians to develop the disease and succumb at an earlier age [1].

Similar to other neurodegenerative illnesses, such as Alzheimer disease and Parkinson disease, glaucoma pathogenesis likely involves mitochondrial dysfunction [2]. It is possible that a subgroup of patients with glaucoma may demonstrate a mitochondrial optic neuropathy but one that has more complex genetics than the mitochondrial disease Leber hereditary optic neuropathy (LHON). LHON is caused directly by mutations in mitochondrial DNA (mtDNA), albeit with highly variable expression and penetrance that is modified by the mtDNA genetic background [3,4]. POAG has been associated with a large number of nuclear genes, many of which are involved in mitochondrial function [5]. POAG lymphoblasts have been shown to exhibit a complex I defect in the mitochondrial oxidative phosphorylation pathway, with decreased rates of respiration, which could confer an increased susceptibility to cell death on retinal ganglion cells [6]. Highly efficient systemic mitochondrial function was recently shown to be protective against glaucomatous optic neuropathy in those with elevated intraocular pressure (IOP) [7]. Although mutations in the 16,569 bp mtDNA are well established as the primary cause of LHON, the role of heritable mitochondrial mutations in POAG, if any, is unclear.

All living humans share a most recent common ancestor (MRCA), an African woman who is estimated to have lived approximately 194,000±33,000 years ago [8]. The mitochondrial genome is extremely informative regarding maternal ancestry because of the genome’s matrilineal inheritance and lack of genetic recombination. The human mitochondrial family tree has been characterized extensively and subdivided into thousands of branches (PhyloTree). The major ancestral groupings, mtDNA haplogroups, are defined by collections of variants scattered throughout the mitochondrial genome that are inherited together and have accumulated over evolutionary time, with the oldest mutations defining the major macrohaplogroups and the deepest roots of the tree. The human mtDNA phylogeny is divided into major haplogroups designated L0, L1, L2, L3, L4, L5, and L6, which are further divided into smaller subgroups. Contemporary non-African haplogroups are derived from L3, with major branches designated M and N, and associated with ancient diasporas from Africa that occurred approximately 60,000 to 70,000 years ago [8,9]. A prior survey of 44 subjects determined that L3, L2, L1, and L0 groups are all present in the POAAGG study population, as are non-African haplogroups [10]. A study of a Saudi Arabian population [11] found a positive association of POAG with African (L) haplogroups and evidence for a protective effect of the Eurasian haplogroup N, but studies of Arab [12], Ghanaian [13], and northern European [14] populations failed to find significant associations with mitochondrial haplogroups and POAG.

Next-generation sequencing methods now permit large regions of DNA to be investigated economically, but population-scale whole genome analysis is still prohibitively expensive, due to the complexity of next-generation sequencing library construction and the incremental costs of enriching samples for regions of interest. The strategy of sequencing pooled DNAs, “Pool-seq,” has emerged as a cost-effective alternative to sequencing individuals separately [15]. Population allele frequencies determined by Pool-seq have been shown to correlate well with actual allele frequencies [16], and a pooled sequencing approach, using pools of 20 individual DNAs, has been successfully implemented on the Ion Torrent [17] PGM non-optical semiconductor platform [18].

The goal of this study was to perform a comprehensive survey of mitochondrial variation in a population consisting of 1,999 subjects, including 1,217 African American POAG cases and 782 African American controls, recruited in Philadelphia, Pennsylvania, for the Primary Open-Angle Glaucoma Genetics (POAAGG) Study [19]. Using the Pool-seq approach, we sought to simultaneously query all 16,569 positions of mtDNA for the presence of known or potentially pathogenic point mutations, infer the relative frequencies of all common mitochondrial haplogroups, and estimate the potential for the association of haplogroups and each position on mtDNA with POAG. If subsets of the African American population can be identified who are at increased risk from mitochondrial genetic dysfunction and/or geographical ancestries that may be linked to mitochondrial sequence variation, this information might be used to help target screening efforts or to direct clinical interventions such as bioenergetic therapies that may slow vision loss in glaucoma by supporting mitochondrial function [2,20,21].

Methods

Subject recruitment, phenotyping, and specimen collection

A total of 3,479 subjects were included in this study. All were recruited from the Scheie Eye institute and satellite locations in Philadelphia, USA and self-identified as American American. 50% of this total group had POAG and 50% were classified as normal controls at the time of recruitment. The subjects' mean age was 66 years and 72% were female and 28% were male. Subjects were recruited for the institutional review board (IRB)-approved POAAGG Study [19] after informed consent was obtained. Consistent with ARVO's statement on human subjects, this study adhered to the tenets of the Declaration of Helsinki [22]. Following examination by a fellowship-trained glaucoma specialist, subjects were classified as glaucoma case, glaucoma suspect, or control [19]. Cases were defined by the demonstration of characteristic optic nerve defects as well as corresponding visual field loss. Subjects classified as glaucoma suspects were deliberately excluded from this study, as were those few who did not describe their ancestry as Black, African American, African, or mixed race including African ancestry. Approximately 30 ml of peripheral blood was obtained from each subject in EDTA tubes. Blood specimens were frozen at −20 °C before DNA extraction. Alternatively, 2 ml of saliva was collected.

DNA extraction quantification and normalization

DNA was extracted from whole blood using PureGene Gentra kits (Qiagen, Valencia, CA), and the optional RNase digestion step was included. DNA concentrations were measured using the fluorescence-based Quant iT dsDNA Broad-Range (BR) assay kit (#Q-33130, Life Technologies, Foster City, CA). Fluorescence was measured with a Tecan Infinite 200 Pro multimode microplate reader (Morrisville, NC). An aliquot of each sample was normalized to a concentration of approximately 1–2 ng/μl using SequalPrep Normalization plate kits (Cat # A10510–01, Life Technologies). The concentrations of the normalized DNAs were verified using Quant iT dsDNA High sensitivity (HS) assay kits (Cat # Q-33120, Life Technologies). Equal volumes of up to 48 DNAs, corresponding to either POAG cases or controls, were combined to create a total of 43 DNA pools. Most pools contained DNA representing 48 people, but a small number contained fewer, because the samples were withdrawn from the original 96-well stock DNA plates, for example, if a subject’s status had progressed from control to POAG suspect or case. Twenty-six of the pools were created from POAG case DNAs, with 1,217 cases represented, and the remaining 17 pools were created using DNAs from 782 control subjects. Unpooled genomic DNAs from two individuals, whose mitochondrial genomes had previously been sequenced and were known to correspond to the phylogenetically distant mitochondrial haplogroups (L1c and H3b), were used as controls for the Ion Torrent sequence analysis. The replication group for Sanger sequencing (n=1,389) included DNAs collected from saliva samples using OraGene-500 kits (DNA GenoTek, Ottawa, Canada, Cat # OGR-500) and extracted with prepIT reagents (#prepIT-L2P) according to the manufacturer’s instructions.

Enrichment for mitochondrial DNA and Ion Torrent sequencing library construction

Whole mitochondrial genomes were amplified as nine overlapping fragments with PCR, using DNA from each of the 43 pools or two individuals as templates, with a primer set designed to avoid coamplification of nuclear mitochondrial pseudogenes [23]. The two additional unpooled individual DNAs were amplified in parallel with the pooled DNAs, to estimate the background noise from sequencing each position on mtDNA in the absence of signals arising from population variation. All PCR reactions were performed in a total volume of 12 μl using the Platinum Taq DNA polymerase kit (Cat # 10,966–034, Thermo Fisher). To obtain high yields on all products, two different PCR protocols were used, one optimized for six of the nine fragments, and the second optimized for the remaining three fragments. For mtDNA fragments designated “1,2,5,7,8,9” [23], the PCR reactions contained 1.2 μl 10X reaction buffer, 0.24 μl of 10 mM dNTP mix, 0.48 μl of 50 mM MgCl₂, 2.4 μl of primer mix with each primer at 2 pmol/μl, and 3 μl of template DNA (from 1 to 10 ng), 0.1 μl of Platinum Taq, 3.6 μl of 5M betaine, and 1 μl of nuclease-free water. Cycling conditions were 95 °C for 5 min; 35 cycles (94 °C for 1 min, 64 °C for 40 s, 72 °C for 3 min) with 0.2 degrees of touchdown; 72 °C for 5 min; final hold 10 °C. For mtDNA fragments designated “3,4,6” [23], the PCR reactions contained 1.2 μl 10X reaction buffer, 0.24 μl of 10 mM dNTP mix, 0.96 μl of 50 mM MgCl₂, 2.4 μl of primer mix with each primer at 2 pmol/μl, and 3 μl of template DNA (from 1 to 10 ng), 0.1 μl of Platinum Taq, 3.6 μl of 5M betaine, and 0.5 μl of nuclease-free water. Cycling conditions for fragments 3,4,6 were 95 °C for 2 min; 35 cycles (94 °C for 1 min, 58 °C for 40 s, 72 °C for 3 min) without touchdown; 72 °C for 3 min; final hold at 10 °C. Thermal cycling was performed on an Applied Biosystems 9700 with dual 384-well block, with all case and control pools amplified from the same master mixes and in parallel within the same thermal cycling run. The presence of the expected bands and yields was confirmed with gel electrophoresis.

Unequal volumes of PCR products were pooled to help equalize sequencing depth on the nine fragments. Nine microliters each of the fragment 3,4,5,6 PCR products were combined with 6 μl each of fragments 1,2,7,8,9 to compensate for lower PCR yields in the former group. An aliquot of each nine-fragment PCR product pool was then used as the template to construct the Ion Torrent library. Forty-five barcoded sequencing libraries were constructed according to the manufacturer’s instructions using 35 μl DNA containing approximately 100 ng of each PCR product pool as inputs (Ion Xpress^TM Plus Fragment Library Kit Cat # 4,471,269, Ion Xpress Bar Code Adapters 1–16 kit, Cat # 4,471,250), the Ion Xpress^TM Barcode Adapters 17–32 kit (Cat # 4,474,009), and the Ion Xpress^TM Bar Code Adapters 81–96 kit (Cat # 4,474,521, Life Technologies).

Semiconductor DNA sequencing on the Ion Torrent PGM

Sequencing libraries were quantified with quantitative polymerase chain reaction (qPCR) with the Ion Library Quantitation Kit (Cat # 4,468,802, Life Technologies) and diluted to 100 pM. The diluted sequencing libraries (10 pM) were then amplified on Ion Sphere^TM Particles (ISPs) and enriched for template-positive ISPs using the Ion PGM Template OT2 200 Kit (Cat # 4,480,974), Ion One Touch 2 instrument, and Ion One Touch ES. The Ion Sphere Quality Control Kit (Cat # 4,468,656, Life Technologies) was used to determine the fraction of template-positive ISPs. Enriched ISPs were then sequenced using the Ion PGM Hi-Q sequencing Kit (Cat # A25592, Life Technologies) and the 318 Chip Kit v2 (Cat # 4,484,354, Life Technologies). Ion Torrent sequencing results were deposited in the NCBI Sequence Read Archive with study accession number SRP067259.

Validation of Pool-seq with Sanger sequencing of individual DNAs

Variance frequencies estimated with next-generation sequencing on DNA pools were evaluated with Sanger sequencing targeting a region of the N-terminal part of the MT-CO1 gene (Gene ID 4512, OMIM 516030), performed individually for the 1,999 subjects whose DNAs were pooled, as previously described [10]. The actual mean variance frequency at each mtDNA position in this interval for all DNA pools was determined with Sanger sequencing of individual DNAs, and these means were compared to the Pool-seq estimates. To determine the association of variants within the MT-CO1 region with POAG with Sanger sequencing, a total of 2,157 subjects, 1,308 cases and 849 controls, were analyzed; that is, an additional 158 subjects were included. Sanger sequencing targeting the MT-RNR2 gene, for 2,090 subjects (1,258 cases and 832 controls), was performed to ascertain whether the frequencies of several variants detected in this region differed significantly, as predicted by sequencing on DNA pools. Additional Sanger sequencing for a replication group consisting of 1,389 subjects (510 cases, 879 controls) was performed to confirm the enrichment of L2- and L1c2b-associated variants, using POAG cases that were not included in the pooled sequencing group.

DNA sequence analysis, estimation of population frequencies, data filtering, and variant annotation

Sanger sequencing electropherograms were scored and converted to variance tables with Sequencher version 5.1 software (Softgenetics, State College, PA). Torrent Suite (software) version 4.2 was used for alignment of Ion Torrent PGM reads to the hg19 human genome reference sequence. The BAM files were then downloaded and reviewed using the Integrated Genomics Viewer (IGV) [24].

Counts of the four nucleotides at each position of the alignments were obtained using the count utility of the IGV Tools package from the command line. Output files were compiled in an Excel spreadsheet, and the nucleotide counts were converted to percentages to normalize for read depth. For each position on mtDNA, we computed the frequency of adenine (A), cytosine (C), guanine (G), and thymine (T) and the total frequency of all minor alleles. That is, if the rCRS reference sequence base was T, and the A, C, G, and T counts were 1%, 4%, 0%, and 95%, respectively, we would infer a 1% variant frequency for T>A, a 4% variant frequency for T>C, and a total raw “variance frequency” of 5%. To correct for spurious background noise, we calculated the mean variance frequency observed on the two unpooled samples for each position on mtDNA. This background frequency was then subtracted from the mean variance frequencies that were observed in the case and control DNA pools.

To be considered a “high confidence” call of a common variant from pooled sequencing, the following three conditions had to be met:

1) The position on mtDNA was not classified as “noisy”; noisy was defined as having a mean background variance frequency of >1% (on libraries corresponding to the two unpooled individuals’ samples).

2) The observed mean variance frequency on the pooled samples was significantly higher than on six replicates on unpooled samples. The filter was a t test, two-tailed, unequal variance, and p<0.05.

3) The residual estimated variance frequency was >1%, after subtraction of background noise at that position.

For comparing sequencing results from pooled sequencing to those from Sanger sequencing of individual samples, we defined “sequencing accuracy” as the fraction of concordant calls (agreements/total), where “agreement” means the same genotype was called by both methods above a given threshold, and total refers to the total number of base pairs that were sequenced. We defined a false negative variant as one for which Sanger sequencing detected a variant, but Ion Torrent sequencing did not (“disagreement”). The false negative rate on variants was calculated as (disagreements) / (total variants called by Sanger) above a given variance threshold. The false positive rate on variants was defined as (disagreements) / (total variants called by Ion Torrent pooled sequencing).

The mitochondrial genome was annotated for disease association using data from the MITOMAP compendium [25] and annotated for haplogroup associations using Build 16 of PhyloTree [26]. Haplogroup population frequencies were estimated using the following positions on mtDNA: L0 (1048, 3516, 5442, 9042, 9347, 10,589, 12,007, 12,720), L0a (5231, 11,176, 14,308), L1 (7055, 7389, 13,789, 14,178, 14,560), L1b (185, 710, 1438, 1738, 2768, 3308, 3693, 6548, 6827, 6989, 7867, 8248, 12,519, 14,769, 15,115, 16,126, 16,264, 16,270, 16,293), L1c (151, 186, 5951, 6071, 10,586, 12,810), L1c1 (3796, 3843), L1c2 (6150, 6253, 7076, 8784, 8877, 10,792, 10,793, 11,654, 16,286, 16,527), L1c3 (195, 6221, 6917, 11,302), L2 (2416, 8206, 9221, 10,115, 13,590, 16,390), L2a1234 (2789, 7274, 7771, 13,803, 14,566), L2b (1706, 2358, 4158, 4370, 4767, 5027, 5331, 5814, 6713, 8080, 8387, 12,948, 14,059, 16,114, 16,213), L2c (93, 325, 680, 3200, 13,928, 15,849), L3 (769, 1018), L3b (3450, 5773, 9449, 10,086, 13,914, 15,311, 15,824), L3d (7424, 8618, 13,886, 14,284), L3e (14,212), L3f (3396, 4218, 15,514, 16,209), M (489, 10,400, 14,783, 15,043), N (10,398), R (12,705), R0 (73), and U (11,467, 12,308, 12,372). Variance frequencies were averaged when more than one position was associated with a particular haplogroup. Variant positions harboring back mutations per PhyloTree, those common to more than one of any of these haplogroups, and those that did not correspond to high-confidence variable positions in the Pool-seq data were excluded from the frequency estimations and estimates of case:control odds ratios (ORs). We did not attempt to estimate the frequency of the common L2a haplogroup due to a lack of variable positions that met these criteria.

Results AND Discussion

Overview of the study population

The baseline demographics of the POAAGG cohort, including phenotyping methods, and criteria for classification as POAG case or control, have been described previously [10,19]. A total of 1,999 African American patients with POAAGG were included in the mitochondrial Pool-seq analysis based on the availability of genomic DNA extracted from whole blood and status as glaucoma case or control (glaucoma suspects were excluded). The majority of cases and controls were female, with a larger proportion of men (44.5%) in the case group than in the control group (33.2%). The control group was significantly younger (61±12 years) than the case group (71±11 years). Elevated IOP is a major risk factor for POAG, and as expected, the maximum recorded IOP was significantly higher for the POAG cases (mean 25 mm Hg in cases versus 16 mm in controls, p<0.0001), as was neurodegeneration, measured by the cup:disc ratio of optic nerves (mean 0.7 in cases versus 0.3 in controls, p<0.0001).

PCR amplification and deep sequencing of mitochondrial genomes

The 43 DNA pools were constructed with DNAs from 1,999 subjects, with 1,217 POAG cases represented in 26 pools, and 782 controls represented in 17 pools. The Ion Torrent sequencing coverage depth was variable across the mitochondrial genome, with the highest coverage depth within the MT-CYB gene and regions of lower coverage in the regions of the MT-ND1, MT-ATP6, and MT-ND6 genes. A representative coverage map from sequencing on one of the pools is shown in Figure 1. The 43 pools were sequenced to a mean depth of 1,931x, with coverage depth on individual libraries ranging from 718x to 3,189x. Assuming that just one individual in a pool of 48 has a homoplasmic mutation at a particular position on mtDNA and proportionate representation of individuals in that sequencing library, the expected variance frequency is 1/48 (2.1%), which would generate approximately 40 mutant base calls at the mean coverage depth. In other words, there was sufficient depth to detect the presence of single rare variants within each pool.

Two individual subjects’ (unpooled) DNA samples, known to have distant mitochondrial ancestry and non-overlapping sets of variants from prior sequencing, were sequenced in parallel with the pooled samples. This was done to estimate the background noise and the spectrum of spurious base calls at every position on mtDNA in the absence of low-level signals from true population variation. Neglecting sites with true variants, the mean variance frequency on these unpooled samples was 0.4±0.2%; however, this distribution was asymmetric with a long tail, with about 2% of sites having >1% variance frequency. We classified these sites as “noisy” and excluded them from subsequent analysis (Appendix 1). These signals could conceivably represent low-level heteroplasmy in the two unpooled subjects’ DNA, but typically, these calls were observed in both subjects and are far more likely to represent spurious basecalls and potential false positives. We estimate that the analytical sensitivity of this sequencing protocol, defined as the fraction of mutant alleles that can be detected on a wild-type background, was generally less than 1% for >98% of the positions on mtDNA.

Several hundred variants were inferred from Pool-seq, most corresponding to one or more common haplogroups. The population-scale whole genome analysis for all 16,569 positions on mtDNA is summarized in Figure 2A, with results for each position in Appendix 1. As expected, no population variance above the sequencing background noise was observed at most positions; human mtDNA has been diverging for only about 200,000 years, albeit at a rapid rate relative to nuclear genes. There is a conspicuous absence of variance with minor allele frequency above 10% in the region corresponding approximately to positions 4000 to 7000, spanning the MT-ND2 gene and parts of the MT-ND1 and MT-CO1 genes, and another such region near position 2000 at the junction of the 12s and 16s RNA genes, MT-RNR1 and MT-RNR2. The relative scarcity of highly polymorphic sites in these regions suggests they may be selectively constrained at the nucleotide level. Consistent with Figure 1 and expectations, the hypervariable regions, designated HVR1 and HVR2, contain the highest concentration of variants, many of which are highly polymorphic.

Mitochondrial haplogroups inferred from Pool-seq

A recent independent multicenter study of mitochondrial haplogroup lineages in African Americans identified 13 major macrohaplotypes as common, with population frequencies >1% [27]. Several of these groups overlap; that is, the L1 haplogroup is composed of subclades L1b and L1c. The mtDNA positions associated with these particular haplogroups are highlighted with colored markers in Figure 2A. As expected, all 13 haplogroups all appear to be represented in the POAAGG cohort, with one or more of the associated variants detected. The population variance frequency observed at these positions ought to be proportional to the expected population frequency of the associated haplogroup. For example, L3 was reported by Johnson et al. to be the most common haplogroup in African Americans [27], and the two mtDNA positions with the highest estimated frequencies, 34% and 35%, among this subset of haplogroup-associated variants correspond to haplogroup L3 (green circles, upper left corner, Figure 2A).

Validation of Ion Torrent Pool-seq with CE Sanger sequencing on individual samples

A 605 bp region of mtDNA was sequenced for the 1,999 subjects individually, using the Sanger method. This region spans approximately 3.7% of the mitochondrial genome and corresponds to an N-terminal part of the MT-CO1 gene. The Sanger sequencing was analyzed for each pool, to evaluate the accuracy of the Ion Torrent pooled sequencing. Genotyping efficiency >98% was achieved on the 605 bp interval, and the Sanger sequencing identified 11 positions on mtDNA as variable and with variance frequencies above 2%. These 11 positions corresponded to the following known common mitochondrial variants (with the associated L-haplogroups per PhyloTree in parentheses): m.6071T>C (L1c), m.6150G>A (L1c2), m.6185T>C (L0), m.6221T>C (L3b, L3e1) and m.6221T>A (L1c3), m.6253T>C (L1c2), m.6260G>A (L1c3a and L4b2), m.6413T>C (L3e2a1b), m.6524T>C (L3e3), m.6548C>T (L1b), m.6587C>T (L3e1), and m.6663A>G (L2a1c). The variance frequencies determined from Sanger and estimated from Pool-seq are compared in Figure 2B. Sanger sequencing also confirmed the presence of two minor alleles at m.6221: the more common C>T transition that has arisen twice independently on different L3 lineages and the less common C>A transversion, associated L1c3 (Appendix 1). False positives in the Ion Torrent Pool-seq data are also perceivable in Figure 2B. A region of elevated background noise can be seen near rCRS position 6400; positions 6420 and 6442 are labeled “FP” and had predicted variance frequencies >2% but were not validated with Sanger sequencing. These are examples of positions that were characterized as “noisy” and filtered from subsequent analysis. A total of 381 variable positions were identified on mtDNA that survived filtering and were considered high-confidence variable positions (Appendix 1).

These results confirmed the reliability of the Ion Torrent Pool-seq method for the detection of common mitochondrial variants. However, the estimated minor allele frequencies were clearly subject to experimental error, likely arising from imperfect normalization, variation in the ratio of nuclear to mtDNA, and/or variable PCR amplification of the individual mitochondrial genomes. The variance frequencies estimated from Pool-seq in some cases underestimated and in other cases overestimated true frequencies (Figure 2B). Sanger sequencing confirmed the absence of extremely polymorphic variants, those that have minor allele frequencies >15%, within the 605 bp interval (Figure 2A), and confirmed that none of the variants that had been designated “high confidence” within this interval was a false positive. For high confidence variants, sequencing accuracy by Ion Torrent was 100%; the false negative rate for variants was 0%, and the false positive rate for variants was 0%.

Association of mitochondrial haplogroups with POAG in African Americans

We compared the prevalence of the most common haplogroups in the study’s controls relative to reported frequencies in African Americans, as well as the haplogroup distribution in POAG cases versus controls. Using the 381 high confidence variable positions, we computed the mean variance frequencies for sets of haplogroup-associated variants and compared these frequencies to estimates from an independent multicenter study [27]. As depicted in Figure 3, the frequencies inferred from the POAG controls mostly accorded well with those reported in the 2015 study by Johnson et al. [27], but L2, L2b, and non-African haplogroups (M, N, R, R0, and U) frequencies were lower for POAAGG subjects, and the proportion of L3e was higher. The POAG case pools appeared to be substantially enriched in L1c and L2 haplogroups relative to the controls.

Another objective of our study was to visualize the potential variation in POAG risk across the population in the context of their ancestral mitochondrial relationships and relative divergence times. Figure 4 summarizes the phylogenetic relationships among haplogroups that were inferred based on defining sets of variants, with coloring to indicate which groups were enriched in the POAG case or control pools, with potential high-risk groups (OR 1.4–3.3) highlighted in red. The pooled sequencing results suggested that L0(a), L1b, L1c(1,3), and L3(b,e) and the non-African branches M, N, and R were associated with protection or lack of risk (estimated OR 0.7–1.0). The L1c2, L2, L2a1’2’3′4, L2a1c, L2b, and L2c lineages were implicated for potentially elevated POAG risk (OR >1.4). The L1c2 lineage is defined in part by a pair of missense mutations in the MT-CO1 gene, and the pooled sequencing predicted that the subgroups L1c2b1a’b, defined by m.2220A>G in MT-RNR2, and L1c2b1b, defined by a third missense in MT-CO1, m.6480G>A, were also present (Appendix 1). None of these missense variants are present in the L1c1 lineage, which did not emerge as a potential risk factor (Figure 4). The ten other variants that define L1c2 are either synonymous or within non-coding regions. With the exception of some ancient single nucleotide polymorphisms (SNPs) in the hypervariable regions, common to nearly all branches represented by PhyloTree, there is no overlap between the sets of variants that define these L1c2 and L2 sublineages; that is, there was no evidence of glaucoma risk arising from a variant common to both lineages.

Validation of POAG association with haplogroups with Sanger sequencing

The 605 bp N-terminal region of the MT-CO1 gene that was Sanger sequenced encompassed several common variants associated with one or more common haplogroups, thus having the potential to validate the haplogroup associations suggested by the pooled sequencing results in Figure 4 and Figure 5. Sanger sequencing of 1,308 cases and 849 controls showed significant overrepresentation of three MT-CO1 missense variants in the POAG case group: m.6150G>A (OR 1.8, p=0.01), m.6253T>C (OR 1.6, p=0.04), and m.6480G>A (OR 4.6, p=0.04). Pool-seq correctly predicted the m.6150G>A missense variant as the “top hit” for POAG association in this region, although Pool-seq overestimated the odds ratio (Appendix 1). The L1c2 lineage is defined in part by the pair of missense variants, m.6150G>A (Val83Ile) and m.6253C>T (Met117Thr), in the MT-CO1 gene. A third less common missense variant in MT-CO1, m.6480G>A (Val192Ile), has arisen on the L1c2b1b subgroup of L1c2. Because this variant was (correctly) classified as rare by Pool-seq, with <1% minor allele frequency, m.6480G>A was excluded from the association analysis that was limited to common variants (Figure 5). Sanger sequencing also confirmed the predicted lack of positive association of POAG with several other African haplogroups, consistent with Figure 4: L1b (m.6548C>T, OR 1.0, p=0.82), L3b+L3e1 (m.6221T>C, OR 0.9, p=0.82), L3e2a1b (m.6413T>C, OR 0.7, p=0.37), L1c3 (m.6221T>A, OR 0.7, p=0.26), L3e3 (m.6524T>C, OR=0.9, p=0.66), and L0 (m.6185T>C, OR 1.1, p=0.81).

Additional Sanger sequencing targeting the MT-RNR2 region for 2,090 subjects was performed to confirm the predicted overrepresentation of the L1c2, L2, and L2a1c haplogroups in cases (Figures 3 and 4) and the top association hits within this region (Figure 5). Two variants, associated with L2 haplogroups, were confirmed to be significantly more prevalent in cases, and the expected lack of association of POAG with L1b and other haplogroups was also confirmed by sequencing the MT-RNR2 region (Table 1). When combined with Sanger data from a replication group consisting of an additional 1,389 subjects, not included in the Pool-seq group, association with POAG was significant for two L2-associated variants and one L1c2b-associated variant.

Potential association of 381 variable positions on mtDNA with POAG

We also analyzed each of the 381 positions with “high confidence” variance individually for association with POAG. Many of these have arisen on multiple lineages and are associated with more than one haplogroup. For each position, the observed mean variance frequency in the POAG case and control pools was multiplied by the total number of individuals whose DNA was present in the pools, to estimate the number of individuals in the case and control pools that have sequence variation at that position (Appendix 1). These numbers were then compared to ask whether the estimated numbers of cases and controls differed significantly for each variable position, given the known population sample sizes, and assuming the Pool-seq population frequency estimates were correct. The results are summarized in Figure 5, with the result for each position plotted as −10log (imputed p value). The top seven associations have imputed p values of less than 1×10⁻⁶, and they would be highly significant if the population frequencies inferred from the pooled sequencing are correct. The following seven variable positions are primarily associated with L2 haplogroups, with the exception of m.2220: m.8206G>A (L2 + (non-L)), m.11944T>C (L2a'b'c'd + (non-L)), m.2416T>C (L2, L3d3a1a, + (non-L)), m.13590G>A (L2, L0k + (non-L)), m.7274C>T (L2a1'2'3′4), m.2220A>T,G (A>T: L4b2b1 + (non-L); A>G: L1c2b1a'b), and m.16390G>A (L2, L0a2b + (non-L)).

The highest odds ratio among the top hits was estimated for position m.2220 within MT-RNR2, which was predicted to harbor two variants: an L1c2b1a'b-associated A>G transition and an L4b2b1-associated A>T transversion. The variance frequency at m.2220 estimated by Pool-seq was 5.2% in the POAG pools versus 0.3% in the control pools. Sanger sequencing on 2,090 individual unpooled DNAs confirmed the existence of both variants but determined that m.2220A>G was about fourfold more common than m.2220A>T in both cases and controls; that is, the predicted association of m.2220 with POAG stemmed primarily from L1c2b1 haplogroups, not L4b2b1. The association of the less common variant, m.2220A>T, with POAG in the Pool-seq group (OR 1.6) was positive but not significant, whereas the association of the m.2220A>G variant was significant (p=0.023) in the replication group of 1,389 additional samples and in the combined data set (p=0.013, Table 1). The association of the presence of either variant, G or T, at m.2220 was also significant (p=0.003) in the combined data set (n=3,479).

Positions that have been annotated as disease-associated by the MITOMAP resource [26] and/or with variants classified as “pathogenic” or “likely pathogenic” by the NCBI Variation Viewer are highlighted with red markers in Figure 5, and disease-associated variants that have estimated POAAGG population frequencies >3% are listed in Appendix 2. The status of most of these variants is “reported” and not “confirmed,” with highly variable evidence for association with disease; accordingly the variants in Appendix 2 may be potentially deleterious (or protective) but not necessarily pathogenic. The most common disease-associated variant was m.195T>C (rs2857291), located in the non-coding D-Loop and associated with bipolar disease in MITOMAP and with numerous African haplogroups, but there was no evidence for POAG association; the minor allele was observed at 30% frequency in the case and control pools. The missense mutation m.10086A>G, associated with the L3b haplogroup, has been linked to hypertension-associated end stage renal disease in African Americans [28]; however, the risk allele was estimated here to be more common in controls (7.9%) than in POAG cases (4.5%), consistent with L3b ancestry not conferring elevated POAG risk among African Americans or being mildly protective (Figure 4).

LHON variants

LHON is characterized by the selective degeneration of the retinal ganglion cell layer and the optic nerve that develops in young adults, and approximately 90% of LHON is attributed to three mitochondrial mutations (m.3460G>A, m.11778G>A, and m.14484T>C) in the MT-ND4, MT-ND6, and MT-ND1 genes, respectively [29]. However, approximately 10% of LHON cases do not harbor one of these mutations, and this disease has variable penetrance and expressivity. The primary variants that cause LHON have not been linked to POAG, and these three mutations were not detected in either the case or control pools (Appendix 1). Other mutations have been described as causing LHON, but their status is less certain, lacking evidence of clear segregation with cases. “Secondary mutations,” common polymorphisms identified as more common in LHON cases, e.g., m.4216T>C, m.13708G>A, and m.15257G>A, have also been reported [29], and multiple studies have found that mitochondrial haplogroup background affects the clinical presentation of LHON [3,4,30]. The missense variant m.15812G>A was detected in the POAAGG population and has been proposed as an LHON helper variant (Appendix 2), but the estimated frequencies in the cases and controls did not differ substantially. The m.10398A>G missense variant has also been proposed as an LHON secondary mutation that may act synergistically to increase the penetrance of LHON when coupled with one of the four primary mutations [31,32]. The m.10398A>G polymorphism is associated with African haplogroups L1c1a1, L0d1b1, and L3e1a3. However, we observed the minor allele more frequently in the POAG control pools (9.1%) relative to the case pools (6.1%); thus, it does not appear to be a potential risk factor for African American POAG. This variant has been associated with Fuchs’ endothelial corneal dystrophy in a study of Europeans, with the minor allele (G) associated with decreased risk [33]. The minor allele has also been linked to decreased risk of Parkinson disease, albeit with weak and conflicting evidence [34]. A study of Chinese patients with Leigh syndrome (LS) associated the G allele with increased risk for LS [35]. LS is a mitochondrial illness with symptoms that may include optic neuropathy.

rRNA genes (MT-RNR1, MT-RNR2, and humanin)

The MT-RNR2 gene, which encodes the 16S subunit of the mitochondrial ribosome, harbored seven variants that have associations with POAG that were potentially significant (Figure 5). The m.2220 position in MT-RNR2 was a top hit, and Sanger sequencing confirmed that variance at m.2220 was significantly more common in POAG cases (OR=2.0, p=0.003), implicating L1c2b1a'b haplogroups as risk factors for POAG. The top association in MT-RNR2 predicted by Pool-seq was for a much more common variant, m.2416T>C (rs28358580), which is associated with L2 and L3d3a1a haplogroups, and had an estimated variance frequency of 31% in the case pools versus 18% in the control pools. Sanger sequencing of 2,090 individuals confirmed that m.2416T>C was significantly more common in cases than controls (OR 1.2, p=0.008) in the Pool-seq group; however, the difference was less significant (p=0.02) when data from the replication group of 1,389 samples was added (Table 1). The apparent absence of m.15208A>G (Appendix 1), associated with haplogroup L3d3a1, suggests that any association of m.2416T>C with POAG is due to the L2 haplogroups. We are not aware of the prior association of m.2416T>C with disease, but two other variants in MT-RNR2, m.2352T>C (rs28358579) and m.3010G>A (rs3928306), have been associated with pathologies and were predicted by Pool-seq to have potential association with POAG. In the case of m.2352T>C, the minor allele (C) appears to be protective for POAG; accordingly, the majority of the study subjects possess the potential risk allele. Sanger sequencing confirmed that the C allele was more common in the controls, and the difference was significant in the Pool-seq group of 2,090 samples (OR=0.8, p=0.017), but the difference was not statistically significant in the replication group (Table 1). The m.3200T>A variant was confirmed to be elevated in cases (OR 1.4), consistent with the prediction that haplogroup L2c confers risk (Figure 4), but this difference was not statistically significant (p=0.07, Table 1). Another 16s RNA variant, m.3010G>A, which had a prior disease association, was estimated by Pool-seq as 7% in the POAG pools versus 4% in the control pools. Sanger sequencing confirmed that m.3010G>A was common (5.2% in cases), but this proportion was nearly identical in the controls (5.1%; Table 1). A minor allele, m.3010A, is associated with cyclic vomiting syndrome with migraine (MITOMAP). Interestingly, the m.3010G>A variant has also been implicated in sensitivity to linezolid, an antibiotic that can cause optic neuropathy.

The m.2706 and m.3010 positions were of particular interest because they are reportedly close to the ribosomal peptidyl transfer center, which binds antibiotics such as chloramphenicol and linezolid, and these positions define some mtDNA haplogroups implicated in LHON penetrance. On account of their bacterial endosymbiotic origin, mitochondrial ribosomes may be unintended targets of antibiotics that disrupt bacterial protein synthesis, e.g., linezolid. Pacheu-Grau et al. [36] reported that cybrids harboring the m.3010A allele had significantly lower amounts of mitochondrial translation products, lower ratios of p.MT-CO1/succinate dehydrogenase subunit A, and lower ratios of complex IV quantity to citrate synthase activity after treatment with linezolid. The m.3010G>A variant has arisen repeatedly; it is present in some western European populations and is associated with multiple phylogenetically distant African haplogroups (L0a4, L2a1c, and L4b1a). The D4 haplogroup, found in Asian and Native American populations, is defined by m.3010G>A and two other variants and has been associated with the clinical expression of LHON in Chinese patients [30]. In Europeans, m.3010G>A is one of two variants that define J1 haplogroups, which have been solidly established as increasing the penetrance of the primary LHON mutations [3,4]. In our study population, this variant appears to stem from African haplogroup L2a1c and European haplogroup H65a. However, our results do not suggest that m.3010G>A confers elevated susceptibility to POAG.

In the case of m.2706, the m.2706A allele is one of two variants that define the European haplogroup H, reported to be an LHON resistance factor [37]. Most African Americans (97.5% of cases versus 96.6% of controls) were found to possess the m.2706G allele, which appears to be ancestral, but the association with POAG was insignificant. Mitochondrial J cybrids, associated with LHON severity, reportedly have significantly decreased mitochondrial protein synthesis, and it was proposed that this might be explained by m.2706G [3]. If variation at this position influences mitochondrial protein production, on account of proximity to the peptidyl transfer center, it is tempting to speculate that m.2706G may contribute to African Americans’ elevated susceptibility to POAG relative to populations who have European ancestry, in addition to the LHON risk of some J European haplogroups. However, our results do not support this proposal.

The MT-RNR2 gene also encodes humanin, a peptide that has been shown to have protective effects in age-related diseases, particularly Alzheimer [38]. Pool-seq did not detect any common variants within the humanin-encoding region, but bidirectional Sanger sequencing on individual DNAs detected an unusual rare mutation, m.2639C>A, in a single glaucoma patient; therefore, it is not impossible that variation in this neuroprotective peptide might be relevant to glaucoma pathogenesis in isolated cases. The patient’s mutation is predicted to cause the third amino acid in the humanin peptide, proline (“P3”), to be replaced by threonine. P3 is known to be an essential amino acid for humanin function, and replacement of P3 by alanine abrogates function [39].This particular patient was enrolled in the POAAGG study at age 89 with bilateral POAG, which was severe stage in the right eye and moderate stage in the left eye. Before cataract surgery, the patient had high myopia (−7.50 sph right eye, −8.00 + 0.75×055 left eye). The axial length of her right eye was 26.68 mm and that of her left eye was 26.72 mm. She was noted to have myopic appearing nerves at age 49, slight pallor of the right optic nerve at age 56, and bilateral pallor with optic disc cupping in the right eye at age 61, at which time IOP-lowering therapy was started. Disc photos taken after glaucoma diagnosis show peripapillary atrophy in both eyes with optic nerve cupping in the right eye greater than in the left. The patient’s maximum intraocular pressure was 23 mm Hg in both eyes, and her pachymetry was normal (545 μm in the right eye and 550 μm in the left eye). The patient’s past medical and surgical history was significant for arthritis, spinal disc surgery at age 49, and vascular surgery of the right leg at age 81.

Three common disease-associated variants (m.750A>G, m.921T>C, m.1438G>A) were detected in MT-RNR1, encoding the 12S RNA gene of the mitochondrial ribosome, but their estimated prevalence differed little between the POAG case and control pools.

Sequence variation in the MT-CO1 gene

Heritable and somatic mutations in the MT-CO1 gene are found at disproportionate rates in patients with prostate cancer [40]. For prostate cancer, as for POAG, African American ancestry, advanced age, and positive family history are recognized as important risk factors. Two variants that were found to be significantly associated with prostate cancer in African American men [41] are the synonymous m.6221T>C polymorphism, associated with haplogroup L1c3, and m.7389T>C (rs9783095, associated with haplogroups L1, L0d1a1a, L3d3b B4f) that causes a Tyr496His missense change. Sanger sequencing showed that m.6221T>C was not elevated in POAG cases. The m.7389T>C MT-CO1 missense variant was estimated by Pool-seq to be slightly more common in the cases (21%) than in the controls (17%), but this variant was outside the regions that were also Sanger sequenced; therefore, this result could not be confirmed. Three additional missense variants implicated in prostate cancer [42] were determined to be common in the POAAGG population with minor allele frequencies estimated as 2%–4%: m.6150G>A (Val83Ile, absent from dbSNP, haplogroups L1c2, HV1a1b), m.6253T>C (rs200165736, Met117Thr, H15, D5b1, M1a3b1, M13’46’61, A2am, L1c2), and m.6663A>G (rs200784106, Ile254Val, L2a1c). The existence of all four missense variants was confirmed with Sanger sequencing on individual DNAs (Figure 2B), and Sanger sequencing confirmed there was a significant enrichment of m.6150G>A (p=0.01), m.6253C>T (p=0.04), and m.6480G>A (p=0.04) variants among the POAG cases. These data suggest that MT-CO1 missense mutations might simultaneously elevate African Americans’ risk for prostate cancer and POAG.

Mitochondrial variants linked to the microbiome

In recent years, the human microbiome has been linked to several common diseases, including an ocular disease, uveitis, which is triggered by microbiota-dependent autoimmunity [43]. It has been proposed that manipulation of the microbiome could be beneficial for uveitis and for glaucoma, possibly by modulating brain-derived neurotropic factor to promote survival of retinal ganglion cells [44]. The oral microbiome has recently been implicated in glaucoma; patients with glaucoma were found to have significantly higher oral bacteria counts than controls [45]. Interestingly, two of the variants with the strongest potential for association with POAG from the present study, m.16390G>A (HVS1 hypervariable region), and m.15784T>C (MT-CYPB), were also top hits in an association study of mitochondrial variation with alterations in the gut microbiome [46]. The m.16390G>A variant was also found to be more common in type 2 diabetes cases than in controls in a Tunisian study population, but the nominally significant association (p=0.04) did not survive multivariate regression analysis [47].

Other studies of glaucoma that used next-generation sequencing of mtDNA

Most of the common variants described in this report were previously reported in a study of 22 African American POAG patients and 22 controls, based on whole mitochondrial genome next-generation sequencing of individual (unpooled) DNAs [10]. That study was intended to detect novel pathogenic variants, but the African American POAG cases were found to closely resemble the controls in the proportion and likely pathogenicity of novel variants, transversion rates, and mutational spectrum detectable in blood. Jeoung et al. sequenced mtDNA in a discovery cohort of 20 Korean patients with normal tension glaucoma (NTG), followed by Sanger sequencing on 196 patients with NTG and 202 controls [48]. Their association of m.4883C>T in the MT-ND2 gene with NTG survived correction for multiple testing, but this variant was not considered a high-confidence variant in the present study. Sundaresan et al. performed whole mitochondrial genome sequencing on the Ion Torrent platform for 32 POAG cases with Irish and Indian ancestry; the researchers identified 15 novel or known variants with pathogenic potential based on PolyPhen analysis [49]. Two of these variants, m.1892A>G and m.2755A>G, were located in the MT-RNR2 16s RNA gene, but they were not detected in the African American POAAGG cohort.

A mitochondrial etiology for POAG?

Recent human cybrid studies have found mitochondrial haplogroup background to be associated with multiple functional effects. As an example, a comparison of H (European) versus L (African) haplogroup cybrids found differential expression of respiratory genes, differences in mtDNA copy number, ATP turnover rates, reactive oxygen species production, and differences in nuclear gene expression, including inflammation-related signaling genes and the canonical complement system [50]. A review by Coskun et al. [51] outlined a mechanism for Alzheimer and Parkinson disease progression in which common mitochondrial variations may influence individuals’ predisposition to neurodegeneration. The researchers proposed that some ancient functional mtDNA variants may now be maladaptive, causing mild defects in mitochondrial function. Although these heritable variants are insufficient in themselves to reduce the threshold required for normal neurologic function, they might accelerate the rate of age-related accumulation of somatic mtDNA mutations, permitting a threshold to be crossed, beyond which sufficient mitochondrial energy production cannot be sustained. Such defects would manifest primarily in tissues with extremely high energy demands, such as retinal ganglion cells. This mechanism is consistent with the associations found between some common mitochondrial variants and POAG in our study, but functional studies are needed. Subjects harboring these variants may be more sensitive to elevated intraocular pressure or environmental insults, such as exposure to linezolid or other mitotoxic chemicals. Interventional studies, involving protective antioxidants and other approaches, have demonstrated that such therapies may promote neuroprotection by counteracting oxidative stress and mitochondrial dysfunction [52]. Accordingly, mitochondrial sequence analysis has the potential to personalize such treatments.

Limitations of this study

The Pool-seq methodology enabled the rapid and economic integration of whole mitochondrial genome data from a large study population but imposed several limitations. As shown in Figure 2B, the minor allele frequencies inferred by Pool-seq were subject to experimental error. This is likely to result from variable normalization of mitochondrial DNA within each pool, resulting in disproportionate representation of individuals. The concentration of DNA eluted from the normalization plates was somewhat variable, and total DNA concentration was used as a proxy for mitochondrial DNA concentration, whereas the ratio of mitochondrial to nuclear DNA may have differed from sample to sample, and at least one study has reported that haplogroup background affects mtDNA content [50]. The latter errors might be minimized by quantifying mitochondrial DNA directly with a method such as qPCR, instead of using total DNA concentration as a proxy for mtDNA. Another possible source of error was unequal PCR amplification, caused by variation in initial template DNA quality, or during the downstream steps during sequencing library construction. As a result of these issues, it is possible that representation of some individuals was lost before or during library construction, and particular individuals are necessarily under- or overrepresented within each pool. Accordingly, the effective population size of the case and control pools is likely to be smaller than assumed in the calculations that were used to rank potential variant associations (Figure 5), and the frequencies of variants and their ancestral haplogroups may be over- or underestimated.

Another limitation stems from sequencing pools of enzymatically sheared DNAs in conjunction with a short read (about 200–300 bp) technology. The analysis used reduction of the read alignments to simple nucleotide counts to infer population frequencies. This means that potential information about the phase of nearby variants within individual reads was not considered; instead, the presence of individual haplogroups was inferred by the presence of a particular set of defining variants, not the complete haplotypes that might have been obtained from complete mtDNA genome sequencing on individuals. Because many polymorphisms have arisen independently on multiple lineages, using the frequencies of individual variants as a proxy for haplogroup frequency necessarily yields approximations that are most likely to overestimate, but as shown in Figure 3, the estimates accorded reasonably well with those from an independent study of haplogroup prevalence in other populations of African Americans.

We did not attempt to analyze the pooled sequencing data for the presence of insertion or deletion (indel) variants or complex changes; therefore, the data interpretation was necessarily restricted to point mutations. It is also possible that some of the point mutations that were inferred are artifacts of adjacent indels, although most correspond to known common variants, which are predominately point mutations. Finally, a subset of positions on mtDNA that were characterized as “noisy,” corresponding to about 2% of the mitochondrial genome, were excluded from the association analysis. However, other ancestral variants from the same haplotype would have been included if they corresponded to high-confidence positions.

Conclusions

Pool-seq of 1,999 individuals demonstrated that the mitochondrial haplotype distribution of the POAAGG study population is broadly similar to that of other African American populations, with all major African macrohaplogroups represented and with L3, L2, L1, and L0 in decreasing order of frequency, consistent with a preliminary survey of this population [10]. Significant differences were found between haplogroup prevalence in the POAG case and control subgroups. Specifically, the case group was confirmed to be enriched in L1c2 lineages, which harbor up to three disease-associated missense variants in the MT-CO1 gene and an MT-RNR2 variant. L2 haplogroups were also implicated as risk factors for POAG; these lineages contain a common variant in MT-RNR2 and ones recently linked to the microbiome, whereas the L1b haplogroup was confirmed not to be a risk factor for POAG among our subjects. Approximately one in four African Americans possesses mitochondria that have L1c2 or L2 ancestries. If they are confirmed as POAG risk factors, those subpopulations might be prioritized for screening efforts and for inclusion in clinical trials designed to test therapies or dietary interventions intended to preserve or enhance mitochondrial function.

Statistics

The TTEST function (two-tailed, unequal variance) in Microsoft Excel was used to compare mean variance frequencies in the pooled versus unpooled samples and used as a filter to differentiate low-frequency population sequence variation from background sequencing noise. The CHITEST function in Excel was used to compare the estimated numbers of cases and controls that have variants at the 381 positions that were selected for the association study, to evaluate the relative likelihood that each difference in the estimated number of cases and controls that have a particular genotype was explainable by chance. The confidence intervals on proportions, for reported and estimated frequencies of African American haplogroups, were determined using the modified Wald method and an online calculator (GraphPad). For confirmatory Sanger sequencing, the number of cases and controls who harbor each variant was analyzed as 2×2 contingency tables using a two-tailed Fisher’s exact test or a chi-square test with Yates correction (if n>2000) with version 6 of GraphPad Prism (GraphPad Software, La Jolla, CA).

This Article

Google Scholar

Association of primary open-angle glaucoma with mitochondrial variants and haplogroups common in African Americans