Molecular Vision 2015; 21:1235-1251
<http://www.molvis.org/molvis/v21/1235>
Received 07 July 2015 |
Accepted 22 October 2015 |
Published 26 October 2015
Rebecca King,1 Lu Lu,2 Robert W. Williams,2 Eldon E. Geisert1
1Department of Ophthalmology and Emory Eye Center, Emory University, Atlanta, GA; 2Department of Anatomy and Neurobiology and Center for Integrative and Translational Genomics, University of Tennessee Health Science Center, Memphis, TN
Correspondence to: Eldon E. Geisert, Department of Ophthalmology, Emory University, 1365B Clifton Road NE, Atlanta, GA, 30322; Phone: (404) 778-4239; FAX: (404) 778 4111; email: egeiser@emory.edu
Purpose: Differences in gene expression provide diverse retina phenotypes and may also contribute to susceptibility to injury and disease. The present study defines the transcriptome of the retina in the BXD RI strain set, using the Affymetrix Mouse Gene 2.0 ST array to investigate all exons of traditional protein coding genes, non-coding RNAs, and microRNAs. These data are presented in a highly interactive database on the GeneNetwork website.
Methods: In the Normal Retina Database, the mRNA levels of the transcriptome from retinas was quantified using the Affymetrix Mouse Gene 2.0 ST array. This database consists of data from male and female mice. The data set includes a total of 52 BXD RI strains, the parental strains (C57BL/6J and DBA/2J), and a reciprocal cross.
Results: In combination with GeneNetwork, the Department of Defense (DoD) Congressionally Directed Medical Research Programs (CDMRP) Normal Retina Database provides a large resource for mapping, graphing, analyzing, and testing complex genetic networks. Protein-coding and non-coding RNAs can be used to map quantitative trait loci (QTLs) that contribute to expression differences among the BXD strains and to establish links between classical ocular phenotypes associated with differences in the genomic sequence. Using this resource, we extracted transcriptome signatures for retinal cells and defined genetic networks associated with the maintenance of the normal retina. Furthermore, we examined differentially expressed exons within a single gene.
Conclusions: The high level of variation in mRNA levels found among the BXD RI strains makes it possible to identify expression networks that underline differences in retina structure and function. Ultimately, we will use this database to define changes that occur following blast injury to the retina.
Large-scale sequencing initiatives have led to a new era in understanding gene and genome functions [1-5]. There is now an acute need for powerful approaches that integrate and analyze massive proteomics/genomics data sets. In vision research, many single gene variants are known to cause vision loss, including retinitis pigmentosa [6-9], Usher syndrome [10,11], and some forms of glaucoma [12]. However, many ocular diseases have a complex genetic basis with multiple chromosomal loci contributing to differences in the susceptibility and severity of the disease. Two prominent examples are glaucoma [13-15] and age-related macular degeneration [16,17]. In addition, the response of the eye and the retina to trauma is driven by a host of different genes expressed in a large number of different cell types.
Until recently, it was extremely difficult to define the genetic and molecular basis of complex diseases or to adequately monitor the response of the eye and the retina to injury. We used a novel and powerful approach that relies on systems biology and a mouse genetic reference panel, the BXD family of recombinant inbred (RI) strains. This resource is particularly well suited to define complex genetic networks that are also active in human diseases. This approach allows us to not only identify specific gene variants involved in retinal disease and response to injury but also place corresponding molecular changes in a global context in the eye and the retina.
The initial efforts of our group explored the genetic diversity of the BXD family of strains to define the genetic networks active in the eye (see data sets and refs [18] and [19]). In this study, we created a new mouse retinal database that offers a more complete description of the mouse transcriptome. This resource uses the genetic covariance of expression across a panel of 52 BXD strains to identify cellular signatures and genetic networks within the mouse retina. The array we used provides expression profiling at the exon level for 26,191 well-established annotated transcripts, as well as 9,049 non-coding RNAs, including more than 600 microRNAs. Using the bioinformatics tools located on GeneNetwork, we examined the cellular signature of RPE cells. We also analyzed a genetic and molecular network involved in neuronal development and axon growth. In both examples, we highlight the specific benefits of the new database with a special emphasis on microRNAs, non-coding RNAs, and the exon level data available with the Affymetrix MouseGene 2.0 ST array.
All of the procedures used involving mice were approved by IACUC at the Emory University and adhered to the ARVO Statement for the Use of Animals in Research. The Department of Defense (DoD) Congressionally Directed Medical Research Programs (CDMRP) Normal Retina Database uses the Affymetrix MouseGene 2.0 ST Array (May 15, 2015). Robust multiarray average (RMA) analysis and scaling were conducted by Arthur Centeno. This data set consists of 52 BXD strains, C57BL/6J, DBA/2J, and an F1 cross between C57BL/6J and DBA/2J. A total of 55 strains were quantified. There is a total of 222 microarrays. All data from each microarray used in this data set is publicly available on GeneNetwork.
These are RMA expression data that have been normalized using what we call a 2z+8 scale, but without corrections for batch effects. The data for each strain were computed as the mean of four samples per strain. The expression values on the log2 scale ranged from 3.81 to 14.25 (10.26 units), a nominal range of approximately 1,000-fold. After taking the log2 of the original non-logged expression estimates, we converted the data within an array to a z-score. We then multiplied the z-score by 2. Finally, we added 8 units to ensure that no values were negative. The result was a scale with the mean expression of the probes on the array of 8 units and a standard deviation of 2 units. A twofold difference in expression is equivalent to roughly 1 unit on this scale. The lowest level of expression was 3.81 (Olfr1186) from the DoD CDMRP (the Normal Retina Database uses the Affymetrix MouseGene 2.0 ST Array, May 15, 2015). The highest level of expression was rhodopsin for 17462036 (Rho). The highest single value was 14.25.
Almost all animals were young adults between 60 and 100 days of age. We measured expression in conventional inbred strains, BXD recombinant inbred (RI) strains, and reciprocal F1s between C57BL/6J and DBA/2J.
The first 32 of the strains were from the Taylor series of BXD strains generated at the Jackson Laboratory (Bar Harbor, ME) by Benjamin A. Taylor. BXD1 through BXD32 were started in the late 1970s, whereas BXD33 through 42 were started in the 1990s. BXD43 and higher were bred by Lu Lu, Jeremy Peirce, Lee M. Silver, and Robert W. Williams starting in 1997 using B6D2 generation 10 advanced intercross progeny. This modified breeding protocol doubles the number of recombinations per BXD strain and improves the mapping resolution [20]. All of the Taylor series of BXD strains and many of the new BXD strains are available from The Jackson Laboratory. Several strains were specifically excluded from the data set. For BXD43 and higher, the DBA/2J parent carried the Tyrp1b mutation and the GpnmbR150X mutation; these two mutations produce pigment dispersion glaucoma. Mice that carried these two mutations were not included in the data set: BXD53, BXD55, BXD62, BXD66, BXD68, BXD74, BXD77, BXD81, BXD88, BXD89, BXD95, and BXD98. In addition, BXD24 was omitted, since it developed a spontaneous mutation, rd16 (Cep290) that resulted in retinal degeneration and was renamed BXD24b/TyJ [21]. Several additional strains were excluded due to abnormally high Gfap levels observed in the Full HEI Retina (April 2010) data set: BXD32, BXD49, BXD70, BXD83, and BXD89.
The mice were killed by rapid cervical dislocation. The retinas were removed immediately by placing the globe under pressure and cutting the cornea. The lens burst out of the opening in the cornea followed by the retina. In this process, no specific procedures were used to include or exclude the RPE. The retina was placed immediately in 1 ml of 160 U/ml RiboLock (Thermo Scientific Waltham, MA) for 1 min at room temperature. The retina was then transferred to Hank’s Balanced Salt solution with RiboLock in 50 µl RiboLock (Thermo Scientific, RiboLock RNase #EO0381 40 U/µl 2500U) and stored in −80 °C. The RNA was isolated using a QiaCube (Hilden, Germany) and the in-column DNase procedure. All RNA samples were checked for quality before the microarrays were run. The samples were analyzed using the Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA). The RNA integrity values ranged from 7.0 to 10. Our goal was to obtain data from independent biologic sample pools for both sexes of each BXD strain. The four batches of arrays included in this final data set collectively represented a reasonably well-balanced sample of males and females.
In the present study, we used the Affymetrix Mouse Gene 2.0 ST Array. The array was designed with a median of 22 unique probes per transcript. Each probe is 25 bases in length. The arrays provide comprehensive transcriptome coverage with more than 30,000 coding and non-coding transcripts. In addition, there is coverage for more than 600 microRNAs. For some arrays, the RNA was pooled from two retinas, and other arrays were run on a single retina. Dr. XiangDi Wang (University of Tennessee, Health Science Center) was involved in the retinal extractions and isolation of RNA. The Affymetrix arrays were run by two research core laboratories: the Molecular Resource Center at UTHSC (Dr. William Taylor, director) and the Integrated Genomics Core at Emory University by Robert B. Isett (Dr. Michael E. Zwick, director). In a separate set of experiments, we tested a set of arrays from C57BL/6J retinas run at each facility to determine if there were batch effects or other confounding differences in the results. We did not detect any significant difference in the arrays run at UTHSC or at Emory University. Thus, we included both sets of data in the analysis.
The DoD CDMRP Retina Database presents the retinal transcriptome profiles of 52 BXD RI strains on a highly interactive website, GeneNetwork. There are two separate presentations of the microarray data. The first is at the gene level (DoD CDMRP Retina Affymetrix Mouse Gene 2.0 ST (May 15, 2015) RMA Gene-Level Database), and the same data is presented at the exon level (DoD CDMRP Retina Affymetrix Mouse Gene 2.0 ST (May 15, 2015) RMA Exon-Level Database). For analyzing these data sets, a suite of bioinformatics tools is integrated in the GeneNetwork website. These tools identify genes that vary across the BXD RI strains, construct genetic networks that control the development of the mouse retina, and identify the genomic loci that underlie complex traits in the retina. In this paper, we present these two new data sets and illustrate their use with three examples. The first was to identify the genetic signatures of the RPE. The second identified genes that are differentially spliced between the C57BL/6J retina and the DBA/2J retina. In the third example, we looked at the genetic network associated with roundabout homolog 2 (Robo2) gene and the modulation of axonal growth.
The DoD CDMRP Retina Database has a unique signature for RPE cells. When looking at the expression of the RPE marker Rpe65, there was an almost biphasic distribution of expression (Figure 1). Many of the strains expressed low levels of Rpe65 (approximately 7 units on our scale) while other strains had high levels of expression ranging from two- to eightfold higher (8 to 11 units). We believe that this difference is due to the time of day that the retinas were removed from the eye. Many of the retinas were isolated at the University of Tennessee Health Science Center where isolation started at approximately 10:00 a.m. and lights on in the animal colony occurred at 6:00 a.m. These retinas were isolated approximately 4 h after the lights came on. At Emory University, the retinas were isolated starting at 9:00 a.m., and lights on occurred at 7:00 a.m. These later sets of retinas were removed at approximately 2 h after the lights came on in the animal colony. The mean expression level of Rpe65 in the samples isolated at the University of Tennessee was 8.3, and the mean for the samples isolated at Emory University was 10.2, roughly a fourfold difference in expression. This difference in expression between the samples isolated at the different locations was significant using a Student t test (p<0.01). If we examine the correlation between these differences in Rpe65 expression and the genes’ associated circadian rhythm, several genes correlate with the expression of Rpe65 across the data set, including cytochrome 2 (Cry2, r = –0.75) and period homolog 2 (Per2, r = –0.65).
When this data set is used, it is important to remember the difference in Rpe65 expression due to the time of day the retinas were isolated. However, if one is interested in defining a molecular signature of RPE cells and if the level of Rpe65 is directly associated with the number of cells that adhere to the retina, then we can use the level of Rpe65 in each strain to define a set of genes that covary with Rpe65 across the BXD strain set. When we examined the data set for genes with a similar expression pattern across the BXD strains, a list of genes uniquely expressed in RPE was observed (Table 1). This cellular signature represents genes that are expressed within the RPE, including retinal G protein coupled receptor (Rgr), lecithin-retinol acyltransferase (Lrat), retinol dehydrogenase 5 (Rdh5), transferrin (Trf), and retinal pigment epithelium derived rhodopsin homolog (Rrh). This signature can also be thought of as the result of genetic networks that drive gene expression within a given cell type.
With the MouseGene 2.0 ST Affymetrix chip, we found not only protein-coding genes that correlate with Rpe65 but also microRNAs and non-coding RNAs. When we examined the top 500 correlates of Rpe65 (all of which have a correlation higher than 0.8 with Rpe65), five microRNAs were present: Mir98, Mir666, Mir449a, Mir301b, and Mir28b (Table 2). Using the bioinformatics tools on TargetScan (Targetscan.org) [22-24], we predicted targets for each microRNA from the top 500 correlates of Rpe65. One microRNA, Mir666, did not appear on the TargetScan website. The remaining four microRNAs appeared on the website. When the microRNAs were scanned for targets, Mir98 had 29 targets in the RPE signature, Mir449a had 14 targets, Mir301b had 13 targets, and Mir28b had one target. This type of analysis may be one approach to constructing and understanding microarray networks within a specific cell type such as the mouse RPE.
For many Affymetrix probes, there is minimal annotation. For example, within the top 100 correlates of Rpe65, 17 Affymetrix probes are present that have minimal annotation furnished by Affymetrix. For two of these probes, Affy_17203447 and Affy_17204181, there is no annotation from Affymetrix, including the sequence of the probe itself. Thus, for these two probes, no further analysis can be conducted until Affymetrix furnishes sequence data. For the remaining 15 Affymetrix probes, the expression within the retinal transcriptome is low, ranging from 4.6 to 8.8. The sequences of these probes can be related to the mouse genome using the Verify Tool on the probe’s Trait Data and Analysis page on GeneNetwork. The three probes with expression levels higher than 8 all aligned with the mouse genome. Affy_17241598 aligns to a sequence on chromosome 10 that is a predicted protein with no further annotation. Affy_17414264 aligns with a sequence on chromosome 4 that is a non-protein coding gene or gene fragment. Affy_17527409 aligns with a sequence on chromosome 9 that is a predicted protein. When the role of these transcripts in the network associated with Rpe65 is considered, these data are far from informative. Unfortunately, for many of the probes on the Affymetrix Mouse Gene 2.0 ST Array, this is the current state of annotation. We are beginning to improve the annotation, and the identification of probes associated with microRNAs is due to the efforts of members of the Rob Williams group. With time, we believe that the annotation will improve allowing investigators to include these probes in functional genetic networks.
One of the extended features of the Affymetrix MouseGene 2.0 ST Array is its extensive coverage of gene expression at the exon level, and these data are presented in the DoD CDMRP Retina Affymetrix Mouse Gene 2.0 ST (May 15) RMA Exon-Level Database. At the present time, we do not have specific bioinformatic tools that fully investigate the exon-level data set. However, this database can be used to identify genes that are differentially spliced in the DBA/2J mouse and the C57BL/6J mouse. If an exon is expressed in one strain of mouse and not the other strain, the exon will have a large and significant likelihood ratio statistic (LRS) score across the BXD RI strain set. Basically, that individual exon will function like a Mendelian trait being either highly expressed or expressed at a low level. Therefore, to begin the analysis, we identified the exons with LRS scores higher than 60. We identified 2,314 exons, and the highest LRS score was 250. Then we reasoned that if an exon had a significant LRS score and at the gene level there was not a high LRS score, then the selected exon(s) was behaving differently from the other exons within the gene. Of the 2,314 exons with an LRS score above 60, 1,569 exons were part of a gene that did not have a high LRS score. An extensive evaluation of all these exons is beyond the scope of the present paper. Therefore, the top ten exons with LRS scores ranging from 165 to 202 were selected for further analysis. These exons were from ten genes: Cyb5r3, Hmgn2, Kif22, Col18a1, Uba2, Wdtc1, Haus5, Sdc2, Poc5, and Cntn1. In every case, at least one exon was differentially expressed between the C57BL/6J mouse and the DBA/2J mouse. To illustrate the differential splicing seen in these genes, we examined Col18a1 in depth. In the exon data set, there are 26 separate probes for the exons in the Col18a1 gene (see Table 3). Most of the probes have an associated LRS score ranging from 7 to 19. However, one probe (17242250) had an LRS score of 176.3. When we examined the distribution of the expression of this exon across the BXD RI strains, we saw that it is highly expressed in the C57BL/6J mouse retina relative to the DBA/2J retina (Figure 2). In other exon probes for Col18a1, there is a similar level of expression between the C57BL/6J mouse retina and the DBA/2J mouse retina, as well as the remaining BXD strains (Figure 3). It is possible that these differences in probe binding could be due to sequence variants between the C57BL/6J mouse and the DBA/2J. To test this hypothesis, we examined the sequence of the exons with high LRS scores to define the sequence differences between the two strains. Of the ten exons examined, five had single nucleotide polymorphisms (SNPs) within the region recognized by the Affymetrix probe. For these five exons, the difference in expression could be explained by differences in probe binding and not by differential splicing. For the remaining five exons, including that of Col18a1, the sequence in the C57BL/6J mouse was identical to that of the DBA/2J mouse. This type of analysis appears to identify differentially spliced genes in the retina of the BXD RI strain set. Our group is in the process of developing bioinformatic tools to take full advantage of the data from the Affymetrix exon chips. In the near future, this type of analysis may be as simple as a single query on the Trait Data and Analysis page of GeneNetwork.
To illustrate the features of the new DoD CDMRP Retina Database, we chose a specific gene, Robo2 (roundabout homolog 2) and used it to demonstrate the analytical powers of the database and the bioinformatics tools associated with GeneNetwork. Robo2 is highly expressed in the retina with a mean value of 10.7 across the BXD strain set. The expression within individual strains varies from a low of 10.2 to a high of 11.1. This is a log2 scale and represents approximately a twofold difference in expression (Figure 4). When we examined the database for genes with a similar pattern of expression across the BXD strain set, we found a group of genes that are highly correlated with the expression pattern of Robo2 (Table 4). One example is the third correlate on the list, Ncam2 (Figure 5) with a value of 0.926. Even the 100th correlate on the list (Git1) has a high correlation (r=0.873) with Robo2 (see Supplemental Table 1).
To define the regions of the genome that modulate the expression of Robo2, we plotted a genome-wide scan for Robo2 (Figure 6). This plot defines regions of the genome that correlate with the level of Robo2 expression, a quantitative trait locus (QTL). In this interval map, there is one significant QTL on chromosome 16 (notice the peak reaches the red line on the scan, p=0.05), and there are two suggestive peaks on chromosome 1 and chromosome 17 (above the gray line, p=0.63). The expression of Robo2 is modulated by genomic elements on chromosome 16. Two types of elements could affect the expression of Robo2: a cis-QTL or a gene with a nonsynonymous SNP. When we examined the significant QTL on chromosome 16 (21–27 Mb), we found there were no significant cis-QTLs at the gene level. With the DoD CDMRP Retina Database, it is now possible to look at the individual probes in exons and introns. When we checked the DoD CDMRP Retina Exon Level Database, we found one probe (Affy_17329472) that lies within the Leprel1 gene. When we checked the location of the probe with the Verify function on GeneNetwork, the probe lies in an intron and may be a non-coding RNA. However, when we examined the RNA-seq data from GeneNetwork, it appeared that this probe was detected in an RNA-seq analysis of the hippocampus and thus may be part of Leprel1 gene itself. Nonetheless, this probe marks a candidate for modulating the expression of Robo2. The second approach was to examine this region for nonsynonymous SNPs. Using the SNP browser in GeneNetwork, we looked at chromosome 16 (21–27 Mb) and found four known genes with nonsynonymous SNPs: Kng2, Kng1, BC106179, and Masp1. This analysis provided us with five candidates for modulating the expression of Robo2.
To determine whether this highly correlated set of genes in the Robo2 network have functional relationship(s), we used WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) to examine the top 500 correlates of Robo2 to determine if there were specific functional transcript enrichments. The list of the top 500 correlates of Robo2 was enriched for several biologic processes (nervous system development, synaptic transmission, and neuron differentiations); molecular functions (enzyme binding, post synaptic density [PDZ] domain binding, inorganic cation transmembrane transporter, and metal ion transmembrane transporter activity); and cellular components (cell projection part, neuron projection, intracellular part, and axon genes). This type of analysis plays a critical role in many genetic networks, defining the functional role of the network. In this specific case, the analysis indicates that the Robo2 network is involved in axonal growth and neuronal development.
This article announces the release of two new BXD retina databases on GeneNetwork. The first is at the gene level (DoD CDMRP Retina Affymetrix Mouse Gene 2.0 ST (May 15) RMA Gene-Level Database). The second data set is an exon-level analysis of the data presented in the first data set (DoD CDMRP Retina Affymetrix Mouse Gene 2.0 ST (May 15) RMA Exon-Level Database). Here, we emphasize some of the special aspects of these two data sets, including exon-level analysis and the inclusion of microRNAs and many non-coding RNAs. To illustrate many of these new features, we presented three approaches for analysis using the data sets.
The first was an examination of a cell signature within the data set. Within the DoD CDMRP Retina Database, there is a pronounced RPE signature. Some strains demonstrate low levels of expression of RPE65 while other strains have more than 16-fold higher levels of expression. This difference could not be due to differences in expression within the RPE. We believe that this difference is due to differences in the time of day when the retinas were isolated. The retinal samples were collected at two locations. At the University of Tennessee, the samples were usually isolated starting at 10:00 a.m., and lights were turned on in the animal colony at 6:00 a.m. Thus, the retinas were isolated at least 4 h after the lights came on. At Emory University, the retinas were isolated starting at 9:00 a.m., and lights on occurred at 7:00 a.m. (2 h after the lights came on in the animal colony). These differences in the time of day the retinas were isolated may be related to the number of RPE cells that adhered to the retinal samples [25].
Several bioinformatics tools are available to the vision research community. These tools include the NEI Bank project, which provides transcriptome profiling of the tissues of the eye, including mouse and human [26]. The Cepko group at Harvard, Boston, MA has provided the mouse retina serial analysis of gene expression (SAGE) library that includes gene expression of the embryonic and postnatal retina [27,28]. Daiger and his group at the University of Texas Health Science Center, Houston TX have lists of mapped loci and cloned genes associated with inherited retinal disease on the RetNet website. The Gene Expression Nervous System Atlas (GENSAT) now has a section devoted to the Retina Project [29] at GENSAT. The cell-specific labeling in the retina for different genes was illustrated using BAC transgenic mice [30]. The pattern of labeling in the retina defines the retinal cell types that express specific genes. This cellular localization aids in defining the localization of genetic networks in the retina. Finally, we posted the data from the study of glaucoma by Howell et al. [31] on the GeneNetwork website under the BXD eye database. These data are helpful in understanding the role of specific genetic networks in glaucoma (for example, see Templeton et al. [32]).
In conclusion, the DoD CDMRP Retina Database offered on GeneNetwork is a new resource for the vision community, in the ever-expanding variety of bioinformatics tools available. Previously, we offered several BXD microarray databases on GeneNetwork to the vision science community: the transcriptome of the whole eye (Eye M430v2 (September 2008) RMA Database) described in detail by Geisert et al. [18], a normal retina database (Normal Retina (April 2010) RankInv Database) described in detail by Freeman et al. [19], and the retina 2 days after optic nerve crush (ONC Retina (April 2012) RankInv Database) described by Templeton et al. [33]. This new data set offers a unique look at expression at the exon level. In addition, many non-protein coding transcripts are represented in the data set. The bioinformatics tools offered on GeneNetwork and these new databases are a unique resource for the vision research community.
This work was supported by DoD CDMRP Grant W81XWH1210255 from the USA Army Medical Research & Materiel Command and the Telemedicine and Advanced Technology (EEG), NIH Grant R01EY017841 (EEG), Vision Core Grant P30EY006360 (P. Michael Iuvone), and Unrestricted Funds from Research to Prevent Blindness (Emory University). We thank XiangDi Wang and Arthur Centeno for their technical assistance in this project. This study was supported in part by the Emory Integrated Genomics Core (EIGC), which is subsidized by the Emory University School of Medicine and is one of the Emory Integrated Core Facilities.