Molecular Vision 2015; 21:61-87
Received 26 September 2014 | Accepted 16 January 2015 | Published 20 January 2015
Fiona C. Mansergh, Matthew Carrigan, Karsten Hokamp, G. Jane Farrar
Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
Correspondence to: Fiona C. Mansergh, Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland, Phone: 00353 1 8962164; FAX: 00353 1 8963848; email: firstname.lastname@example.org
Purpose: Retinitis pigmentosa (RP) typically results from individual mutations in any one of >70 genes that cause rod photoreceptor cells to degenerate prematurely, eventually resulting in blindness. Gene therapies targeting individual RP genes have shown efficacy at clinical trial; however, these therapies require the surviving photoreceptor cells to be viable and functional, and may be economically feasible for only the more commonly mutated genes. An alternative potential treatment strategy, particularly for late stage disease, may involve stem cell transplants into the photoreceptor layer of the retina. Rod progenitors from postnatal mouse retinas can be transplanted and can form photoreceptors in recipient adult retinas; optimal numbers of transplantable cells are obtained from postnatal day 3–5 (P3–5) retinas. These cells can also be expanded in culture; however, this results in the loss of photoreceptor potential. Gene expression differences between postnatal retinas, cultured retinal progenitor cells (RPCs), and rod photoreceptor precursors were investigated to identify gene expression patterns involved in the specification of rod photoreceptors.
Methods: Microarrays were used to investigate differences in gene expression between cultured RPCs that have lost photoreceptor potential, P1 retinas, and fresh P5 retinas that contain significant numbers of transplantable photoreceptors. Additionally, fluorescence-activated cell sorting (FACS) sorted Rho-eGFP-expressing rod photoreceptor precursors were compared with Rho-eGFP-negative cells from the same P5 retinas. Differential expression was confirmed with quantitative polymerase chain reaction (q-PCR).
Results: Analysis of the microarray data sets, including the use of t-distributed stochastic neighbor embedding (t-SNE) to identify expression pattern neighbors of key photoreceptor specific genes, resulted in the identification of 636 genes differentially regulated during rod specification. Forty-four of these genes when mutated have previously been found to cause retinal disease. Although gene function in other tissues may be known, the retinal function of approximately 61% of the gene list is as yet undetermined. Many of these genes’ promoters contain binding sites for the key photoreceptor transcription factors Crx and Nr2e3; moreover, the genomic clustering of differentially regulated genes appears to be non-random.
Conclusions: This study aids in understanding gene expression differences between rod photoreceptor progenitors versus cultured RPCs that have lost photoreceptor potential. The results provide insights into rod photoreceptor development and should expedite the development of cell-based treatments for RP. Furthermore, the data set includes a large number of retinopathy genes; less-well-characterized genes within this data set are a resource for those seeking to identify novel retinopathy genes in patients with RP (GEO accession: GSE59201).
Inherited retinal degeneration (RD) damages light-sensing photoreceptors, resulting in loss of vision. RD is an umbrella term covering diseases that range in severity from Leber hereditary amaurosis (LCA), in which vision is severely affected from birth, to milder forms of RP that can leave central vision intact until old age, and macular degeneration, which affects central vision only. RD represents the most common cause of visual dysfunction in people of working age in developed economies. Therefore, these conditions have significant adverse effects on patient quality of life and on the broader economy. Genetic linkage studies, functional genomics, and next-generation sequencing (NGS) technologies have provided opportunities to elucidate the pathways of retinal development and homeostasis, in addition to the molecular pathogenesis of disease. A notable feature of inherited retinopathies is their genetic heterogeneity; multiple mutations in any one of more than 200 genes can cause RD. Retinitis pigmentosa (RP), a form of RD that results from rod photoreceptor defects, affects approximately 1 in 3,000 adults and can result from mutations in any one of more than70 genes [1-3] (RetNet). Regardless of the genetic mutation underlying individual forms of RD, however, photoreceptor cell death occurs gradually via apoptosis.
Gene therapies for some forms of RD have been investigated; gene replacement therapies for diverse recessive retinopathies have shown promise in clinical trials [1,3-11]. With the exception of optogenetic-based gene therapies [1,12], gene therapy requires significant photoreceptor survival and therefore may not represent an optimal therapeutic strategy for advanced disease. Furthermore, gene therapies targeted to the primary genetic defect underlying each form of RD potentially will require approximately 200 individual therapeutics due to the significant genetic heterogeneity in RD. Therefore, gene therapies for rarer forms of RD may be challenging, given timelines and costs associated with clinical trials.
Transplantation of stem cell–derived photoreceptor precursors into retinas with advanced disease may provide a complementary approach. A significant advantage of cell therapy should be the potential to treat diverse and advanced forms of RD, irrespective of genetic etiology or stage of degeneration. However, for stem cell approaches to be effective, expandable sources of cells that can be routinely differentiated into the required cell types are required. The most abundant source of transplantable photoreceptors is currently the postnatal mouse retina. Freshly dissected early postnatal mouse retinal cells, when dissociated and transplanted into recipient eyes immediately, can integrate into host retinas and form functional photoreceptors [13-18]. The same cells can be expanded in culture, but cultured retinal progenitor cells (RPCs) switch off rhodopsin in the first 3–7 days of culture at the protein and RNA levels. Moreover, cultured RPCs do not form photoreceptors after transplantation; these cells are probably glial precursors [15,19]. Recent studies have indicated oligodendrocytic potential in RPCs, which is not present in native retinal cells. Growth factors commonly used in serum-free neural stem cell culture, epidermal growth factor (EGF) and fibroblast growth factor 2 (FGF2), may influence RPC potential; EGF in particular is known to promote gliogenesis .
The human developmental stage equivalent to early postnatal mouse retinas is a mid-gestation embryo, which makes donor transplantation ethically and logistically difficult . Advances have been made in differentiating transplantable photoreceptors from embryonic stem (ES) cells [21-25], using three-dimensional embryoid body–based culture techniques that produce laminated optic cups with differentiating photoreceptors. ES cell use presents ethical barriers, which in principle can be overcome using induced pluripotent stem (iPS) cells that are similarly capable of photoreceptor differentiation [23,26,27]. However, the number of cells obtained from current ES and iPS cell differentiation protocols is still small. Protocol development may be enhanced by more detailed knowledge of gene expression cascades during photoreceptor development.
This study investigates differences in gene expression profiles between postnatal retinas, cultured RPCs, and rod photoreceptor precursors to identify genes involved in rod photoreceptor development and maturation. Genes upregulated during rod development have been identified via microarray, comparing FACS-purified enhanced green fluorescent protein (eGFP0-positive rod cells from postnatal day 5 (P5) Rho-eGFP retinas to eGFP-negative cells from the remainder of the retina. Comparison of whole P1 and P5 retinas to cultured RPCs (which have lost the ability to become photoreceptors) was also undertaken. Use of t-distributed stochastic neighbor embedding (t-SNE) [28,29] allowed identification of closest expression “neighbors” to a subset of well-characterized genes with key photoreceptor functions. We have identified 636 genes that are potentially important in rod photoreceptor development and differentiation. Forty-four are previously identified RD disease genes (RetNet), while 207 have been highlighted as important to retinal and/or photoreceptor development in other retinal gene expression studies [16,30-42]. The retinal function of 61% of the gene list is unknown. These results represent a resource for elucidating rod photoreceptor developmental biology. The data set also contains a large number of genes known to cause retinopathy; less-well-characterized genes within this data set represent a resource for those seeking to identify novel retinopathy genes in patients with RP.
Animals were housed and cared for under license from the Irish Medicines Board/Department of Health and Children according to Irish and European Union (EU) legislation. Animals were housed and cared for under license from the Irish Medicines Board/Department of Health and Children according to Irish and European Union (EU) legislation. This legal framework is equivalent to guidelines published by the Institute for Laboratory Animal Research or the U.S. Public Health Service. Animal care adhered to the ARVO Statement for Use of Animals in Research. This legal framework is equivalent to guidelines published by the Institute for Laboratory Animal Research or the U.S. Public Health Service.
Retinas were extracted from postnatal day 1 and day 5 (P1 and P5) rhodopsin-eGFP (Rho-eGFP ) heterozygote mice, maintained on a 129Sv/J background. These mice express a human Rho-eGFP fusion protein that is visible in rod outer segments following transplantation. Heterozygotes were used as homozygotes show symptoms of retinal degeneration . Rho-eGFP retinas were also used as a source of cultured RPCs, to minimize possible gene expression differences arising from varying genetic background.
Rho-eGFP P5 retinas were dissected and placed in 1 ml Hank’s Balanced Salt Solution (HBSS; Lonza Biologics Ltd., Tewkesbury, UK). The ciliary margin was removed from the retinas before dissociation. Retinas were placed in 1 ml HBSS (Lonza), and 100 μl of 10 mg/ml trypsin (Sigma-Aldrich, Arklow, Ireland) was added. Retinas were incubated for 10 min at 37 °C, after 5 min, 10 μl of 10 μg/ml DNase1 + 100 μl 20 mg/ml trypsin inhibitor were added, and the samples were triturated. Cells were spun for 5 min at 350 ×g, resuspended in 1 ml PBS (Dulbecco’s Phosphate Buffered Saline, -CaCl2, -MgCl2, Catalogue number: 14190-094, Life Technologies, Paisley, UK) with 10 μl DNase1, and then analyzed with FACS. Cells were sorted based on GFP expression on an Aria Fusion cell sorter (Becton Dickinson, Oxford, UK) [15,44]. Following FACS analysis, cells were spun at 350 ×g for 5 min and resuspended in 100 μl PBS. Cells were then subjected to RNA extraction.
Rho-eGFP P4–5 retinas were placed in 1 ml PBS or HBSS following dissection. The ciliary margin was removed from the retinas before dissociation to avoid contamination by ciliary epithelial (CE) cells in subsequent cultures. Two hundred microliters of Accutase (Sigma-Aldrich) was added, the retinas were incubated for 5 min at 37 °C, allowed to settle to the bottom of the tube, and most of the supernatant aspirated away, 1 ml growth medium (Dulbecco’s modified Eagle’s medium nutrient mixture, F12 Ham, D8437, Arklow medium [DME/F12], (Sigma), supplemented with B27 without vitamin A (Invitrogen), 1× L-glutamine (Lonza), 1× penicillin/streptomycin (Lonza), 5 μg/ml heparin (Sigma), 20 ng/ml FGF2 (Peprotech, London, UK), and 20 ng/ml EGF (Peprotech) were added, and the retinas were dissociated by trituration with a fire-polished Pasteur pipette. Following dissociation, cells were counted four times using a hemocytometer and plated for culture.
RPCs were grown in T25 flasks (Sarstedt, Drinagh, Wexford, Ireland) in 5 ml growth medium and incubated at 37 °C and 5% CO2. After initial plating, the medium was replaced after 5–7 days initially, and then every 2–7 days depending on cell density. Initial platings contained 1/1,000 gentamycin, which was not added to the medium after 7 days of culture. Cells were passaged when 80–90% confluent. About 0.5 ml 0.25% trypsin/EDTA (Lonza) or Accutase (Sigma-Aldrich) were added to each T25 flask, following the removal of medium and rinsing with 1× PBS. Flasks were incubated for 5–10 min until the cell monolayers lifted off. Cells were resuspended in 3 ml DMEM/F12, counted four times, and spun at 100 g. The supernatant was removed, and the pellets were resuspended in growth medium, replated, frozen, or treated with TRI reagent (Sigma-Aldrich). Early passage cells (P2–P5) were used for the array. Previous data  indicate that P0 cells plated for 7 days or longer have completely lost Rho-eGFP protein expression; cells of passage 2 or greater have largely lost all RNA expression of rhodopsin as well (no doubt reflecting the loss of post-mitotic cell types from the cultures). All cultures used for the array were morphologically similar and had attained the maximum growth rate at the point of harvest, thus minimizing variation in gene expression. Cells older than P9–10 are increasingly likely to have reached a Hayflick limit and/or to have become spontaneously immortalized, and were therefore excluded from the study. Cells older than P9–10 are increasingly likely to have reached a Hayflick limit and/or to have become spontaneously immortalized, and were therefore excluded from the study. The Hayflick limit is the point at which, after a certain number of cell divisions, normal somatic cells stop dividing, in association with critical levels of telomere shortening.
Cell pellets were resuspended in 1 ml TRI reagent (Sigma-Aldrich) and triturated using a P1000 pipetter, while the retinas were homogenized in 1 ml TRI reagent using a 100 μl mini Dounce homogenizer (Fisher Scientific, Dublin, Ireland). RNA was prepared according to the manufacturer’s protocol, and then repurified using the RNeasy kit (Qiagen, Manchester, UK), incorporating the DNase step according to the manufacturer’s instructions. The RNA samples were assayed for concentration and quality using a Nanodrop ND1000 (NanoDrop Technologies, Wilmington, DE) spectrophotometer and a Qubit (Invitrogen, Paisley, UK). One to fifty dilutions of FACS-sorted samples were assessed with qPCR (see below) for enrichment of rhodopsin in the eGFP-positive samples.
RNAs passing quality control (OD 260/280, OD 230) were outsourced to Source BioScience GmbH (Berlin, Germany) for the array. The samples were reanalyzed for quality using a BioAnalyser 2100 (Agilent G2938A, Santa Clara, CA). The FACS-derived samples were amplified before labeling to increase the amount of input RNA, using the Ovation Pico WTA system (NuGen) and then labeled using the Encore Biotin module (NuGen, Leek, the Netherlands). Unamplified samples were labeled using the IVT Express kit (Affymetrix, High Wycombe, UK), according to the manufacturer’s protocol. Amplified RNA was purified and hybridization cocktails prepared and hybridized to Affymetrix mouse A430_2 whole genome arrays, using the GeneChip Hybridization, Wash, and Stain Kit (Affymetrix), according to the manufacturer’s protocol. Arrays were scanned using a GeneChip Scanner 3000 (Affymetrix). Four biological replicates of each sample type were arrayed.
Dimensionality reduction was performed using the Barnes-Hut-SNE implementation  of the t-SNE algorithm . Each probe was represented as a 20-dimensional vector, with each vector component representing its normalized expression
in a single sample and condition. The set of all probe vectors was passed as input to the Barnes-Hut-SNE algorithm, which
generated a two-dimensional output mapping designed to preserve the local and global structure from the 20-dimensional data
set (inclusive of all replicates for all sample types). Due to the large size of the output and consequent difficulty finding
individual genes by manual inspection or traditional image-editing tools, a web interface was programmed in Python that allows
searches for individual genes and identifies nearest expression neighbors. The resulting t-SNE expression map for each gene is searchable
Initial analysis of Rho-eGFP-positive cells (immature rod photoreceptors) versus Rho-eGFP-negative cells was performed. More than 1,000 genes were found to be upregulated in the Rho-eGFP-positive cells, and another similarly sized cohort of genes was found to be downregulated, with fold changes of less than −2 or greater than 2, and p values of 0.01 or less (Appendix 2). Following the initial analysis, differentially regulated genes (adjusted p value <0.001, fold change >2 or < −2) from two-way comparisons of the five data sets (RPCs, P1 retinas, P5 retinas, Rho-eGFP +ve, and Rho-eGFP –ve retinal cells) were sorted. Genes differentially regulated in at least three comparisons were retained for bioinformatic analysis. RNA amplification can introduce significant error ; comparison of the FACS eGFP+ve versus eGFP-ve gene list with those from unamplified retinas or tissue culture cells increases the likelihood of identifying robust changes. Initial bioinformatic analysis was performed using the Gene, Online Mendelian Inheritance in Man (OMIM) and PubMed databases (NCBI), and the RetNet database (RetNet). Gene lists were submitted to the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 [51,52] to identify functional categories that were overrepresented in each gene list. Functional categories that gave a Benjamini-Hochberg procedure corrected p value of <0.01 are presented as significantly overrepresented in each of the gene lists compared to what would be expected from the same number of randomly selected genes by chance.
Gene positions were translated into ranks along the chromosomes to allow to detect clustering without bias from varying intergenic distances. The differences in ranks were calculated for the genes located on each chromosome and averaged using the median. The medians were compared against the median averages of 1,000 randomly selected sets of genes, with the gene sets the same size as the up- and downregulated lists. Box plots were generated in R to visually compare the average distance of the genes from the two lists with the average distances from the random lists . One sample t test was performed in R to check for significance in the differences between the mean of the random averages and the averages obtained from the retinal gene lists.
The Affymetrix probe IDs were taken from the lists of genes upregulated or downregulated in the Rho-eGFP FACS +ve versus Rho-eGFP –ve samples. The probe list was matched via the Biomart to Ensembl transcripts. These transcript IDs were used to extract the 2,000 bp upstream region of the transcription start site (TSS) via Biomart. With a tool called find_pssm_dna provided by the EBI, these sequences were searched for the Crx and Nr2e3 binding sites, which were retrieved as positional frequency matrices from Jaspar . A probability of p<0.001 against an empirical estimate of nucleotide frequency background distribution  for 2,000 bp upstream of the TSS in mouse was used to refine the results. The p value of <0.001 is related to the low probability of these nucleotides occurring by chance. For each gene listed, the top-scoring site from all of the transcripts was extracted to an Excel spreadsheet (Appendix 3). A combined score for both transcription factors (TFs) was calculated by simple adding the top scores from each TF (an exact match to the consensus sequence gives a score of 100%). Using the composite 75% score as the cutoff identified 365 upregulated genes and 385 downregulated genes as likely to be regulated by Crx and Nr2e3 (supplementary data file 3). The cutoff value of 75% was chosen since the recommended value of 80% resulted in the loss of rhodopsin (known to be regulated by Crx and Nr2e3) from the data set and was therefore considered overly stringent.
All data have been submitted to GEO () in accordance with MIAMI standards [56,57]. The GEO accession number for this data set is GSE59201. Finally, the listed genes of interest from this study were compared with those obtained from other relevant retinal gene expression studies [16,32-42].
Primers were designed to amplify fragments of 120–250 bp, with shorter fragments preferred. Introns were included in amplified segments where possible, to prevent amplification of genomic DNA. Primers were designed using PrimerBlast with an optimal Tm of over 60 °C (the temperature of the amplification step). RNAs were diluted 1/100 fold for the RPC and whole retinal samples, and 1/50–1/25 fold for the FACS-derived RNA samples (depending on the starting concentrations). Q-PCRs were performed using RNA with the two-step QuantiTect kit (Qiagen) as described previously [15,58]. Q-PCRs were assembled on 96 well plates (Applied Biosystems, Paisley, UK) and run on 7300 and 7500 series q-PCR machines (Applied Biosystems). The ΔΔCt method was used to calculate fold changes, using beta-actin as a housekeeping standard and the P1 retina as the calibrator. The q-PCRs of the retinopathy genes (Crx, Nrl, Nr2e3, Rho) were included in analysis to check that our qPCRs accurately represented their known expression patterns. Primer sequences are noted in Table 1. The samples were run in triplicate, and at least three biologic replicates were performed, giving total replicates of at least nine. Adult retinal samples were also included in q-PCRs as positive controls; water blanks and no-RT controls were also run. Error bars were calculated via standard error of the mean (SEM). Results were also analyzed using Student t tests.
Transplantable photoreceptor precursors, which peak in transplantation efficiency from P3–6, are post-mitotic [15-18,20]. To assay gene expression changes leading to photoreceptor differentiation, microarray analysis was used, with comparisons of P5 retinal Rho-eGFP-positive and -negative cells, whole retinal samples at P1 and P5, and cultured RPCs, which have lost the potential to become photoreceptors [15,20]. Rho-eGFP-positive cells represent rod photoreceptor precursors, while Rho-eGFP-negative cells represent other differentiated neuroretinal cell types and any remaining unspecified cells. Up- and downregulated gene lists (Appendix 2) were derived from the Rho-eGFP-positive and -negative array data, giving a comprehensive list of genes significantly differentially regulated in recently specified rod precursors. Given the potential error associated with RNA amplification , this gene list was compared with lists derived from two-way comparisons of array data from unamplified RPCs, P1 and P5 retinal RNA. Genes differentially regulated in at least three out of four comparisons (Rho-eGFP positive versus Rho-eGFP negative, RPCs versus P1 retinas, RPCs versus P5 retinas, P1 versus P5 retinas) were prioritized (Table 2, Table 3, Table 4, Table 5). Notably, these data represent a unique comparison of retinal samples around the time of rod genesis with cultured RPCs, which have lost photoreceptor potential [15-18,20]. However, FACS-purified Nrl-eGFP, Crx-eGFP, and Rho-eGFP cells have previously been arrayed [16,33,34], to identify cell surface proteins suitable for antibody-based FACS of non-genetically modified cells, or to establish cone precursor expression patterns.
A shortlist of genes differentially regulated in at least three out of four comparisons was generated. These genes were divided into categories: 228 genes were upregulated in eGFP+ve samples, and differentially regulated in at least two other comparisons (Table 2), and 235 genes were downregulated in eGFP+ve samples, and differentially regulated in at least two other comparisons (Table 3). A further 92 genes were differentially regulated in all comparisons of RPCs and P1 and P5 retinas, but showed no differences between the two FACS-sorted samples, implying even expression across the neural retina at P5 (Table 4). Thirty-six retinal disease genes were contained in the upregulated gene set, while three appeared in the downregulated or differentially regulated gene lists. Disease genes listed in RetNet are highlighted in bold italics (Table 2, Table 3, Table 4, Table 5). Searches of the Gene, OMIM, and PubMed databases for each gene revealed that 61% of the total gene list remains unconnected with the retinal literature; a retinal function for 39% was noted.
To identify genes with expression patterns similar to those of known importance in the developing retina (e.g., Nrl, Crx, Rho), a further analysis method was deployed. Replicates for all genes represented on the array were used to generate a t-SNE
probe plot, which
Retinal expression patterns indicated by the array were investigated in 42 genes via qPCR. Genes selected for qPCR included those with well-characterized roles in retinal development and disease (Rho, Nrl, Crx, Nr2e3), characterized genes with as yet unknown retinal functions, and relatively uncharacterized ESTs. The qPCR gene list included upregulated, downregulated, and differentially regulated genes and a number from the t-SNE neighbors list (Table 2, Table 3, Table 4, Table 5). The direction of expression change was confirmed in 94% of cases where unamplified RNA was used (P1 and P5 retinas, cultured RPCs), and in 90% of cases where amplified RNA was arrayed (Rho-eGFP-positive and -negative FACS samples), suggesting that the data set is robust. Notably, the amplified FACS samples showed a lower validation rate than the unamplified samples, thus supporting the approach of comparing the FACS-derived gene lists with unamplified retinal samples for gene prioritization. Representative q-PCR results are shown in Figure 2; complete q-PCR results are given in Appendix 4.
The genes listed in Table 2, Table 3, Table 4, Table 5 were analyzed for known function, known disease-causing mutations in humans and model organisms, and possible retinal functions, using the Gene, OMIM, and PubMed databases on NCBI and RetNet. Forty-four of the genes identified were listed in RetNet as causing a retinopathy, the majority of which were present in the upregulated and neighbors data sets (Table 2, Table 3, Table 4, Table 5). Sixty-one percent of the genes in our data set have no known retinal function as assessed with a PubMed search; these genes present a significant resource for RD gene discovery, and for the understanding of retinal development. Further analysis was undertaken using DAVID [51,52] to identify overrepresented functional categories within our gene lists (see Figure 3). The upregulated and neighbors gene sets showed overrepresentation of a significant number of categories related to visual perception, and photoreceptor differentiation. Notably, genes relating to cilia were also overrepresented, suggesting that upregulation of ciliary genes occurs during the time period under investigation. Genes involved in cell projection and neuronal differentiation were also upregulated. The most significant categories in the downregulated list included developmental genes, transcriptional regulators, those involved in neurogenesis, axon guidance, cell-cell adhesion, cell motion, regulation of cell proliferation, and signal transduction. Genes differentially regulated between RPCs and whole retinas at P1 and P5 (but unchanged between Rho-eGFP-positive and -negative cells) show a different functional overrepresentation list, methylation, acetylation, chromosomal DNA binding, mitochondrial biology and cell adhesion, possibly reflecting differences between actively dividing precursors in culture (RPCs) and neurons gradually ceasing cell division, then maturing and generating greater connectivity between three-dimensional retinal layers.
Promoter regions of genes upregulated in rod precursors were analyzed for common transcription factor motifs, using 2 kb immediately upstream of the TSS. Results indicate that motifs for 116 transcription factors are overrepresented in the data set, including Nr2e3 (Gene ID: 23958, OMIM: 604485; a nuclear receptor and rod photoreceptor transcription factor), Pax6 (Gene ID: 18508, OMIM: 607108) and Sox2 (Gene ID: 20674, OMIM: 184429; genes involved in eye specification that can cause anophthalmia and microphthalmia when mutated). A further study was undertaken, searching for the consensus binding sites for the photoreceptor transcription factors Crx and Nr2e3 in the 2 kb region upstream of the TSS of all genes either up- or downregulated in FACS-sorted photoreceptors. Crx controls expression in both rods and cones, while Nr2e3 is rod specific. These two transcription factors were chosen because their consensus binding sites were most amenable to analysis. Five hundred thirty-eight upregulated genes and 582 downregulated genes were identified as containing 75% of the maximum score for presence of Nr2e3 (rod-specific) and Crx (photoreceptor-specific) transcription factor binding sites in the 2 kb immediately upstream of the TSS (p<0.001 for each binding site, using the lower 75% limit suggested by oPOSSUM v2.0  since the default 80% cutoff resulted in the loss of Rho and other genes known to be regulated by these transcription factors). This list represents genes more likely to be regulated by Crx and Nr2e3 (Appendix 3); those upregulated in Rho-eGFP-positive cells are likely to be preferentially expressed in rod photoreceptors. This data set was compared with a previously published study of genes dysregulated in Crx, Nr2e3, and Nrl null mice and containing Crx binding sites as determined with CHIP-seq. The prior study identified 6,000 genes with possible Crx binding sites, of which 734 were dysregulated in various retinal mutant mouse lines . A comparison of that data set with this one shows an overlap of 68 genes in the upregulated list and 25 in the downregulated list , an overlap that comprises 13% of their data set and 8% of ours.
The genomic locations of the genes in the FACS upregulated and FACS downregulated data sets (Appendix 2) were examined, to determine if there was any evidence for gene clustering in specific areas of the genome, suggesting some form of coordinated regulation, or whether the expression control mechanisms are coordinated irrespective of location. The median distance between neighboring genes on the same chromosome was calculated for the current data sets, and compared with the medians generated from 1,000 randomly generated data sets comprising the same numbers of genes. The results are provided in box plot format in Figure 4; the location of the genes in the up- and downregulated gene sets is outside the 95% confidence interval, suggesting non-random arrangement of genes in up- and downregulated gene sets. A one-sample t test yielded p values <2.2e-16 for the up- and downregulated lists. The 95% confidence intervals are Up: 22.43 to 22.53, Down: 19.1 to 19.18. In comparison, the eGFP-positive and -negative gene lists are at 17 and 15, respectively, clearly well outside the 95% confidence interval. These results are therefore unlikely to have occurred by chance and are probably the result of some degree of functional clustering or clustering as a result of ancestral gene duplication events.
Several previously published array and/or next-generation sequencing studies have been identified, aspects of which should theoretically overlap with the current study. These studies include developmental time courses using c57Bl6/J or Balb/C wild-type mouse retinas [30,31], genes that varied in expression between mouse strains during retinal development , expression profiles in the adult retina and in various adult retinal cell types [38,40], comparisons of the Otx2 null retina (which lacks photoreceptors)  and the RD1 retinal degeneration model, with wild-type , comparisons between Notch1 conditional knockouts, which overproduce rods, with wild-type retinas , comparisons of the “cone only” Nrl knockout retina with wild-type , CHIP-seq analysis of genes containing Crx binding sites , expression profiles of Rho-eGFP-positive and -negative retinal cells at P4 [16,42], comparisons of Nrl-eGFP-positive and -negative cells at P4  and Crx-eGFP-positive and -negative cells at E17.5 . The significantly differentially regulated gene lists from these papers were compared with those generated in this study to highlight genes identified in multiple experiments. Four hundred twenty-nine genes are uniquely identified in the current study, while 121 upregulated, 43 downregulated, 20 differentially regulated, and 99 neighbors genes have been identified in similar studies (due to the overlap between the neighbors data set and the others, the total number of genes in this study also identified elsewhere is 207, of which 73 were present in more than one other study). Retinal disease genes, especially those known to be rod specific, are highly represented in all studies with enriched populations of rod photoreceptors as would be anticipated. Interestingly, overlaps with prior adult retinal gene expression studies were also significant, given that this data set was generated using the developing retina. Overlaps are indicated (via reference number in red) in Table 2, Table 3, Table 4, and Table 5.
This study presents several novel and potentially important findings. We have identified a large number of differentially regulated genes that may be important in rod and photoreceptor development. The retinal functions of approximately 60% of these genes are uncharacterized (although some can be inferred from known functions in other tissues and organs). The comparison of cultured RPCs with the P5 retinas from which the RPCs are derived is novel and has been valuable in pinpointing why cultured cells lose photoreceptor potential. This study has indicated that immature rods and the remainder of retinal neurons show non-random genomic locations of expressed genes, a novel finding that offers important inferences regarding the developmental control of neuronal gene expression and the evolutionary history behind this. The presence of Crx and Nr2e3 sites in up- and downregulated genes was corroborated by a previous study  but also offers important clues to the complexities of developmental gene expression regulation in retinal neurons. In terms of techniques, the t-SNE approach is novel regarding retinal samples and has not been widely used in array research to date. Although less comprehensive in terms of identifying every differentially regulated gene, t-SNE is a great method for identifying genes with similar expression patterns to known disease genes and may be useful in any genetic disorder where some, but not all, causative genes have been identified. We have also attempted to ameliorate bias via a comparison of whole retinal/tissue culture samples (unamplified RNA) with amplified RNA from FACS sorted immature rods (Rho-eGFP positive) and remaining retinal neurons (Rho-eGFP negative), a technique that has worked well in terms of allowing prioritization of interesting novel genes.
Using two different methods of analysis, 636 genes were identified that are differentially regulated in retinal and rod photoreceptor development. In searching for genes differentially regulated in multiple comparisons, or that closely resembled known retinal genes, the process was extremely rigorous. Comparison between FACS Rho-eGFP-positive cells (immature rods) and Rho-eGFP-negative cells yielded 3,320 genes with an adjusted p value of <0.01 and a fold change of greater than 2 or less than −2 (Appendix 2). This list was used for promoter and location analysis; but for in-depth evaluation, a subset of genes that also show expression changes between RPCs and P1 and P5 retinas was chosen, to minimize the risk of error introduced by RNA amplification, and therefore to reduce the number of false positives under investigation. However, using the priority gene list alone may result in numbers of interesting genes being left out of analysis owing to differential regulation not being detectable in whole retinas or RPCs, a risk that may be particularly high with transcription factors or other genes that may be expressed transiently or at a low level. Therefore, the larger data set is publicly available here (Appendix 2), via GEO (accession GSE59201), and via our searchable t-SNE database, as a resource for the retinal community. A large number of genes were investigated with qPCR to test the accuracy of the results. These included 43 genes identified by the array, and two genes important to visual specification or retinal biology that were not identified by the array. We included four genes involved in initial visual patterning: Pax6, Sox2, Mitf (Gene ID: 17342, OMIM:156845) and Otx2 (Gene ID: 18424, OMIM: 600037). These are induced so early in eye development that they were theoretically unlikely to be detected, but in fact, only Mitf was not identified as differentially regulated in any categories. Another gene, arrestin (Arr3; Gene ID: 170735, OMIM: 301770), was represented on the array and was massively upregulated in rods via qPCR, but was not detected by the array, probably as a result of an ineffective probe. Overall, 11/13 “known” visual genes for which we had preexisting primer sets were detected by the array and confirmed with qPCR, suggesting a “false negative” rate of approximately 15%.
Regarding the inability to derive functional photoreceptors from RPC cultures, it is significant that a large number of retinal disease genes and their neighbors, including transcription factors vital to rod development, are significantly downregulated in RPCs, many to the point of being undetectable with RT–PCR . The 60% of genes with as yet uncharacterized retinal functions constitute a resource for further identification of genes that may be able to influence proportions of rods obtained from ES or iPS cell cultures, particularly those likely to function as growth or transcription factors. The growth of RPCs in Egf- and Fgf2-containing medium seems to be biasing development toward glial fates, including myelinating oligodendrocytes not normally found in the neural retina ; significant discovery of alterative mitogens may be required before RPCs can be persuaded to adopt neural fates after differentiation. However, Müller glia adopt their final differentiated status later than rods and are capable of limited cell division even in the adult retina. Postnatal retinas may contain neural precursors that are too terminally differentiated to offer an expandable source of any neuronal cell type; RPCs may represent a glial population only.
Use of t-SNE as an adjunct search tool for genes of interest has thrown up some interesting findings. t-SNE allows visualization of the 20-dimensional (5× samples, 4× replicates per sample) gene expression data per probe in two dimensions. Mapping the global expression data from our data set with RetNet disease genes overlaid (Figure 1) showed some clusters of retinal specific disease gene expression. Genes mapping within these clusters are expected to show similar expression patterns and may be candidates for retinal disease or have important retinal functions, or both. In the context of this experimental paradigm, the neighbors approach preferentially identifies genes upregulated in rod photoreceptors, as the nearby clustered RD disease genes are also upregulated in rods. The clusters are located within the same 20% area of the overall plot, suggesting that they should show some degree of similarity to each other, in addition to similarities within the clusters (Figure 1). QPCRs of the genes derived from the neighbors data set all show upregulation of expression in Rho-eGFP-positive cells and in the P5 retina regarding the P1 retina and RPCs (Figure 2; see neighbor pairs Rho + A530058N18Rik, Crx + SamD11, Nrl + BC022687). Notably, A530058N18Rik and SamD11 are uncharacterized genes with no retinal functions noted in PubMed to date.
The data set was used to interrogate whether genes upregulated (or downregulated) in rods, a homogenous neuronal cell type, possessed a non-random genomic location. In theory, coordinated regulation of such genes in the retina may involve positional clustering. Notably, 1,000 repeats of random assortments of gene groupings of the same size as our data sets gave median gene separation distances that were significantly different from the non-random array–based data sets generated in this study. For this analysis, the more extensive gene lists (those upregulated and downregulated in Rho-eGFP-positive rod precursors) were used. These data therefore provide some evidence for rod-specific gene expression being influenced by the genomic location of the genes, which are more tightly clustered than would be expected by chance (Figure 4). This is the first demonstration of such an effect in retinally expressed genes, but might be the expected result of evolutionary gene duplication and diversification of function as retinas became more complex.
We have generated up- and downregulated gene lists with a high probability of containing functional Nr2e3 and Crx binding sites. The identification of two different transcription factor binding sites, known to control gene expression in photoreceptors, in close proximity within 2 kb of the start site, represents significant evidence for regulation in photoreceptors. This list showed significant overlap with that obtained in a previous study, which identified greater than 6,000 genes containing Crx binding sites via CHIP-seq, of which 734 genes were dysregulated in Crx, Nr2e3, or Nrl null mice , and are therefore likely to be of functional significance. In the current study, an overlap of 93 mostly upregulated genes (which comprises 13% of their data set and 8% of the genes we identified as containing Crx and Nr2e3 binding sites) was identified. The differences between the studies may be partially explained by differing methodologies and by their use of Crx sites alone; Crx controls gene expression in cone and rod photoreceptors, not just rods, whereas Nr2e3 is rod specific. Given the presence of Crx binding sites in so many retinally expressed genes (>6,000), their findings suggest that the binding sites may be subject to complex regulation, including the negative regulation of some non-photoreceptor genes in photoreceptors . The finding from this study of Nr2e3 and Crx binding sites in genes downregulated in rod precursors therefore correlates and enhances those results . Alternatively, it may be that these binding sites are used for positive regulation at developmental stages not evaluated in either study.
Analysis of overrepresented functional categories within gene lists provided results that might have been anticipated. The upregulated and neighbors data sets showed the most significant overrepresentation of functions associated with vision, photoreceptor development and function, and ciliary function, suggesting elaboration of the photoreceptor cilium that connects the cell body with the outer segment is an important function of retinal progenitors at this stage. These data may pinpoint less-well-characterized genes in this data set as candidates for retinal ciliopathies. The downregulated data set shows overrepresentation of developmentally regulated genes, transcription factors and neuronal development, and axon guidance functions, possibly reflecting the fact that all other retinal neurons, at P5, have been specified for longer than rod photoreceptors , are post-mitotic, and are most likely pruning and optimizing connections with other parts of the retina at this point in time. Genes that vary between RPCs, and whole P1 and P5 retinas (without significant change between Rho-eGFP-positive and -negative retinal cells at P5) are evenly expressed across the whole neural retina at P5, but are up- or downregulated compared to an earlier developmental stage and compared to cultured RPCs, which are still dividing. This gene list shows overrepresentation of cell attachment, histone-related, acetylation, methylation, and mitochondrial functions. The retina at the points in time studied is in the process of specifying the chromatin structure to optimize gene expression in the configuration required by its maturing post-mitotic neuronal populations, while cultured RPCs are still dividing, lack organizational layers, and are comprised of glial progenitors. Notably, the retina requires more energy than almost any other tissue [1,58]. Speculatively, overrepresentation of mitochondrial genes might be related to retinal development, which is setting up to facilitate the high energy requirements of the mature retina, while RPCs in culture may have energy needs that are significantly lower.
Several other large-scale expression studies have been undertaken in retinas, both wild-type and genetically modified [16,30-42]. Overlaps with these studies total 207 genes (Table 2, Table 3, Table 4, Table 5). The most significant overlaps were noted with developmental studies, those where rod photoreceptors were enriched, and with the adult retina. Seventy-three genes were noted in two or more studies (these are noted in Table 2, Table 3, Table 4, Table 5; where two or more reference numbers are present beside the gene name). Many are key retinal disease genes (in italicized/bold font; Table 2, Table 3, Table 4, Table 5). Interestingly, some genes with relatively uncharacterized retinal functions were also noted in this category; the functions of a selected few are highlighted in Table 6. Retinal expression patterns indicated by the array were investigated in 42 genes via qPCR, ranging from known disease genes such as Rhodopsin, Nrl, Crx, and Nr2e3, to relatively uncharacterized ESTs. Eight of these genes showed high levels of upregulation in Rho-eGFP-positive cells (Figure 2), in addition to upregulation during retinal maturation and negligible expression in RPCs, all pointing to significant roles in rod function. Summaries of gene function for several of these genes are given in Table 7. In summary, many genes with uncharacterized retinal functions have been identified as important in retinal development by this study and others.
In addition to use in identifying novel retinal genes and in monitoring expression patterns during development, this data set may be of use in identifying novel retinopathy genes. To date, a primary screen using NGS has indicated that just less than 50% of Irish patients with RD have mutations in one of approximately 200 known retinopathy genes (M. Carrigan, unpublished data). Other studies using similar strategies (NGS of known RD genes) have similar mutation detection rates of approximately 40–50% [61,62]. The array data set detailed herein may provide a second-tier NGS screen for those patients lacking mutations in known RD genes, in principle decreasing the cost and complexity of NGS from a whole genome screen to one covering a smaller cohort of genes implicated in photoreceptor development via various functional genomic studies. Clearly, identification of the mutated gene is a prerequisite for patient recruitment to relevant targeted gene therapy clinical trials. We anticipate that this data set will be of value in identifying novel retinal genes, some of which may be of use in expediting novel treatments for RP and related forms of RD.
Cell sorting was performed by Barry Moran in the Flow Cytometry Facility, Trinity College Dublin. This work was supported by grants from the MRCG (HRB and Fighting Blindness; FB11FAR) and SFI (grant ID). We thank Caroline Woods, Charles Murray, David Flynn, David and Rachel Sullivan and Brenda Campbell for technical assistance.