Abstract
Objective. Extensive association analyses including genome-wide association studies (GWAS) and powerful metaanalysis studies have identified a long list of loci associated with rheumatoid arthritis (RA) in very large populations, but most of them established statistical associations of genetic markers and RA only at the DNA level, without supporting evidence of functional relevance. Our study serves as a trial to detect the functional mechanisms underlying associations for RA by searching publicly available datasets and results.
Methods. Based on publicly available datasets and results, we performed integrative analyses (gene relationships across implicated loci analysis, differential gene expression analysis, and functional annotation clustering analysis) and combined them with the expression quantitative trait locus (eQTL) results to dissect functional mechanisms underlying the associations for RA.
Results. By searching 2 GWAS, Integrator and PheGenI, we selected 98 RA association results (p < 10−5). Among these associations, we found that 8 single-nucleotide polymorphisms (SNP; rs1600249, rs2736340, rs3093023, rs3093024, rs4810485, rs615672, rs660895, and rs9272219) serve as cis-effect regulators of the corresponding eQTL genes (BLK and CD4 in non-HLA region; CCR6, HLA-DQA1, and HLA-DQB1 in HLA region) that also were differentially expressed in RA-related cell groups. These 5 genes are closely related with immune response in function.
Conclusion. Our results showed the functional mechanisms underlying the associations of 8 SNP and the corresponding genes. This study is an example of mining publicly available datasets and results in validation of significant disease-association results. Using public data resources for integrative analyses may provide insights into the molecular genetic mechanisms underlying human diseases.
Rheumatoid arthritis (RA) is a common systemic inflammatory arthritis characterized by chronic inflammation of multiple peripheral joints, which leads to destruction of cartilage and bones, progressive deformity, and severe disability. Identifying genetic factors for RA is challenging because of the complex nature of genetic determination, including polygenic determinations and gene-by-gene and gene-by-environment interactions. Recently, there has been great progress in characterization of genetic susceptibility to RA1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17. Extensive genetic analyses including hypothesis-independent genome-wide association studies (GWAS)1,3,5,10,11,16 and metaanalysis studies8,12,15,18 identified a long list of loci associated with RA with relatively high statistical power. However, most of those studies established the statistical associations of genetic markers and RA only at the DNA level without exploring functional mechanisms underlying the associations. Such established associations usually did not provide immediate insights into functions of genes or regulation of gene expression that bridges gene code information and RA phenotype directly.
Currently, numerous public gene expression datasets (e.g., National Center for Biotechnology Information, US National Library of Medicine; website: www.ncbi.nlm.nih.gov/geo) are available to identify genes relevant to RA in the original studies. These datasets will serve as important resources for further data-mining to provide supplementary functional evidence to bridge the relationship of genetic markers and RA.
Based on the available datasets and results, our study aimed to detect functional mechanisms underlying the associations identified in previous studies by performing integrative analyses [gene relationships among implicated loci (GRAIL), differential gene expression analysis, and functional annotation clustering analysis] and by combining the findings of expression quantitative trait loci (eQTL).
MATERIALS AND METHODS
Selection of RA-associated single-nucleotide polymorphisms (SNP)
The GWAS Integrator19 and Phenotype-Genotype Integrator (PheGenI; www.ncbi.nlm.nih.gov/gap/PheGenI/) are 2 bioinformatics resources that provide robust lookup and analytic functionalities for published GWAS and metaanalysis studies. We used the phenotype “rheumatoid arthritis” to search the 2 databases and found 98 interesting association results with p < 10−5 (data not shown, available from author upon request) between 85 unique SNP and RA1,2,3,4,5,6,7,8,9,10,11,12,13,14.
In most situations, significant SNP means at least 1 susceptible gene or functional locus is located within the nearby region of the SNP. A common approach for identification of candidate genes is to select nearby genes located at the 2 sides of each associated SNP as candidate genes for further functional studies. To increase the chances of finding susceptible genes, we used the GRAIL20 to prioritize and identify the susceptible candidate genes near RA-associated SNP. This approach automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts, then prioritizes and identifies the candidate gene around associated SNP.
Analysis through eQTL
As is commonly recognized, variants at the DNA level may lead to variations in quantifiable intermediate phenotypes (such as at the RNA level), which subsequently result in variations in endpoint phenotypes, such as susceptibility to RA. Analysis with eQTL is an important approach in detection of functional mechanisms underlying association by testing whether identified variants at the DNA level may lead to variations in messenger RNA (mRNA) expression of nearby genes. Several recent large-scale studies have identified thousands of putative eQTL in multiple tissues or cells including liver, lymphoblastoid cell lines (LCL), monocytes, T cells, brain, and fibroblasts21,22,23,24,25,26,27, which were summarized and archived in databases for quick and easy searching and comparison (eQTL: http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl/; GTEx eQTL: www.ncbi.nlm.nih.gov/gtex/GTEX2/gtex.cgi). To investigate the functional mechanisms underlying the above-selected associations, we first searched the 2 databases for eQTL to evaluate whether the above SNP influence transcript levels of genes as cis-effect regulators (eSNP) in multiple RA-related tissues or cells.
Differential expression analysis for RA-associated genes
We performed differential expression analyses for nearby genes physically located at the 2 sides of each associated SNP and the genes detected by GRAIL. First, we downloaded 5 publicly available expression datasets from GEO Datasets (www.ncbi.nlm.nih.gov/geo; GSE10500, GSE1402, GSE12021, GSE15573, and GSE23561). These studies were performed with the original purpose of identifying genes underlying RA in multiple tissues or cells, e.g., peripheral blood, peripheral blood mononuclear cells (PBMC), synovial membrane, synovial macrophages, and synovial fluid mononuclear cells (SFMC). The experiment procedures and data analyses including normalization of raw data were detailed in the original publications28,29,30,31,32. Using the normalized data available on the public databases, t tests were performed to identify differential expression genes by comparing means of the gene expression signals in RA cases and controls.
Functional annotation clustering analysis
Functional annotation clustering analyses were performed to explore functional similarity of the associated genes to cluster them in particular biological gene ontology (GO) terms and functional pathways as defined by the Gene Ontology project and the KEGG gene database, respectively. The significantly associated genes were functionally annotated using the Database for Annotation, Visualization and Integrated Discovery (DAVID) integrated database query tools (http://david.abcc.ncifcrf.gov/)33,34. Fisher’s exact test was performed to determine whether the GO term or pathway annotates a specified list of genes at a frequency greater than would be expected by chance. Usually, p value is ≤ 0.05 to be considered strongly enriched in the annotation categories. Bonferroni correction was also adopted for multiple testing.
RESULTS
By searching the GWAS Integrator and PheGenI, we selected 98 RA association results (p < 10−5) between 85 unique SNP and RA (data not shown, available from author upon request). These association results come from 14 published RA association studies1,2,3,4,5,6,7,8,9,10,11,12,13,14. Some of these studies have relatively large sample sizes, e.g., the study performed by Stahl, et al12 has 5539 cases and 20,169 controls and 6768 cases and 8806 controls in initial and replication analyses, respectively. The associations were distributed across the genome except for chromosomes 13 and 19, but were relatively tightly clustered into the HLA region (chromosome 6p21.3; 24 associations; data not shown, available from author upon request).
About 39 “susceptible” genes detected by GRAIL analysis are not overlapped with the genes identified according to the physical positions (data not shown, available from author upon request). A total of 123 unique candidate genes were identified using the above 2 methods.
Among the 85 unique SNP, we found that about 21 SNP have potential eQTL effects (Table 1, and data not shown, available from the author upon request) in monocytes, T cells, and LCL, which are 3 types of cells closely correlated with immune response. Among them, 18 associations were detected (Table 1) between 11 cis-effect SNP and the expression of 9 genes from the 123 “identified” genes. Specifically, SNP (rs2736340 and rs9272219) showed strong associations with expression levels of 2 nearby genes of both sides (FAM167A and BLK for rs2736340; HLA-DRB1 and HLA-DQA1 for rs9272219), as evidenced in both monocytes and LCL. Five SNP are located at an HLA region and 6 at a non-HLA region.
The 123 “identified” genes might be involved in the pathology of RA. In the peripheral blood cells, PBMC, synovial membrane, synovial macrophages, and SFMC (Table 2, and data not shown, available from author upon request) in multiple RA-related cells, t tests showed that a total of 96 genes among the 123 “identified” genes have differential expression signals (p < 0.05), in at least 1 functional study. Among the above-identified 9 eQTL genes, 5 genes (BLK and CD4 in the non-HLA region; CCR6, HLA-DQA1, and HLA-DQB1 in the HLA region) were differentially expressed in samples S1 (synovial macrophages), S2 (PBMC/SFMC), or S5 (peripheral blood cells; Table 2).
To test the probability of our “identified” genes clustering into a particular biological pathway and GO term, we performed a functional annotation clustering analysis using the functional annotation tool of the DAVID database. We found significant GO terms and KEGG pathways even after Bonferroni correction. For the 123 “identified” genes, genes tend to enrich in immune-related pathways (such as “autoimmune thyroid disease,” “intestinal immune network for IgA production,” and “immune response antigen processing and presentation”; data not shown, available from author upon request). Most importantly, we found a significant clustering (Bonferroni correction p = 4.25E-09) of genes involved in immune response — genes very closely relevant to RA. Five eQTL genes (HLA-DQB1, HLA-DRB1, HLA-DRB5, CD40, and HLA-DQA1) were significantly clustered in multiple immune-related KEGG pathways or GO terms (data not shown, available from author upon request).
This functional evidence, taken together, highlighted the significance of 8 RA-associated SNP (rs1600249, rs2736340, rs3093023, rs3093024, rs4810485, rs615672, rs660895, and rs9272219) listed in Table 1, because these SNP serve as cis-effect regulators of RA-associated genes that were differentially expressed in RA-related cell groups.
DISCUSSION
Using publicly available datasets, we performed integrative analyses and combined them with eQTL results to detect functional mechanisms underlying the associations for RA. We discovered 11 SNP acting as cis-effect regulators on 9 “identified” genes, of which 5 (BLK, CCR6, CD40, HLA-DQA1, and HLA-DQB1) were differentially expressed. The 123 “identified” genes tend to enrich in KEGG pathways (intestinal immune network for IgA production, and immune response antigen processing and presentation) and are closely relevant to RA.
In biologic systems, genetic information carried by DNA is passed on to RNA molecules by transcription and then to protein molecules through translation. Sequence variants at the DNA level represent a class of heritable molecules and contribute to variability of complex traits in a population. Functional mechanisms underlying associations between variants at the DNA level and RA may be that DNA sequence variants lead to variation in quantifiable intermediate phenotypes (such as at the RNA level), which subsequently lead to variation of susceptibility to RA. Therefore, integrating substantial evidence from multiple levels (i.e., DNA, RNA) could ascertain the potential functioning mechanism of genes and their contribution to variation in susceptibility to RA.
The functional evidence from our study suggests potential regulatory mechanisms underlying the associations between RA and 8 SNP (rs1600249, rs2736340, rs3093023, rs3093024, rs4810485, rs615672, rs660895, and rs9272219) in a general way. Briefly, different genotypes of the 8 SNP in the corresponding genes (BLK, CCR6, CD40, HLA-DQA1, and HLA-DQB1), by regulating differential mRNA transcriptions of the 5 genes and thus differential protein expression or enzyme activity, consequently have an effect on variation of susceptibility to RA in a population. However, we could not establish the functional link between variants and RA in an exact way. We still do not know which allele has the risk effect, and whether the risk allele of a variant leads to downregulation or upregulation of mRNA transcription, because the databases archive only the p value, not the direction and effect of the associations (i.e., risk allele and OR). Despite this, the supportive functional evidence may still be useful for future in-depth functional validation of the involvement of variants in RA.
It is noteworthy that failure to find functional evidence underlying the selected associations in publicly available data and results does not necessarily exclude the importance of the associations in RA, given that (1) the absence of association between RA-associated SNP and mRNA expression may result from limited statistical power due to the small sample size or small effect of the SNP; (2) the associated genes may exert effects on RA through other tissues or cells not studied here; and (3) the sequence variant in a gene may lead to RA by other mechanisms (e.g., epigenetic regulation) instead of regulating gene expression. In-depth functional studies are required to provide evidence for the mechanisms underlying the detected associations. Other bioinformatics resources are available for in-depth functional studies, such as the ENCODE database, which holds additional information about noncoding human enhancer, transcription factor binding, etc., and the FASTSNP program, used to analyze and predict potential functional effects of causal variants35.
The classic MHC loci have long been regarded as the major contributors to the pathogenesis of RA. However, recent GWAS have identified a series of candidate genes outside the MHC region, suggesting that non-MHC RA susceptibility loci may also be important in RA pathogenesis and thus may attract researchers’ interest. BLK and CD40 are 2 strongly and consistently RA-associated loci outside the MHC region. B cell lymphocyte kinase (BLK) gene is a member of the Src kinase family, which may influence the proliferation and differentiation of cells. BLK plays a role in B cell receptor signaling and B cell development. Studies have identified several RA-associated variants at the DNA level in both white and Asian populations (e.g., rs922483, rs2248932, rs2736340, and rs1600249)3,4,36. A study performed to detect functional mechanisms underlying the associations identified that the rs922483 may serve as a cis-regulatory locus in regulating mRNA and protein expression of BLK prominently observed in B cells early in the development of BLK37. Our study provided functional evidence for 2 SNP (rs2736340 and rs1600249) that contribute to variation of susceptibility to RA in the population. CD40 is a member of the TNF-receptor super-family. This gene plays an essential role in mediating a broad variety of immune and inflammatory responses. The SNP rs4810485 is a well established variant that confers risk of RA11,38,39 and influences the rate of joint destruction in RA40. However, to our knowledge, no followup functional studies have investigated the underlying mechanisms for the association. Our study attempted to provide suggestive functional evidence to support the association.
Although we gained some functional evidence for the associations between RA and 8 SNP, based on the current available evidence, it is still challenging to discriminate causal variants, which have potential functional effects, from nearby marker SNP, especially for SNP at the HLA region, where there is a well known strong linkage disequilibrium (LD). About 10 SNP extending a relatively long distance (460 kb) were consistently associated with RA (data not shown, available from author upon request). The HLA region may contain only a few true functional variants, but because of the strong LD between the markers and functional variants, about 10 significant SNP have been detected. A study was performed to identify functional and potential causal variants within the HLA region41.
The functional evidence found in our study supports the associations between RA and 8 SNP, because these SNP serve as cis-effect regulators, and the corresponding eQTL genes were also differentially expressed. Use of public data resources for integrative analyses may provide insights into the molecular genetic mechanisms underlying human diseases.
Footnotes
-
Supported by the Natural Science Foundation of China (31071097, 31271336), the Startup Fund from Soochow University (Q413900112, Q413900712), a project sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, and a project of the Priority Academic Program Development of Jiangsu Higher Education Institutions.
- Accepted for publication March 14, 2013.