Abstract
Objective. The PTPN22 rs2476601 genetic variant has been associated with rheumatoid arthritis (RA) and other autoimmune diseases. Some reports suggest that this single-nucleotide polymorphism (SNP) may not be the only causal variant in the region of PTPN22. Our aim was to identify new independent RA-associated common gene variants in the PTPN22 region.
Methods. We analyzed Wellcome Trust Case-Control Consortium genome-wide association study data for associations in the 397.2 kb PTPN22 region and selected 9 associated SNP (with p < 5 × 10−3) for replication and dependence analysis. The replication cohorts comprised 2857 patients with RA and 2994 controls from Spain, Netherlands, and Norway.
Results. We found that 5 of the 9 selected SNP, in addition to rs2476601, were associated with RA in the Spanish cohort. Of these 5, 4 were also associated in both the Dutch and Norwegian cohorts, and all 6 SNP were associated with RA in the combined analysis. Conditional analyses showed that none of these associations was independent of rs2476601.
Conclusion. The SNP rs2476601 located in the PTPN22 gene is the sole common genetic variant associated with RA in the 1p13.2 region, suggesting that neighbor genes of PTPN22 do not have a major influence in RA.
More than 100 genetic variants are now unambiguously associated with complex human autoimmune diseases1. In recent years several genome-wide association studies (GWAS) have been carried out on rheumatoid arthritis (RA), giving rise to a number of new genetic associations2,3,4,5,6. Nevertheless, association of a genetic variation does not mean causal association of the variant with the disease. The most common scenario is that a single-nucleotide polymorphism (SNP) shows association with the trait because it is in linkage disequilibrium with the causal variant.
Prior to the era of GWAS, SNP rs2476601 at PTPN22 was found to be associated with autoimmune diseases such as RA7, systemic lupus erythematosus (SLE)8, type 1 diabetes9, and Graves’ disease10,11. The risk allele encodes a nonsynonymous gain-of-function variant of the lymphoid tyrosine phosphatase gene (Arg620Trp), and functional studies have suggested a role either in loss of tolerance to self or on regulatory T cells9,12.
Nevertheless, it is suspected that other common variants may explain, at least in part, the association detected at the PTPN22 locus. Studies have investigated additional causal variants in the PTPN22 region in RA13, type 1 diabetes14,15, and Graves’ disease16, among others. Some studies suggest the presence of more than 1 variant associated with autoimmunity in the 1p13.2 region where PTPN22 is located. No study to date has been sufficiently large to fully dissect the complex association of this region with RA. Conversely, a recent study by Wan Taib, et al17 suggested that rs2476601 is the only independent RA association in the 1p13.2 region. In addition to the widely demonstrated role of PTPN22 itself in autoimmune processes, its genetic region is complex, with a relatively high density of genes spanning roughly 0.4 Mb, including (from 5’ to 3’) MAGI3, PHTF1, RSBN1, PTPN22, BCL2L15, AP4B1, DCLRE1B, HIPK1, and OLFML3. Not much is known about the function of these genes, but 2 of them (BCL2L15 and HIPK1) have roles in apoptotic processes, and thus could be involved in the pathogenesis of RA18,19,20.
We aimed to identify new common genetic variants associated with RA in the 1p13.2 region using existing GWAS data, following 2 different approaches: replicating associations from the Wellcome Trust Case-Control Consortium (WTCCC) data in 3 white populations, and imputing 1p13.2 region data in the WTCCC to extend SNP coverage.
MATERIALS AND METHODS
We wanted to assess the implications for RA of genetic variants in the PTPN22 region. We selected SNP found to be statistically associated with RA in this region in the WTCCC study2. SNP rs2476601 was not included on the Affymetrix genotyping chips used in the WTCCC, so no initial analysis of dependence with this variant could be performed in the UK population. We chose the gene-rich region surrounding the PTPN22 gene, spanning 397.2 kb (from base pairs 113,987,482 to 114,384,683) on chromosome 1, according to NCBI Build 37.1, for analysis. The genes located in this region are, from 5’ to 3’, MAGI3, PHTF1, RSBN1, PTPN22, BCL2L15, AP4B1, DCLRE1B, HIPK1, and OLFML3. On each border of the region there is a gap of several tens of kilobases before the next gene, and recombination hotspots also occur within these gaps (Figure 1).
We used 2 approaches to determine independent genetic associations in the 1p13.2 region (Figure 2), as follows. (1) All SNP with p values < 5 × 10−3 in the WTCCC (Table 1) were selected for genotyping and analysis in the Spanish case/control RA cohort. SNP that were in high linkage disequilibrium (r2 > 0.9) were pruned. All SNP associated in the Spanish cohort were further tested in a second stage composed of independent Dutch and Norwegian case/control cohorts for confirmation. (2) Alternatively, we performed imputation of the UK population from the WTCCC in the 1p13.2 region, including rs2476601. Dependence analysis was also carried out in the imputed UK population in order to determine independent associations.
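The pruning criterion in approach (1) rests on the pairwise r2 statistic. The following is a minimal sketch of how r2 can be computed for 2 biallelic SNP, assuming phased haplotype counts are available. The function name and the counts are hypothetical and chosen only for illustration; in practice haplotype frequencies are estimated from unphased genotypes by dedicated software, and this is not the procedure used in the study.

```python
def ld_r_squared(n11, n12, n21, n22):
    """r2 between two biallelic SNP from haplotype counts.

    n11: haplotypes carrying allele 1 at both SNP
    n12: allele 1 at the first SNP, allele 2 at the second
    n21: allele 2 at the first SNP, allele 1 at the second
    n22: allele 2 at both SNP
    """
    total = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / total          # allele 1 frequency, first SNP
    q1 = (n11 + n21) / total          # allele 1 frequency, second SNP
    d = n11 / total - p1 * q1         # linkage disequilibrium coefficient D
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if denom == 0:
        raise ValueError("r2 is undefined when a SNP is monomorphic")
    return d * d / denom


# Hypothetical haplotype counts for one SNP pair.
r2 = ld_r_squared(480, 20, 10, 490)
# Under the rule in the text, one SNP of the pair is pruned when r2 > 0.9.
prune_one = r2 > 0.9
print(f"r2 = {r2:.3f}, prune one SNP of the pair: {prune_one}")
```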
Populations
The WTCCC RA and control populations have been described2. The initial Spanish cohort comprised 941 patients with RA and 743 healthy controls. The Dutch replication cohort consisted of 962 patients with RA and 1130 healthy controls. The Norwegian replication cohort consisted of 953 patients with RA and 1121 healthy controls. All patients with RA met American College of Rheumatology criteria21 and were recruited from Hospital Clínico San Carlos (Madrid, Spain), Hospital Xeral-Calde (Lugo, Spain), Hospital Universitario La Paz (Madrid, Spain), Hospital Clínico San Cecilio (Granada, Spain), Radboud University Nijmegen Medical Centre (Nijmegen, Netherlands), Diakonhjemmet Hospital (Oslo, Norway), and blood banks from the corresponding regions and hospitals. All individuals were white and all gave written informed consent for the study. This study was approved by the local ethics committees of the corresponding hospitals.
Genotyping methods
All SNP were genotyped using TaqMan assays (Applied Biosystems, Foster City, CA, USA) on an ABI Prism 7900HT real-time thermocycler in the Spanish and Dutch cohorts. The Norwegian cohort was genotyped by MassARRAY technology (Sequenom, San Diego, CA, USA) at the Centre for Integrative Genetics, Norwegian University of Life Sciences (UMB), except for rs2476601, for which genotypes were available from previous publications22,23,24.
Data imputation
WTCCC data were imputed in order to increase the number of genotypes in the 1p13.2 region for the UK population. We first tested imputation accuracy in this region in the WTCCC cohort by removing genotyped SNP from the panel and comparing them with their imputed counterparts. On that basis we selected the settings for optimal imputation and imputed the data using the Impute software as described25, taking as reference panels the CEU and TSI (white) populations from the HapMap project26 (http://hapmap.ncbi.nlm.nih.gov/). After imputation, the data were filtered as follows: all SNP with a call rate below 0.98, all SNP that deviated from Hardy-Weinberg equilibrium (p < 0.001), and all SNP with minor allele frequency < 0.01 were excluded.
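The 3 post-imputation filters can be illustrated with a short sketch. It assumes per-SNP genotype counts and uses a 1-degree-of-freedom chi-square approximation for the Hardy-Weinberg test; the text does not state which Hardy-Weinberg test was used, so that choice, the function names, and the example counts are assumptions made only for illustration.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p value for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele a
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(n_aa, n_ab, n_bb, n_samples,
              min_call_rate=0.98, hwe_threshold=0.001, min_maf=0.01):
    """Apply the call-rate, minor-allele-frequency, and HWE filters to one SNP."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / n_samples
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    return hwe_p_value(n_aa, n_ab, n_bb) >= hwe_threshold


# Hypothetical SNP: 1000 samples, all genotyped, genotypes in exact HWE.
print(passes_qc(250, 500, 250, n_samples=1000))
```

The default thresholds mirror the values given in the text (call rate 0.98, Hardy-Weinberg p value 0.001, minor allele frequency 0.01).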
Data analysis
All data were filtered for quality using the following criteria: SNP that deviated from Hardy-Weinberg equilibrium in controls (p value < 0.001) were excluded, and a success call rate > 0.95 was required both per individual and per SNP. Genotype and allele frequencies were compared among cases and controls by chi-square analysis with 1 degree of freedom to find significant associations. P values for dominant and recessive models were also generated. All p values < 0.05 were considered statistically significant. OR and 95% CI were calculated according to Woolf’s method. For each SNP we determined dependency of the association on the other genetic variants by means of conditional logistic regression analysis using Plink software. We conditioned each SNP on each other SNP separately and on all of them together (multiple logistic regression).
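For readers unfamiliar with these statistics, the allelic chi-square test and Woolf's (logit) confidence interval can be sketched as follows. The allele counts are hypothetical and the function names are ours; the study itself obtained these values with Plink, so this is an illustration of the formulas rather than the authors' code.

```python
import math


def woolf_or_ci(a, b, c, d, z=1.959964):
    """Odds ratio with Woolf (logit) 95% CI for a 2x2 allele-count table.

    a, b: risk-allele and other-allele counts in cases
    c, d: risk-allele and other-allele counts in controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def allelic_chi_square(a, b, c, d):
    """Pearson chi-square (1 df) and p value for the same 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    return chi2, p_value


# Hypothetical allele counts: 2000 case alleles and 2000 control alleles.
odds_ratio, lower, upper = woolf_or_ci(200, 1800, 150, 1850)
chi2, p_value = allelic_chi_square(200, 1800, 150, 1850)
print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f}), p = {p_value:.4f}")
```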
We performed a metaanalysis to pool the initial and replication cohorts while controlling for differences among populations. The metaanalysis of the different cohorts was conducted using the Mantel-Haenszel test to calculate pooled OR, and the 95% CI for the OR was estimated using a fixed-effect model. Heterogeneity between cohorts was tested using the Breslow-Day test.
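The Mantel-Haenszel pooled OR weights each cohort's 2x2 table by its size. A minimal sketch, using hypothetical allele counts for 3 cohorts, is given below; it covers the point estimate only and does not reproduce the confidence interval or the Breslow-Day heterogeneity test, both of which were obtained with Plink in the study.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata.

    tables: iterable of (a, b, c, d) per cohort, where
      a, b = risk-allele and other-allele counts in cases
      c, d = risk-allele and other-allele counts in controls
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical allele counts for 3 cohorts (not data from this study).
cohorts = [
    (180, 1700, 110, 1370),
    (230, 1690, 200, 2060),
    (250, 1650, 220, 2020),
]
print(f"Pooled OR: {mantel_haenszel_or(cohorts):.3f}")
```

With a single stratum the estimate reduces to the ordinary cross-product OR, which is a convenient sanity check.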
All data were analyzed using Plink software (version 1.06; http://pngu.mgh.harvard.edu/purcell/plink/)27. The association plot of the 1p13.2 region was generated with the LocusZoom software (Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA; http://csg.sph.umich.edu/locuszoom/)28.
RESULTS
Replication through case/control cohorts
Nine SNP out of 38 showed signs of association (p < 5 × 10−3) in the WTCCC study in the PTPN22 region (Table 1), and were genotyped in the Spanish cohort, in addition to the PTPN22 rs2476601. All SNP tested were in Hardy-Weinberg equilibrium.
Results obtained from the Spanish cohort are shown in Table 2. SNP in the region were tested for independence from the known causal genetic variant rs2476601 in the PTPN22 gene. We found that, in addition to rs2476601, 5 of the selected SNP were associated with RA in the Spanish population (rs2273758 in PHTF1, rs6679677 and rs1217396 in RSBN1, rs1235005 between PTPN22 and BCL2L15, and rs12029840 near OLFML3), yet these associations were not independent of rs2476601 (Table 2). Further, rs2476601 lost its significance when conditioned on 4 of these 5 SNP, probably because of a lack of statistical power.
To increase the power to detect independent associations, we then analyzed these 5 SNP and rs2476601 in the Dutch and Norwegian replication cohorts. We found that all 5 SNP were associated with RA in the Dutch cohort, and that all except rs12029840 were associated with RA in the Norwegian cohort (Table 3). Then we performed conditional logistic regression analysis for the 6 SNP in all data, taking the 3 populations into account as covariates. Pairwise conditioning showed that rs12029840 was dependent upon rs2476601, while all the others remained significant after conditioning individually on each other SNP (Table 3). We then conditioned each SNP on all the others to account for multiple or combined effects of the genetic variation in the region. Only rs2476601 (conditioned p value = 1.62 × 10−11, OR 1.518) and rs6679677 (conditioned p value = 0.0119, OR 0.893) remained significant after the analysis (Table 3). The SNP rs6679677 was found to be in almost total linkage disequilibrium with rs2476601 in the Dutch and Norwegian cohorts (r2 = 0.972 and r2 = 0.994, respectively), but not in the Spanish cohort (r2 = 0.75). In the Spanish cohort there was no conclusive evidence of dependence between these 2 variants. Further, conditional analysis on the Dutch or Norwegian cohorts alone was unable to distinguish between the associations at rs2476601 and rs6679677 because of the high linkage disequilibrium. Conditional logistic regression analysis with all 3 populations showed that the association of rs6679677, although significant, was largely dependent on that of rs2476601 (unconditioned p value = 1.04 × 10−11; conditioned p value = 0.0119). Additionally, conditional logistic regression analysis becomes inaccurate when the variants under consideration are in high linkage disequilibrium. Further, when multiple testing correction was applied to the conditioned p value of 0.0119, all significance was lost.
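The loss of significance after correction can be checked with simple arithmetic. The sketch below assumes a Bonferroni correction over the 6 SNP entered in the conditional analysis; the number of tests is our assumption for illustration, because the text does not state the correction factor that was applied.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Conditioned p value reported for rs6679677; 6 tests is an assumed count.
adjusted = bonferroni(0.0119, 6)
print(f"adjusted p = {adjusted:.4f}, significant at 0.05: {adjusted < 0.05}")
```

Under that assumption the adjusted value is 0.0119 × 6 = 0.0714, which exceeds the 0.05 significance threshold used in the study.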
We also analyzed all 3 cohorts for common haplotypes (i.e., more than 5%) in patients with RA and controls. We found the same haplotypes in the 3 populations, but all associations with RA found in any population, or all of them, were dependent upon rs2476601 (data not shown).
WTCCC imputation
A total of 38 SNP were genotyped and passed quality filters in the UK population of the WTCCC study in the 1p13.2 region (supplementary data available from the authors upon request). When imputation was completed using the CEU and TSI white populations from the HapMap Project as reference panels, and the data were filtered for quality, data from a total of 123 SNP were obtained; of these, 50 showed association with RA (p < 10−2; Figure 1). After conditioning of all SNP on rs2476601 in the UK imputed population, none remained significant at the GWAS level (supplementary data available from the authors upon request). Only rs3811019 (p value = 3.32 × 10−6; conditioned p value = 0.026) and rs17032011 (p value = 1.70 × 10−6; conditioned p value = 0.029), located in the HIPK1 gene, retained some degree of association, although this was much weaker than the nonconditioned association, suggesting a minor role, if any, for these variants in RA. These 2 SNP were in almost total linkage disequilibrium in the UK population. Further, when the associations of these 2 SNP were conditioned on all others in the region by multiple logistic regression, their significance was totally lost.
DISCUSSION
By means of logistic regression analysis we performed an association study of the 1p13.2 region conditional on the Arg620Trp variant (rs2476601) of the PTPN22 gene in the WTCCC dataset, with replication in a total of 2857 patients with RA and 2994 healthy controls from Spain, The Netherlands, and Norway. Although we detected some suggestion of independent association, none of the SNP we analyzed significantly added to the risk conferred by rs2476601 in the multiple logistic regression analysis after appropriate Bonferroni correction.
Another study has presented results that suggest the PTPN22 rs2476601 variant is not the sole causal genetic variant for RA in the 1p13.2 region13, and has shown evidence of independent association for a protective haplotype tagged by either rs12760457 or rs12730735. The sample size in that study was 1136 cases and 1797 controls (statistical power was 65% to detect an OR of 1.2 with minor allele frequency = 0.10). This haplotype was tagged by SNP in our present study, but we found no association independent of rs2476601 that confirmed the previous finding. The study by Wan Taib, et al17 also failed to confirm independence of this association. In our study, we had 93% statistical power to detect an OR of 1.2 with minor allele frequency of 0.10; the study by Wan Taib, et al17, with 4460 cases and 4481 controls, had the highest statistical power (99%), with results identical to ours. The study by Carlton, et al13 also showed evidence of independent association of SNP rs3789604. This SNP was likewise tagged by the SNP we studied, yet again no association independent of rs2476601 was found. Thus, we were unable to confirm previously reported independent associations in the 1p13.2 region.
A rare SNP, rs33996649, in the PTPN22 region is known to be associated with RA and SLE29,30, but this variant or other rare ones were not identified in our study. In addition, rs33996649 has already been shown to be associated with RA independently of the effect of rs247660130.
A north-south geographic gradient in the frequencies of the rs2476601 T risk allele has been described in European populations31. We observed that this gradient (i.e., a higher minor allele frequency in the north, lower to the south) is also present for most SNP in the 1p13.2 region, but with varying minor allele frequencies. Further, the linkage disequilibrium pattern is also affected by this gradient, such that SNP on 1p13.2 are in stronger linkage disequilibrium in northern countries (Norway and The Netherlands) than in the south (Spain). In our study we analyzed populations with different linkage disequilibrium patterns and degrees of association in the PTPN22 region, and it is unlikely that any common variants have gone undetected due to population-specific issues (SNP in 1p13.2 region from the WTCCC data cover 70% of described variability in HapMap).
By 2 methods we show that no genetic variant in the 1p13.2 region, besides rs2476601 in the PTPN22 gene, is independently associated with RA. Given that our combined cohorts from Spain, The Netherlands, and Norway allow 93% statistical power to detect any genetic variant more frequent than 1% and with an OR > 1.10, it is unlikely that any other variant independently associated with RA in white populations exists in this region. Wan Taib, et al17 came to the same conclusion studying different cohorts and using a different methodology. Further, both imputed and genotyped data in the WTCCC (covering > 95% of common variation in whites in this region described in HapMap) also showed no other variant independently associated with RA in this region.
No supporting evidence was found for any common genetic variant, other than rs2476601, being independently associated with RA in the 1p13.2 region in white populations.
Acknowledgment
We thank Sofia Vargas, Sonia García, and Gema Robledo for their invaluable contribution in collection, isolation, and storage of the DNA samples. We also thank all the patients and control donors for their essential collaboration. Finally, we thank Banco Nacional de ADN (University of Salamanca, Salamanca, Spain) and the Norwegian Bone Marrow Donor Registry, who supplied control DNA samples.
Footnotes
- Supported by grant SAF2009-11110 and by Junta de Andalucía grant CTS-4977; and by RETICS Program, RD08/0075 (RIER) from Instituto de Salud Carlos III (ISCIII), within the VI PN de I+D+I 2008–2011 (FEDER). Dr. Koeleman is supported by the Dutch Diabetes Research Foundation (grant 2008.40.001) and the Dutch Arthritis Foundation (Reumafonds, grant NR 09-1-408). Dr. Lie is supported by the Research Council of Norway.
- Accepted for publication July 13, 2011.