Abstract
Objective. To present a systematic evaluation of 47 non-MHC ankylosing spondylitis (AS) susceptibility loci that have been initially discovered through white genome-wide association studies in Han Chinese.
Methods. Originally, 10,743 samples representing north and south Chinese in 4 datasets were obtained. After data quality control and imputation, metaanalysis results of 94,621 variants within 47 loci were extracted. Four ERAP1 single-nucleotide polymorphisms (SNP) and HLA-B27 tag SNP rs13202464 were used for interaction analysis. Population-attributable risk percentages of AS-associated variants were compared. Functional annotations of AS-associated variants were conducted using HaploReg, RegulomeDB, and rVarBase databases.
Results. We revealed 16 AS-associated variants with nominal evidence in Han Chinese, including rs10865331 (p = 6.30 × 10−10), rs10050860 (p = 4.09 × 10−5) and rs8070463 (p = 1.03 × 10−4). Potential susceptible SNP within these 47 loci were also identified, such as rs13024541 (2p15), rs17401719 (5q15), and rs62074054 (17q21). Epistatic interactions between 3 ERAP1 SNP (rs17401719, rs30187, and rs10050860) and HLA-B27 were confirmed. Among the 16 AS-associated variants, rs30187 showed weaker risk effect, while rs10050860 and rs12504282 seemed to attribute more risk in Han Chinese than in whites. Further genomic annotation pinpointed 35 candidate functional SNP, especially in the 2p15, ERAP1, and NPEPPS-TBKBP1 regions.
Conclusion. Our results provided a detailed spectrum of all the reported non-MHC AS susceptibility loci in Han Chinese, which comprehensively exhibited the ethnic heterogeneity of AS susceptibility and highlighted that 2p15, ERAP1, and NPEPPS-TBKBP1 regions may play a critical role in AS pathogenesis across diverse populations.
Ankylosing spondylitis (AS) is a chronic inflammatory disease predominantly involving axial skeleton and entheseal insertion sites, with a prevalence of 0.2% in Asians1,2,3. The mechanism underlying AS remains to be elucidated, while genetic factors are known to play a prominent role in AS pathogenesis4.
In the past decade, the understanding of AS genetic factors has been greatly improved by the application of genome-wide association studies (GWAS). The first genome-wide scan study identified 2 important AS susceptibility genes: IL-23R and ERAP15. ERAP1 association with AS was then found to be restricted in HLA-B27–positive cases6. Subsequently, a variety of risk variants related to innate and adaptive immunity as well as inflammatory responses were identified through an Immunochip study7. A large-scale study using the Immunochip data with newly added controls revealed 17 novel AS susceptibility loci8. To date, over 40 non-MHC AS susceptibility loci have been identified at genome-wide significance in whites, providing valuable insights into the underlying disease mechanism5,6,7,8,9,10.
Research in the Chinese population on AS susceptibility loci was less informative. Our group performed the first (to our knowledge) AS GWAS in Han Chinese, which analyzed about 1.4 million single-nucleotide polymorphisms (SNP) in 1837 cases and 4231 controls in the discovery stage and confirmed the 2p15 association at a genome-wide level11. More recently, another genome-wide dataset by Immunochip was reported, comprising 128,935 SNP in 1550 cases and 1567 controls of East Asians. While suggestive associations were seen within several loci, including 2p15 and ERAP1, some known loci were not significant, suggesting potential ethnic heterogeneity of AS susceptibility across whites and Han Chinese7. In addition, candidate gene studies have been conducted in Han Chinese, and a low-frequency SNP in IL23R as well as SNP in or near ERAP1, CARD9, MMP9, and RUNX3 were found to be associated with AS12,13,14,15,16,17. However, few studies examined the frequency distributions between north and south Chinese, which may be a confounding factor18. Further, most studies had small sample sizes, and the results were often inconsistent.
To systematically evaluate all the known non-MHC AS susceptibility loci in the Chinese population, we augmented the power of our previously published GWAS dataset by increasing the genome-wide coverage of genetic variants through imputation and by including a large number of additional control samples. We also investigated the frequencies of these risk variants in north and south Chinese individuals.
MATERIALS AND METHODS
Subjects
All cases met the 1984 modified New York criteria for AS, consisting of 968 individuals genotyped using HumanHap 610-Quad Bead chip and 997 individuals by OmniExpress Beadchip as previously reported11. Genotype data of 9479 controls were accordingly derived from a series of published GWAS, including 5 new datasets (Supplementary Table 1, available with the online version of this article)18–25. All healthy controls were screened through questionnaires to exclude those with diagnosed AS. All participants were self-reported as being of Chinese descent. This study was conducted according to the Declaration of Helsinki. It was approved by the institutional review board at the Third Affiliated Hospital of Sun Yat-sen University (approval number: SYSU3-[2007]40) and informed consent was obtained from all individuals before blood samples were collected.
Study design and data analysis of GWAS datasets
We first performed quality control (QC) on the genotype data in 2 platforms separately. Samples in each platform were then divided into north and south datasets using principal component analysis (PCA), to better dissect the Chinese population genetic architecture. After imputation and association analysis in each dataset, the metaanalysis result got a broader coverage of variants. A flowchart is presented in Supplementary Figure 1, available with the online version of this article.
A primary QC in each dataset was performed in PLINK v1.07 to remove samples with a call rate < 0.96 or discrepancies between estimated and recorded sex, and to exclude SNP with call rate < 0.90, with minor allele frequency (MAF) < 0.01, or deviating from Hardy-Weinberg equilibrium (HWE) in controls (p < 1.0 × 10−8). After combining datasets in 2 platforms, respectively, 451,984 SNP and 532,418 SNP passed stringent QC criteria of call rate < 0.96, MAF < 0.01, HWE deviation in controls (p < 1.0 × 10−6). Samples were eliminated if they had heterozygosity > 3 SD from the mean value of all individuals, or cryptic relatedness, or were population outliers on PCA with 206 Hapmap samples. PCA analyses were implemented with an inhouse R script to remove outliers after dividing samples into north and south cohorts according to PCA plots (Supplementary Figure 2, available with the online version of this article). Phasing was then conducted using SHAPEIT version 2 in each cohort, followed by whole-genome imputation implemented by IMPUTE version 2, using the 1000 Genomes Project Phase 3 data (November 2014) as a reference panel. After imputation, only SNP with imputation certainty ≥ 80%, MAF ≥ 0.5%, and without significant deviation from HWE in controls (p < 1.0 × 10−6) in all 4 cohorts were included. To achieve reliable imputation quality, any SNP genotype with an imputed possibility < 0.9 was classified as missing, and SNP with a missing rate > 0.1 were excluded in subsequent association analysis. Finally, variants located within 500 kb up/downstream of AS-associated SNP within 47 reported non-MHC loci were included for the current study. The regional association plots for each locus were generated using LocusZoom (locuszoom.sph.umich.edu).
Linkage disequilibrium (LD) pattern and interaction analysis
Five genotyped SNP in or around ERAP1 with a p value < 0.05 were extracted. Two imputed SNP, a reported AS-associated SNP (rs10045403, p = 9.65 × 10−3, OR 1.14) and the most significant SNP in our dataset (rs17401719, p = 4.21 × 10−5, OR 1.32), were also included using best-guess genotype. LD analysis of these 7 SNP was then implemented in Haploview software 4.2, using the default algorithm CI to define LD blocks. After excluding SNP in high/moderate LD, 4 SNP (rs17401719, rs30187, rs10050860, and rs10045403) were tested for interaction with the genotyped HLA-B27 tag SNP rs13202464 under a regression model in each cohort using R version 3.4.2, and the metaanalysis result was presented.
Statistical analysis
Association analysis of both dosage-format genotyped SNP and imputed SNP was implemented using SNPTEST version 2.5.2 in each dataset, using the first one or two PC as covariates to get a minimum λGC of genotyped variants. Fixed-effect metaanalysis was performed using META version 1.5, where p values from Cochran’s Q statistics and the I2 heterogeneity were obtained. The quantile-quantile (Q-Q) plot and genomic inflation factor (λGC) were generated to observe the potential population stratification using R. The phenotypic variance explained by the input variants was computed as the pseudo R2 in logistic regression model using an in-home R script. Statistical power calculation was implemented using the Genetic Association Study Power Calculator (csg.sph.umich.edu/abecasis/gas_power_calculator/index.html) based on the OR reported in whites with a 2-sided type I error rate of 0.05, assuming the prevalence of AS in China is 0.2%. The population-attributable risk percentages (PARP) were calculated using the formula26 where RAF = risk allele frequency.
Functional annotation
To pinpoint the potential causal variants, the 16 AS-associated SNP and their proxy SNP (r2 ≥ 0.80 in the Asian population of the 1000 Genome Project) were extracted from the HaploReg database version 4.1 (HaploReg v 4.1)27. Coding SNP were then annotated through the ensemble database. For noncoding variants, the annotation strategy was as follows: (1) the regulatory consequence of all candidate SNP were estimated based on the functionality score using RegulomeDB version 1.1, which integrates public datasets from the Gene Expression Omnibus, the ENCODE project, and published literature28; (2) we then searched through the HaploReg v4.1 to find out whether these potential functional SNP overlap with promoter/enhancer histone modification, deoxyribonuclease (DNase) peaks, or protein binding sites; (3) the rVarBase database (the 2.0 version of rSNPBase) providing experimentally validated regulatory information was used for chromatin state and overlapped regulatory SNP/regulatory copy number variation annotation29.
RESULTS
After stringent QC and PCA analysis, the genome-wide genotype data were available in 1860 AS cases and 8883 controls, comprising 4 datasets representing Chinese populations in the north and south. Fixed-effect metaanalysis results of 5.8 million non-MHC variants were obtained. The λGC of all these non-MHC variants was 1.033 (Supplementary Figure 3, available with the online version of this article), suggesting that the population stratification was well controlled. Results of 94,621 variants within 47 reported non-MHC AS susceptibility loci in whites were extracted for subsequent analyses.
Association analysis of reported AS loci in Han Chinese
Among 64 AS-associated SNP within 47 non-MHC AS loci initially reported in whites, 51 SNP passed QC for association testing. In total, 16 SNP with the same risk allele in 13 loci showed consistent association at nominal significance (Table 1, and Supplementary Figure 4, available with the online version of this article). The most significant SNP was rs10865331 (2p15, OR 1.26, p = 6.30 × 10−10), whose p value reached the genome-wide significance level. It was followed by rs10050860 in ERAP1 (5q15, OR 1.42, p = 4.09 × 10−5) and rs8070463 near TBKBP1 (17q21, OR 1.16, p = 1.03 × 10−4). No statistically significant association (p < 0.05) was seen in the other 35 reported SNP, although it was noticeable that all of them showed the same direction of association effect between whites and Han Chinese (Supplementary Table 2).
We then calculated the power of our study for detecting the associations at these 51 available SNP, based on their reported OR of published GWAS studies on whites and allele frequency in the Chinese population (as revealed by our current study). The power ranged from 39.8% to 100.0% for the 16 SNP, with significant association and from 22.5% to 98.5% for the remaining SNP (Table 1, and Supplementary Table 2, available with the online version of this article).
In addition to those reported variants, 94,570 additional SNP within these loci were also tested for association with AS (Supplementary Table 3A, available with the online version of this article). Among them, these had moderate to strong LD with the previously reported variants and showed more significant association evidence in our study (Supplementary Table 3B): SNP rs13024541 (2p15, p = 3.12 × 10−10), rs17401719 (5q15, p = 4.21 × 10−5), and rs62074054 (17q21, p = 2.7 × 10−4). Additional suggestive associations were also detected within these loci and are independent of the reported variants. These potential novel associations are worth further investigation.
Overall, 16 variants were found to show significant association in Han Chinese. They accounted for 2.50% of phenotypic variance of AS in our datasets, with rs10865331 located in 2p15 (0.90%) and rs10050860 in ERAP1 (0.30%) as the 2 variants with highest contribution.
HLA-B27–ERAP1 interaction. We next performed the HLA-B27–ERAP1 interaction analysis. Among the 7 SNP around the ERAP1 region, SNP rs42398, rs30187, and rs27434 are in high LD (Supplementary Figure 5, available with the online version of this article), and as a result, 4 independent SNP (rs17401719, rs30187, rs10050860, and rs10045403) were chosen for the interaction analysis with rs13202464, the tag SNP for HLA-B27.
Strong evidence of interaction with rs13202464 was seen at rs30187 (p_interaction = 7.08 × 10−4, p_het = 0.35) and rs10045403 (p_interaction = 4.93 × 10−4, p_het = 0.97). The low-frequency variant, rs10050860, also exhibited interaction evidence with rs13202464 (p_interaction = 9.8 × 10−3, p_het = 0.55). Consistent with the findings in whites, the stratified analysis showed that the 3 SNP showed significant association only in HLA-B27–positive cases and controls, but not in HLA-B27–negative samples (Table 2).
Population risk comparison
As shown in Table 3, the PARP values of 4 SNP, including rs10510607, rs10440635, rs30187, and rs2297518, were significantly higher in whites than in Han Chinese. It is worth noting that rs30187 in ERAP1 exhibited significantly higher OR but lower allele frequency in whites. Two variants, rs10050860 in ERAP1 and rs12504282 in ANTXR2, attributed more risk in Han Chinese, because of their higher OR as well as higher risk allele frequency distribution. Despite differences in their risk allele frequencies and OR, most results of these 16 variants were comparable.
Functional annotation of non-MHC variants
Finally, systematic genomic annotation of these 16 SNP and 245 proxy SNP (r2 ≥ 0.8 in Asians) were conducted to study their functional consequences. Thirteen SNP were missense or synonymous variants: rs6672420 (I18N) in RUNX3, rs6897932 (T244I) in IL7R, rs3742704 (I231L) in GPR65, rs2297518 (S608L) in NOS2, and 9 SNP in ERAP1: rs27044 (Q730E), rs30187 (K528R), rs26653 (R127P), rs17482078 (R725Q), rs10050860 (D575N), rs2287987 (M349V), rs469783 (A637A), rs27529 (S453S), rs27434 (A356A). On the other hand, 22 SNP within 7 loci exhibited multiple layers of ENCODE data, with all of them having expression quantitative trait loci (eQTL) or DNaseI sensitivity QTL (dsQTL) effects, intersecting with transcription factor binding or DNase peak and displayed enhancer/transcription/active transcription start site chromatin state in blood (Table 4). For 2p15, rs13001372 was found to locate within a DNase I sensitivity quantitative trait locus, and may interact with various proteins in lymphoblastoid cells. Other SNP were found to affect binding and to be linked to gene expression including ERAP1, ACTA2, and NPEPPS-TBKBP1, highlighting their functional involvement in AS pathogenesis.
DISCUSSION
By augmenting our previously published GWAS dataset through increasing the genome-wide coverage of genetic variants after imputation, and including a large number of additional control samples, we did a comprehensive evaluation of the 47 non-MHC AS susceptibility loci in a Chinese population. In addition to identifying additional AS susceptibility loci in a Chinese population, our study revealed shared susceptibility loci between Han Chinese and white populations as well as potential population-specific loci. This suggests some ethnic heterogeneity of AS genetic susceptibility between the 2 populations.
Ethnic heterogeneity has always been an interesting topic in disease susceptibility studies. While most common risk variants are shared across different populations, ethnic heterogeneity has been demonstrated and can be categorized into 2 types: (1) different populations having different variants within shared loci (allelic heterogeneity); and (2) ethnic-specific loci, a type that exhibits significant association evidence only with specific populations (locus heterogeneity)23,24,25. In our study, we confirmed 16 susceptible variants in a Chinese population and further showed their consistent association effects between whites and Han Chinese (Table 1). Among them, only the 2p15 locus achieved genome-wide significance, which was consistent with the Immunochip result of East Asians (OR 1.28, p = 1.6 × 10−6)7. It showed a strong effect as a common variant and has only been reported to associate with AS and IBD, indicating its vital role in AS pathogenesis in addition to the HLA-B locus26. We also noticed that the association result of rs9901869 was in concordance with the Immunochip study (rs9901869: OR 1.18, p = 2.0 × 10−3; result of rs8070463 not available)7. Further multicenter metaanalysis may achieve the genome-wide significance of these suggestive variants, particularly in ERAP1 and TBKBP1 region, to better elucidate the AS susceptibility in Han Chinese.
In addition, our study also demonstrated that although the sample size provided sufficient statistical power for detection (> 80%), some known loci, such as rs11190133 near NKX2-3, did not show any evidence of association in a Chinese population (Supplementary Table 2, available with the online version of this article). Ethnic-specific loci may help explain why whites are more susceptible to AS. However, for the other variants, which showed the same effect across populations but with insufficient power, lack of evidence of statistical significance may be due to small sample size. Further, our study has also suggested some novel risk variants within the reported loci in a Chinese population that are independent from the ones reported in whites (Supplementary Table 3A). Although the evidence at these novel variants is not statistically significant, some (such as rs650854 near HHAT) did show consistent association effect and frequency across 4 independent datasets. Further studies with bigger sample size are needed to replicate these novel variants and to confirm the allelic heterogeneity of these loci across populations.
We performed the HLA-B27–ERAP1 interaction analysis. To date, such interaction studies in Han Chinese are very few27, and none included the reported HLA-B27–interacting SNP (rs30187, rs10050860)6. With the largest number of Han Chinese individuals investigated to date, to our knowledge, our results demonstrated strong evidence of the interaction and confirmed that the reported ERAP1 SNP only showed association with AS in HLA-B27–positive individuals (Table 2). Nevertheless, we noticed that the detection powers of these SNP in HLA-B27–negative samples were 33.7% to about 57.0%, indicating that a bigger sample size is still needed to confirm the lack of association in HLA-B27–negative samples.
In addition to genetic associations, those 16 variants were used for PARP comparison. Most of the results were similar, while notable PARP differences in rs30187 and rs10050860 were observed. Generally, higher RAF and/or OR get a higher PARP. SNP rs30187 has a lower RAF in whites, but a markedly higher OR; it attributed more risk in whites. Conversely, the PARP of rs10050860 was higher in Han Chinese. This may explain part of the higher prevalence of AS in whites, and whether this difference is related to the underlying mechanism is unknown.
Our function annotation analysis pinpointed 13 coding variants within the known risk loci. Of note, most were located in gene ERAP1. ERAP1 encodes endoplasmic reticulum aminopeptidase 1, which trims peptides to appropriate lengths for MHC class I binding. Epistatic association between ERAP1 and HLA-C was identified in psoriasis28. A previous study demonstrated that SNP rs30187 (R528K) can alter the expression levels of B*27:05-bound peptidomes and their structural features29, while some nonsynonymous polymorphisms in ERAP1 can influence its expression30, accounting for its genetic association with AS. The noncoding annotation result highlighted that 2p15 and the NPEPPS-TBKBP1 region may also participate in AS pathogenesis through regulatory mechanism. Sequencing on 2p15 was performed to identify causal variants, but no definite coding variant was identified31. We found that the annotated proxy rs13001372 was a dsQTL in the lymphoblastoid cell line (Table 4). Further, the ChIP-seq data from the RegulomeDB database indicated that this SNP can bind to transcription factor IRF4, which regulates CD8+ T cell differentiation, expansion, and metabolism32. As potential influential contributors to phenotypic variation, how dsQTL got involved in AS pathogenesis remains unknown. On the other hand, proxies of rs9901869 had an eQTL effect on TBKBP1, and TBKBP1 is part of the network in TNF/nuclear factor-κB pathway, which plays a pivotal role in AS. Variants around the NPEPPS-TBKBP1-TBX21 region were found to influence TBX21 expression and possibly interleukin 1733. Fine-mapping of this locus would help elucidate how these variants are engaged in AS pathogenesis.
As a limitation, with the exception of the 2p15, the association evidence was moderate for most variants with nominal significance and did not survive after Bonferroni correction. Rare variants were not included in association testing, largely because of the limited power of our current study to detect genetic association effects at rare variants. However, it is noticeable that the vast majority of the variants analyzed in our current study showed consistent associations, suggesting the homogeneous character of AS genetic susceptibility between these 2 populations.
In the future, independent large-scale studies in Han Chinese are needed. Adding the shared AS-associated variants into AS genetic predictive profiling along with HLA-B27 may help improve AS/spondyloarthritis risk prediction, especially in HLA-B27–negative cases. It is also worthwhile to evaluate whether the combined genetic profiling could distinguish patients with AS in chronic back pain cohorts. Meanwhile, how these susceptibility variants participate in the inflammation and bone formation mechanism underlying AS is worth further molecular investigation, which may help reveal the crucial pathway for targeted therapy. Last, investigation of the susceptibility variants related to any specific manifestation of AS in Han Chinese, such as acute anterior uveitis34, disease progression, and severity35 in addition to therapy response, will be of great interest and meaningful for clinical decision making.
The comprehensive association analyses of 47 non-MHC AS susceptibility loci were carried out in 10,743 Han Chinese individuals. We revealed 16 AS-associated variants and confirmed the HLA-B27–ERAP1 interaction in Han Chinese. Our findings delineated the ethnic heterogeneity and homogeneity of genetic susceptibility across populations and highlighted that 2p15, ERAP1, and NPEPPS-TBKBP1 may play essential roles in AS pathogenesis.
Acknowledgment
We thank all study subjects for their participation. We acknowledge E. Shyong Tai, Xueling Sim, and Rajkumar Dorajoo for contributing genotype data of the Singapore Prospective Study Program for the analysis.
Footnotes
This study was supported by the Science and Technology Planning Project of Guangdong Province, China (2016A020216013), Guangdong Natural Science Funds for Distinguished Young Scholars (Grant No. 2014A030306039), the Municipal Healthcare Joint-Innovation Major Project of Guangzhou and High-level Personnel of Special Support Program for Technology Innovative Talents and the Top Young of Guangdong Province (Grant No. 2015TQ01R516), the National Supercomputer Center in Guangzhou and the Agency of Science, Technology and Research (A*STAR) of Singapore, the HUJ-CREATE Programme of the National Research Foundation, Singapore (Project Number 370062002), the US National Institutes of Health (Grant Numbers R01CA144034 and UM1 CA182876), the Singapore National Medical Research Council (Grant Number 1270/2010), the 5010 Project of Sun Yat-sen University (Grant No. 2007023), Distinguished Young Scholar Candidates Programme for The Third Affiliated Hospital of Sun Yat-Sen University, and Pearl River Nova Program of Guangzhou (Grant No. 201610010005).
- Accepted for publication August 23, 2019.