Abstract
Objective. Specific risk alleles for childhood-onset systemic lupus erythematosus (SLE; cSLE) vs adult-onset SLE (aSLE) patients have not been identified. The aims of this study were to determine if there is an association (1) between non-HLA–related genetic risk score (GRS) and age of SLE diagnosis, and (2) between HLA-related GRS and age of SLE diagnosis.
Methods. Genomic DNA was obtained from 2001 multiethnic patients and genotyped using the Immunochip. Following quality control, genetic risk counting (GRCS), weighted (GRWS), standardized counting (GRSCS), and standardized weighted (GRSWS) scores were calculated based on independent single-nucleotide polymorphisms from validated SLE loci. Scores were analyzed in a regression model and adjusted by sex and ancestral population.
Results. The analyzed cohort consisted of 1540 patients: 1351 females and 189 males (675 cSLE and 865 aSLE). There were significant negative associations between all non-HLA GRS except the GRCS and age of SLE diagnosis: P = 0.011 and r2 = 0.175 for GRWS; P = 0.008 and r2 = 0.178 for GRSCS; P = 0.002 and r2 = 0.176 for GRSWS (higher GRS correlated with lower age of diagnosis). All HLA GRS showed significant positive associations with age of diagnosis: P = 0.049 and r2 = 0.176 for GRCS; P = 0.022 and r2 = 0.176 for GRWS; P = 0.022 and r2 = 0.176 for GRSCS; P = 0.011 and r2 = 0.177 for GRSWS (higher GRS correlated with higher age of diagnosis).
Conclusion. Our data suggest that there is a linear relationship between genetic risk and age of SLE diagnosis and that HLA and non-HLA GRS are associated with age of diagnosis in opposite directions.
Systemic lupus erythematosus (SLE) is a chronic multisystem autoimmune disease with a peak incidence in females during childbearing years. SLE tends to be more severe in men, patients with younger age of onset, and specific genetic ancestries1,2,3,4. Multiple genes have been implicated in its pathogenesis, with > 90 genes/loci associated with predisposition to SLE5. Although the majority of SLE susceptibility variants across ancestral populations are in the same gene, the associated single-nucleotide polymorphisms (SNPs) may differ or convey different risks for the development of SLE6,7,8,9. SLE-associated genetic risk variants are located in both HLA and non-HLA regions, with alleles within the HLA region showing the strongest disease association8,10,11,12,13.
For polygenic diseases such as SLE, it is accepted that a genetic risk score (GRS) provides better information on the genetic contribution to the development of autoimmune diseases than investigating a single SNP14,15,16. There are 2 popular models for the calculation of polygenic risk scores: (1) a counting score that is a simple sum of risk and protective alleles present in an individual; and (2) a weighted score that takes into account the effect size of the SNP. Although there have been previous publications of polygenic risk scores in SLE, few examined if a risk score can be used as a predictor of age of disease onset and only 1 examined a non-HLA GRS over a large multiethnic population17–24. To our knowledge, this is the first study to examine the influence of HLA and non-HLA genes separately in a large combined pediatric and adult SLE cohort across a multiethnic population.
The aims of this study were to determine if there is an association (1) between GRS and age of SLE diagnosis for HLA genes, and (2) between GRS and age of SLE diagnosis for non-HLA genes.
METHODS
SLE cohort. This is a multicenter, multinational genetic study of patients with both childhood-onset SLE (cSLE) and adult-onset SLE (aSLE). cSLE was defined as SLE diagnosed at age < 18 years and aSLE as SLE diagnosed at age ≥ 18 years. Genomic DNA was collected from 2001 patients from 17 centers within North America and 1 in Europe (Supplementary Table 1, available with the online version of this article) who fulfilled ≥ 4 of 11 American College of Rheumatology classification criteria for SLE25, with an age of disease diagnosis range of 1–82 years. The following clinical variables were available in 1979 of the 2001 patients: date of diagnosis, date of birth, age at diagnosis, and sex.
Ethics. The study was approved by the coordinating center (Hospital for Sick Children Research Ethics Board [REB] number 100007761). In addition, ethics approval was obtained from the REBs at each center, as were data and material transfer agreements. All patients gave informed consent for the use of their genetic and clinical data in this study.
Genotyping. All of the 1979 DNA samples with clinical information were sent for genotyping using the Immunochip Illumina Infinium genotyping chip (Illumina, Inc.) at 1 of 2 centers: HudsonAlpha Genomics Services Laboratory and Cincinnati Children’s Hospital Medical Center-Harley Laboratory. Quality control (QC) of the genotype data was conducted using SNP & Variation Suite v8 software (Golden Helix Inc.) for each of the 4 different genotyping runs. Samples were excluded for poor quality with low call rates (< 95%), sample contamination, mixing or duplication of samples, or close relatedness to another sample in the study. Relatedness between subjects was estimated by identity by descent analysis (IBDA). After QC, 1773 of the 1979 genotyped samples remained.
SNP selection. By literature review of genome-wide association studies (GWAS), meta-GWAS9,26,27, and candidate gene and replication studies28,29,30,31, we identified 106 SLE-associated SNPs, of which 93 SNPs were present on the Immunochip (Supplementary Table 2, available with the online version of this article). These 93 SNPs represented 39 different SLE-associated loci. All SNPs with a call rate < 0.99, minor allele frequency < 0.05, or Hardy-Weinberg equilibrium P < 0.001 were excluded. After QC, there were 68 SNPs in 33 different loci: 58 in non-HLA areas and 10 in the HLA complex (Supplementary Table 3). Of the 13 SNPs that were not present on the Immunochip, we identified 4 proxies (all in the HLA region) using SNP Annotation and Proxy Search (SNAP) online tool version 2.2 (Broad Institute, Harvard University), which resulted in a total of 72 SNPs (58 non-HLA and 14 HLA). SNPs that were located in the same gene were tested for pairwise high linkage disequilibrium (LD) using the SNAP online tool. For pairs of SNPs in the same gene that met the LD threshold (r2 > 0.8), the one with the lower OR reported in the literature was excluded. A further 14 SNPs were eliminated, leaving 58 independent SNPs (48 non-HLA and 10 HLA) from 33 SLE-susceptibility loci (Supplementary Table 3).
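The SNP-level exclusion criteria and the LD pruning rule can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the study (which relied on SNP & Variation Suite and the SNAP online tool): the `Snp` record, its field names, the SNP and gene identifiers, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    rsid: str
    gene: str
    call_rate: float
    maf: float          # minor allele frequency
    hwe_p: float        # Hardy-Weinberg equilibrium test P value
    odds_ratio: float   # OR reported in the literature


def passes_qc(snp):
    """Keep a SNP only if it triggers none of the three exclusion criteria."""
    return snp.call_rate >= 0.99 and snp.maf >= 0.05 and snp.hwe_p >= 0.001


def prune_ld(snps, pair_r2, threshold=0.8):
    """Within a gene, drop the lower-OR SNP of each pair with r2 > threshold.

    pair_r2 maps frozenset({rsid_a, rsid_b}) -> r2; missing pairs count as 0.
    """
    dropped = set()
    for i, a in enumerate(snps):
        for b in snps[i + 1:]:
            if a.gene != b.gene or a.rsid in dropped or b.rsid in dropped:
                continue
            if pair_r2.get(frozenset({a.rsid, b.rsid}), 0.0) > threshold:
                weaker = a if a.odds_ratio < b.odds_ratio else b
                dropped.add(weaker.rsid)
    return [s for s in snps if s.rsid not in dropped]


# Toy candidates (made-up values).
candidates = [
    Snp("rsA", "GENE1", 0.999, 0.20, 0.50, 1.40),
    Snp("rsB", "GENE1", 0.995, 0.18, 0.30, 1.25),  # in high LD with rsA
    Snp("rsC", "GENE2", 0.970, 0.30, 0.40, 1.30),  # fails call rate
    Snp("rsD", "GENE3", 0.999, 0.02, 0.60, 0.80),  # fails MAF
    Snp("rsE", "GENE4", 0.999, 0.25, 0.70, 0.85),
]
r2 = {frozenset({"rsA", "rsB"}): 0.92}

after_qc = [s for s in candidates if passes_qc(s)]
independent = prune_ld(after_qc, r2)
kept_ids = [s.rsid for s in independent]
```

In this toy set, rsC and rsD are removed by QC and rsB is removed by LD pruning because its OR is lower than that of rsA.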
GRS calculation. ORs for all SNPs were classified as protective (OR < 1) or risk (OR > 1) by ancestral population (Supplementary Table 3, available with the online version of this article).
Four types of GRS were calculated for the statistical analysis: (1) genetic risk counting scores (GRCS), (2) genetic risk weighted scores (GRWS), (3) genetic risk standardized counting score (GRSCS), and (4) genetic risk standardized weighted score (GRSWS). All scores were calculated separately for HLA and non-HLA SNPs and analyzed in the total population.
The GRCS was an additive genetic model based on the presence of the risk or protective allele and was determined as the sum of those alleles present in each individual (Supplementary Table 3, available with the online version of this article): a simple count of the risk alleles minus the protective alleles.
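As an illustration of the counting score, the sketch below sums the copies of each risk allele and subtracts the copies of each protective allele. The function name, the dictionary layout, and the SNP identifiers are hypothetical, and the sketch assumes genotypes are coded as 0, 1, or 2 copies of the reported effect allele.

```python
def counting_score(genotype, effect_direction):
    """Genetic risk counting score: risk-allele copies minus protective copies.

    genotype: rsid -> copies of the reported effect allele (0, 1, or 2).
    effect_direction: rsid -> +1 for a risk allele (OR > 1),
                              -1 for a protective allele (OR < 1).
    SNPs with no entry in effect_direction are skipped.
    """
    return sum(
        effect_direction[rsid] * copies
        for rsid, copies in genotype.items()
        if rsid in effect_direction
    )


# Toy example: two risk SNPs and one protective SNP (made-up identifiers).
direction = {"rs1": +1, "rs2": +1, "rs3": -1}
patient = {"rs1": 2, "rs2": 1, "rs3": 1}
grcs = counting_score(patient, direction)  # 2 + 1 - 1 = 2
```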
The GRWS accounts for the relative effect of each risk/protective allele by using the OR. The GRWS was calculated as the sum of the natural logarithm of the OR of each risk/protective allele present. The natural logarithm of the OR is positive for a risk allele (OR > 1) and negative for a protective allele (OR < 1; Supplementary Table 3, available with the online version of this article).
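A weighted score of this kind can be sketched as below. The sketch assumes that each copy of an allele contributes ln(OR) once, so a homozygous carrier contributes twice; the function name, SNP identifiers, and OR values are hypothetical and are not taken from Supplementary Table 3.

```python
import math


def weighted_score(genotype, odds_ratio):
    """Genetic risk weighted score: sum of ln(OR) over the alleles carried.

    genotype: rsid -> copies of the reported effect allele (0, 1, or 2).
    odds_ratio: rsid -> OR of that allele in the relevant ancestral population.
    ln(OR) is positive for risk alleles and negative for protective alleles,
    so no separate sign handling is needed.
    """
    return sum(
        copies * math.log(odds_ratio[rsid])
        for rsid, copies in genotype.items()
        if rsid in odds_ratio
    )


# Toy example with made-up ORs.
ors = {"rs1": 1.5, "rs2": 1.2, "rs3": 0.8}
patient = {"rs1": 2, "rs2": 1, "rs3": 1}
grws = weighted_score(patient, ors)  # 2*ln(1.5) + ln(1.2) + ln(0.8) = ln(2.16)
```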
Considering the large variation in the number of HLA and non-HLA SNPs identified in each individual ancestral population, we standardized the maximum GRCS and GRWS to 10 for each population to produce standardized GRCS (GRSCS) and GRWS (GRSWS) using only the SNPs available for the individual ancestral population. This allowed for comparisons across ancestral populations in order to weight each population equally (Figure 1).
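One way to read this standardization is as a rescaling of each raw score by the maximum score attainable with the SNPs available for that ancestral population. The sketch below makes that assumption explicit: it takes the maximum to be the score of a person carrying two copies of every risk allele and no protective allele. The text does not state whether the theoretical or the observed maximum was used, so this is a hypothetical interpretation, and all names and values are illustrative.

```python
def standardize(raw_score, per_snp_weights, target_max=10.0):
    """Rescale a raw score so the maximum attainable score equals target_max.

    per_snp_weights: rsid -> weight for the SNPs available in one ancestral
    population (+1 or -1 for a counting score, ln(OR) for a weighted score).
    Assumption: the maximum is reached with two copies of every risk allele
    (positive weight) and zero copies of every protective allele.
    """
    max_score = sum(2 * w for w in per_snp_weights.values() if w > 0)
    if max_score <= 0:
        raise ValueError("no risk alleles available for this population")
    return target_max * raw_score / max_score


# Two toy populations with different numbers of available SNPs.
weights_pop_a = {"rs1": 1, "rs2": 1, "rs3": -1}          # maximum raw score 4
weights_pop_b = {f"rs{i}": 1 for i in range(1, 6)}       # maximum raw score 10

score_a = standardize(2, weights_pop_a)   # half of the maximum in population A
score_b = standardize(5, weights_pop_b)   # half of the maximum in population B
```

Both toy patients carry half of the maximum attainable risk in their own population and therefore receive the same standardized score, which is the property that makes scores comparable across populations with different SNP coverage.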
Determination of ancestral population. Principal component analysis (PCA) was used to determine the ancestral identity of each patient. We first ran the analysis in the whole population, comparing the first 2 principal components against reference samples of known ethnicities from the phase III International HapMap Project (HapMap3; www.sanger.ac.uk/resources/downloads/human/hapmap3.html). Samples that were outliers from the calculated clusters were dropped from the study. Multidimensional outlier detection (MOD) analysis was performed for each ancestral population individually until we did not detect any outliers. PCA/MOD analysis eliminated 131 patients, and a further 102 patients were eliminated as they were identified as having South Asian ancestry (an ancestry without any applicable gene studies). Therefore, the population analyzed was 1540.
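The sample counts reported in the Methods can be tallied as a quick consistency check. The variable names are illustrative; the numbers are those given in the text.

```python
# Sample flow as reported in the Methods.
collected = 2001               # patients with genomic DNA
with_clinical_data = 1979      # samples sent for genotyping
after_genotype_qc = 1773       # samples remaining after genotype QC
pca_mod_outliers = 131         # removed by PCA/MOD analysis
south_asian_excluded = 102     # removed for lack of applicable gene studies

removed_by_genotype_qc = with_clinical_data - after_genotype_qc
analyzed = after_genotype_qc - pca_mod_outliers - south_asian_excluded
```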
Statistical analysis. Since sex and ancestral population are suggested to influence genetic susceptibility to SLE, we first determined the distribution of both across all age groups. We also determined the association of these factors with GRS and age of SLE diagnosis in each model. We used linear regression analysis to determine if GRS varied by age of SLE diagnosis for the whole cohort. Because age of diagnosis was not normally distributed, whereas its natural logarithm was, the natural logarithm of the age of diagnosis was used for statistical analysis.
Both age of SLE diagnosis (dependent variable) and GRS (predictor variable) were analyzed as continuous variables. A P value of < 0.05 was considered statistically significant. The percent variation in the dependent variable explained by the predictor was quantified using the adjusted r2 statistic in each model. The effects of sex and ancestral population were tested for interactions in the final model. SNP & Variation Suite v8 software (Golden Helix Inc.), R version 3.1.2 statistical package (R Foundation for Statistical Computing), and StatPlus:mac (AnalystSoft) were used for the statistical analyses.
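The form of the regression model, with the natural logarithm of age at diagnosis as the outcome, GRS as the predictor, and sex and ancestry as covariates, can be sketched with ordinary least squares. The code below is a self-contained illustration in plain Python and is not the code used in the study (which used R and other packages). The data are synthetic and noise-free, constructed so that a known negative GRS slope of -0.05 is recovered; the covariate coding (male = 1, non-White = 1) and all names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares via the normal equations; returns (coefficients, adjusted r2)."""
    X = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p)   # p counts the intercept
    return beta, adj_r2


# Synthetic patients: (GRS, male, non-White); ages follow an exact log-linear rule.
patients = [(2, 0, 0), (4, 0, 1), (6, 1, 0), (8, 1, 1),
            (3, 0, 1), (5, 1, 0), (7, 0, 0), (9, 1, 1)]
ages = [math.exp(3.5 - 0.05 * g - 0.2 * m + 0.1 * w) for g, m, w in patients]
log_age = [math.log(a) for a in ages]

beta, adj_r2 = fit_ols(patients, log_age)
grs_slope = beta[1]   # negative slope: higher GRS, younger age at diagnosis
```

With real data the fit would not be exact, and the adjusted r2 would reflect the share of variance in ln(age) explained by GRS together with the covariates.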
RESULTS
Descriptive statistics. The analyzed cohort consisted of 1540 patients: 1351 females and 189 males; 1094 were of White (71.0%), 196 of Black (12.7%), 129 of Hispanic (8.4%), and 121 of Asian (7.9%) ancestry. Mean age of diagnosis in the total population was 25.3 years (SD 14.2) and median age was 21.0 years (Table 1). The mean age at diagnosis for females, 25.8 (SD 14.3) years, was statistically significantly higher than that for males, 21.5 (SD 13.1) years (P = 4.16 × 10–5). The percentage of cSLE patients in the total male population (59.3%) was higher than the percentage of cSLE patients in the total female population (41.7%), although the absolute number was lower (112 males vs 563 females).
Non-HLA GRS. We initially determined the association of sex and ancestral population with GRS and age of SLE diagnosis. We found that for all non-HLA GRS, there were statistically significant associations between sex and age of SLE diagnosis. As a result, sex was included in our statistical model. There was a statistically significant association between sex and GRSWS (P = 0.013, regression coefficient = 0.180), but not between sex and the other non-HLA GRS. Significant associations were seen with age of diagnosis and specific ancestries for GRCS (P < 2 × 10–16 for White and P = 0.033 for Hispanic ancestry) and GRSCS (P < 2 × 10–16 for White and P = 0.001 for Black ancestry), whereas for GRWS and GRSWS, the only statistically significant association with age of diagnosis was in White ancestry (P < 2 × 10–16 for GRWS and GRSWS). Therefore, the final statistical model for GRCS and GRSCS included sex and all 4 ancestral populations while the final model for GRWS and GRSWS included sex and White vs non-White ancestry as covariates.
The final linear regression models for all non-HLA GRS, except the GRCS, showed a statistically significant negative association with age (Table 2). These models explained similar percentages of the variance of the genetic contribution (GRWS 17.8%, GRSCS 17.5%, and GRSWS 17.6%; Supplementary Figures 1–3, available with the online version of this article).
HLA GRS. There were statistically significant associations between sex and age of SLE diagnosis for all 4 HLA GRS. Therefore, sex was included in our statistical model. However, there were no statistically significant associations between sex and any of the 4 HLA GRS. All the ancestral populations showed statistically significant associations with age of SLE diagnosis for all 4 risk scores. However, in contrast to the results for non-HLA GRS, the final linear regression models of all 4 HLA GRS with age of SLE diagnosis, which included sex and all 4 ancestral populations as covariates, showed statistically significant positive associations with age (Table 3). All 4 models explained almost identical percentages of the variance of the genetic contribution (17.6% or 17.7%; Supplementary Figures 4–7, available with the online version of this article).
DISCUSSION
It has been suggested that the genetic contribution to the development of SLE likely differs between cSLE and aSLE. However, candidate gene studies have not identified any genes that were specific or unique to cSLE. No GWAS has been performed in cSLE, although a study has suggested that unique SNPs were found in a Korean cSLE population21,24,32. For these reasons, it was necessary to apply a different genetic approach to better understand the genetics of SLE across all ages. In our study, we used polygenic risk scores to better understand the genetic association with age of SLE diagnosis. Our data have suggested that there is a linear relationship between genetic risk and age of SLE diagnosis, and that HLA and non-HLA GRS influence age of SLE diagnosis differently.
HLA and non-HLA genes could play different roles in disease susceptibility due to their different degrees of relative risk for the development of SLE10,12 and therefore could have different effects in predicting the age of SLE onset. We found that for non-HLA SNPs, there was a negative association of GRS with age of SLE diagnosis (i.e., the higher the GRS, the younger the patient) and 18% of the variation in age of SLE onset was explained by our model. However, when HLA GRS were determined, there was a positive association of HLA SNPs with age of SLE diagnosis (i.e., the higher the GRS, the older the patient). Therefore, we have for the first time, to our knowledge, shown that HLA and non-HLA SNPs may contribute differently to the age of SLE diagnosis. This may explain why our findings appear to be different from previous studies that combined both HLA and non-HLA SNPs in determining the association of GRS and age of SLE onset17,18,21,24. We suggest that the contribution of non-HLA SNPs may be more important in the development of SLE in the younger patient, whereas the contribution of HLA SNPs may be more important in the development of the disease in later years.
Previous publications have used different GRS (counting, weighted, or both) to predict the risk of multiple autoimmune diseases33,34,35. In SLE, there have been 7 publications that used polygenic risk scores to determine risk of SLE but only 4 examined them as a predictor of age of disease onset17,18,19,20,21,22,24. These 4 studies used either a counting score and/or weighted score17,18,21,24. We therefore took the approach to examine multiple GRS in order to determine which gave the most robust results. We found that GRWS were optimal. These scores were more robust than additive scores as they are not affected by sample size and strength of marker interactions, and they account for differences in effect size15.
Previous studies in SLE have shown conflicting results regarding sex and GRS. An initial study using a weighted score showed that men had a higher genetic risk than women (largely the result of HLA SNPs)19. However, a second study using a weighted score but smaller cohort and different HLA SNPs (only 13 SNPs were shared) did not replicate the finding22. Since the main difference between the 2 previous studies was the different HLA SNPs included, we covered all of the HLA SNPs used in both investigations (either with the same SNP or with an SNP in high LD). A small study of 75 cSLE patients, using only 7 SNPs and none in the HLA region, did not find a significant difference in the GRCS between the sexes21. We found a significant association between sex and age of SLE diagnosis in the analysis of all the GRS (HLA and non-HLA) for all ages; therefore, we used sex as a covariate because it was a potential confounder. When we controlled for the variation in sex between ages, only the non-HLA GRSWS showed a statistically significant association with sex: male patients showed an increase in their non-HLA GRSWS compared to female patients. Although this difference was small (regression coefficient = 0.180), it was statistically significant and replicated what was found in the largest study19. More investigations are needed to confirm these conclusions.
It is well-recognized that SNPs associated with susceptibility differ across ancestral populations6,7,37. In both cSLE and aSLE populations, SLE is more prevalent in non-White populations (Hispanic, Black, Native American, and Asian)36. Thus, in the calculation of the weighted score, each SNP was weighted according to its effect in the population studied. Moreover, when we analyzed the associations between age of SLE diagnosis and ancestral populations, we found statistically significant associations for all the GRS. Therefore, in the final linear regression models of all GRS with age of SLE diagnosis, we included ancestral populations as covariates. This is the first time that effects of genetic ancestry on GRS have been addressed, to our knowledge.
One limitation of our study may be that all genotyping was performed on the Immunochip platform. This platform was designed for use in European populations and therefore is less informative in other ethnic populations, with poorer coverage of SNPs associated with the development of SLE in non-White populations. In addition, there is increasing evidence that rare variants may be important in the development of SLE38,39, but only a few of these rare variants are present on the Immunochip. Although we were able to examine only 10/21 (47.6%) HLA SNPs validated in SLE GWAS, this is much greater coverage than in previous studies that used polygenic risk scores. Regarding the SNP coverage in each ancestral population, most of the non-HLA and HLA SNPs validated by GWAS and meta-GWAS for White and Asian populations were analyzed7,12,37,40–47. However, for Black and Hispanic populations, there were no published GWAS and therefore, candidate gene studies were used7,12,28,29,37,40–47. The resulting differences in the number of SNPs covered in each population were overcome by the process of GRS standardization, but it is still likely that some relevant SNPs in Black and Hispanic groups were missed. However, by standardizing the genetic score, the effect of any single locus will differ between ethnicities. Finally, there were no publications in South Asian populations and therefore, we could not include this population. However, our coverage of both HLA and non-HLA SNPs is the largest published to date in each ancestral population, to our knowledge. Another limitation of our study is the possibility of unmeasured confounders that can affect genetic scores and age of diagnosis of SLE. This could be the case in our models: because our dependent variable is age of SLE diagnosis and not age of SLE onset, there may be differential bias (e.g., time to diagnose in cSLE vs aSLE patients, males vs females; access to care, socioeconomic factors). 
However, we strongly believe that our results are still valid as a starting point for future investigations.
The present study is the first to show that there are different effects of non-HLA and HLA GRS on age of SLE diagnosis in a multiethnic population, to our knowledge. Specifically, the higher the non-HLA GRS, the younger the age of SLE diagnosis. Conversely, the higher the HLA GRS, the older the age of SLE diagnosis. These results were consistent across all methods of estimating their effect. We suggest that the non-HLA GRSWS may be the most robust score to use as it has shown the highest degree of statistical significance. However, for HLA risk, all the risk scores performed well. Overall, GRS explained 18% of the variance of age of SLE onset. We suggest that future studies use standardized GRS, which account for the variability in the distribution of the scores across populations; examine the effect of sex on risk scores; and determine the effect of HLA and non-HLA risk scores separately. These findings emphasize the complexity of the influence of genetic risk on the age of SLE onset.
ACKNOWLEDGMENT
We wish to acknowledge the contribution of samples and clinical information from the Atherosclerosis Prevention in Pediatric Lupus Erythematosus (APPLE) Investigators: Schanberg LE, Sandborg C, Barnhart HX, Ardoin SP, Yow E, Evans GW, Mieszkalski KL, Ilowite NT, Eberhard A, Imundo LF, Kimura Y, von Scheven E, Silverman ED, Bowyer SL, Punaro M, Singer NG, Sherry DD, McCurdy D, Klein-Gitelman M, Wallace C, Silver R, Wagner-Weiner L, Higgins GC, Brunner HI, Jung L, Soep JB, Reed AM, Provenzale J, and Thompson SD.
Footnotes
This work was funded by a grant from the Alliance for Lupus Research (number 417516) to EDS. JBH was supported by grants from the National Institutes of Health: AI024717 (R01), AI130830 (U01), and AI148276 (R01).
There are no competing interests for the principal investigator or any coinvestigators. There was no financial or other commercial support for any of the work associated with this manuscript in any way.
Full Release Article. For details see Reprints and Permissions at jrheum.org.
- Accepted for publication October 1, 2020.
- Copyright © 2021 by the Journal of Rheumatology
Free online via JRheum Full Release option
DATA SHARING
Anonymized data will be available upon request to DD or EDS.