Abstract
Objective To evaluate whether a change in the European Alliance of Associations for Rheumatology (EULAR)/American College of Rheumatology (ACR) systemic lupus erythematosus (SLE) classification criteria threshold score affects accurate classification of SLE cases compared to disease-based control subjects. We evaluated a range of threshold scores to determine the score that maximizes the accurate classification of early SLE.
Methods We conducted a cross-sectional study comparing SLE cases and control patients. A EULAR/ACR criteria score was calculated using baseline information. Sensitivity, specificity, positive likelihood ratios (+LRs), and negative likelihood ratios (−LRs) with 95% CIs were used to evaluate operating characteristics. Threshold scores of 6 to 12 were evaluated in subjects with early disease (ie, disease duration of ≤ 5 years). +LRs > 10 and −LRs < 0.1 provide evidence to rule in or rule out SLE.
Results A total of 2764 patients were included: 1980 SLE cases who fulfilled either the ACR or Systemic Lupus International Collaborating Clinics criteria and 784 control subjects. The EULAR/ACR SLE criteria had a sensitivity of 98% (95% CI 97-98), a specificity of 99% (95% CI 98-100), a +LR of 95.5 (95% CI 48.0-190), and a −LR of 0.03 (95% CI 0.02-0.03). The criteria operated well in those with early disease, in women, in men, and in White, Black, Chinese, and Filipino people. A score of 10 maximized the accurate classification of patients with early disease (+LR 174.4, 95% CI 43.8-694.6; −LR 0.03, 95% CI 0.02-0.04). An increase in the threshold score from 10 to 11 resulted in significant worsening in the −LR (threshold score 10: −LR 0.03, 95% CI 0.02-0.03 vs threshold score 11: −LR 0.05, 95% CI 0.04-0.06).
Conclusion The EULAR/ACR SLE classification criteria threshold score of 10 performs well, particularly among those with early disease and across sexes and ethnicities.
Systemic lupus erythematosus (SLE) is a heterogeneous disease characterized by immune activation, target organ inflammation, and damage.1 Classification criteria are used to identify homogeneous groups of patients for inclusion into observational studies and trials.2 The reduction in disease heterogeneity improves the ability to make valid inferences within studies and improves generalizability across studies.2,3 The European Alliance of Associations for Rheumatology (EULAR)/American College of Rheumatology (ACR) classification criteria for SLE4,5 were developed with a balanced use of data-driven and expert-based consensus methods.6-10
The EULAR/ACR SLE classification criteria constitute a defined system that produces a measure of the relative probability that a particular case (ie, a combination of clinical features) has SLE. This system comprises an additive point system with hierarchical clustering of clinical and serologic features. Antinuclear antibody positivity (titer of ≥ 1/80 on HEp-2 cells or an equivalent test) is required as an entry criterion. A score of 10 is the threshold at or above which experts would classify cases as SLE for the purpose of research studies. Each criterion has been defined carefully to ensure appropriate face/content validity and to improve the reliability of application (ie, precision). As part of the development process, this system was validated against a large number of cases, including many cases that are not clear-cut SLE. In the validation cohort, this system had 96% sensitivity and 94% specificity compared to other disease-based controls.11,12
Independent external validation is an expectation of both EULAR and ACR of all criteria sets and is the final requirement after endorsement. Published validation studies with only SLE cases are limited as they only report sensitivity.13-15 Control subjects are required for estimation of specificity and negative predictive value. In addition, there have been calls to revise the criteria or the threshold of 10 for classification.16-18 Cui and colleagues17 opined that “rheumatologists should be informed of exact probability of illness in patients with underlying SLE who are below the threshold (ie, total score <10) so as to provide better decision-making, evaluation and follow-up.” Rheumatologists and researchers should use classification criteria with the proper level of confidence in categorization derived from not only sensitivity and specificity, but also the positive likelihood ratio (+LR) and negative likelihood ratio (−LR). Finally, the ability to accurately identify patients with early disease is important so that these patients may have access to earlier intervention and trials of innovative therapies with the goal of preventing damage or other ominous outcomes.19
The primary objective of this study was to evaluate whether a change in the threshold score would affect accurate classification of SLE compared to disease-based control subjects. Our secondary objectives were to evaluate whether a change in threshold score would affect accurate classification in subsets of patients with SLE, specifically early disease, across sexes and ethnicities.
METHODS
Subjects. We conducted a cross-sectional study across multiple clinics at University of Toronto–affiliated hospitals. The University of Toronto Lupus Cohort encompasses an inception cohort (ie, joined the clinic within the first 12 months from SLE diagnosis) and prevalent patients (ie, joined the clinic after 12 months from SLE diagnosis). All patients with SLE met the revised 1997 ACR classification criteria for SLE, or they met 3 criteria and had a supportive skin or kidney biopsy.20,21 Consecutive control subjects from the general and specialty rheumatology clinics of Toronto Western Hospital and Mount Sinai Hospital, Toronto, Canada, were included. No additional inclusion or exclusion criteria (eg, age ranges and presence or absence of specific diagnoses) were used. All patients with systemic sclerosis (SSc) met the ACR/EULAR classification criteria for SSc.22 All other diagnoses were physician based.
Data collection. All clinical, patient-reported, and serologic data were obtained from the patient chart and electronic medical record. Standardized abstraction forms were used to collect sex, disease duration, ethnicity, diagnosis, clinical manifestations, and serology. Clinical manifestations were attributed to SLE if there was no more likely alternative explanation.4,8 For each patient, the EULAR/ACR criteria score was calculated based on the baseline clinical, laboratory, and renal biopsy information. The baseline information was obtained from the first 2 visits—both visits occurring within 1 to 4 months—as some of the tests ordered at the first visit were recorded only at the second visit. Data were entered into a computerized database. Data quality was maintained using logic and data entry checks.
Analysis. Descriptive statistics were used to summarize the data. Sensitivity, specificity, +LR, and −LR for the EULAR/ACR classification criteria were estimated with 95% CIs. CIs for +LRs and −LRs are based on formulas provided by Simel et al.23
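The estimates described above can be derived from a 2 × 2 table of classification results. The following is a minimal Python sketch, assuming the log-method standard errors commonly attributed to Simel et al for the LR CIs. The function name `likelihood_ratios` is illustrative, and the counts in the usage example are hypothetical: they were chosen to be consistent with the full-cohort sensitivity and specificity and are not taken from the article's tables.

```python
from math import exp, sqrt
from statistics import NormalDist


def likelihood_ratios(tp, fn, fp, tn, level=0.95):
    """Return sensitivity, specificity, and +LR/-LR with log-method CIs.

    tp/fn: cases classified positive/negative.
    fp/tn: control subjects classified positive/negative.
    All four cells are assumed to be nonzero.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    cases, controls = tp + fn, fp + tn
    sens, spec = tp / cases, tn / controls

    def with_ci(ratio, se_log):
        # CI is symmetric on the log scale, then back-transformed.
        half = z * se_log
        return ratio, ratio * exp(-half), ratio * exp(half)

    plr = with_ci(sens / (fp / controls),
                  sqrt(1 / tp - 1 / cases + 1 / fp - 1 / controls))
    nlr = with_ci((fn / cases) / spec,
                  sqrt(1 / fn - 1 / cases + 1 / tn - 1 / controls))
    return {"sensitivity": sens, "specificity": spec, "+LR": plr, "-LR": nlr}


# Hypothetical counts (not from the article), consistent with roughly 98%
# sensitivity among 1980 cases and 99% specificity among 784 control subjects.
result = likelihood_ratios(tp=1931, fn=49, fp=8, tn=776)
for name, value in result.items():
    print(name, value)
```

Each LR is returned as a tuple of (estimate, lower limit, upper limit). Because the interval is computed on the log scale, it is asymmetric around the estimate, which is why the upper limit of a large +LR can be very wide when few control subjects are misclassified.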
The operating characteristics of threshold scores of 6 to 12 were evaluated for the total cohort; in subjects with early disease, defined as a disease duration of 5 years or less; and across sexes and ethnicities. LRs above 10 and below 0.1 were considered to provide strong evidence to rule in or rule out diagnoses. An LR close to 1 suggests the criteria were of little value. The optimal threshold is one that maximizes the +LR and minimizes the −LR. Receiver operating characteristic (ROC) curves were used to illustrate the performance of the EULAR/ACR SLE classification criteria in the full cohort, in those with early disease, in men, in women, and in patients of White or Black ethnicity.
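The threshold evaluation described above can be sketched as a sweep over candidate scores. This is an illustrative Python sketch only: the function name `operating_characteristics` and the score lists are hypothetical, and the sketch assumes that a subject is classified as SLE when the criteria score is greater than or equal to the threshold.

```python
from math import inf


def operating_characteristics(case_scores, control_scores,
                              thresholds=range(6, 13)):
    """Sensitivity, specificity, +LR, and -LR at each threshold score.

    A subject is classified as SLE when the score is >= the threshold.
    """
    rows = {}
    for t in thresholds:
        sens = sum(s >= t for s in case_scores) / len(case_scores)
        spec = sum(s < t for s in control_scores) / len(control_scores)
        plr = sens / (1 - spec) if spec < 1 else inf  # no false positives
        nlr = (1 - sens) / spec if spec > 0 else inf  # no true negatives
        rows[t] = {"sensitivity": sens, "specificity": spec,
                   "+LR": plr, "-LR": nlr}
    return rows


# Hypothetical criteria scores for illustration only (not study data).
cases = [12, 10, 15, 9, 22, 11, 10, 7]
controls = [2, 4, 10, 6, 0, 8, 3, 5]
table = operating_characteristics(cases, controls)
for t, row in table.items():
    print(t, row)
```

In this toy example, raising the threshold from 10 to 11 removes the only false positive but doubles the −LR, which illustrates the trade-off that the threshold analysis is designed to expose.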
Ethics. University Health Network Research Ethics Board (REB) approval was obtained prior to the conduct of this study (REB No. 17-5926).
Patient and public involvement. Patients were involved with data collection and will be involved in choosing which results to share, when to share results, and in what format study results will be disseminated to the relevant wider patient communities.
RESULTS
Patients. We included 2764 patients: 1980 SLE cases who fulfilled either the ACR or Systemic Lupus International Collaborating Clinics (SLICC) criteria and 784 control subjects. Of the 2764 patients, 427 (15%) were male. Control subject diagnoses included systemic autoimmune rheumatic diseases (ie, rheumatoid arthritis, systemic sclerosis, and Sjögren syndrome), infections, metabolic diseases, and malignancies (Table 1). The mean disease duration was 4.49 (SD 6.3) years for SLE cases and 8.50 (SD 8.4) years for control subjects at the assessment visit. A total of 1727 (87%) out of 1980 SLE cases and 610 (78%) out of 784 control subjects were female. The ethnicities of the 1980 SLE cases were White (n = 1276, 64%), Black (n = 270, 14%), Chinese (n = 193, 10%), Filipino (n = 70, 3.5%), and First Nations (n = 17, 1%). Ethnicities among the 784 control subjects were White (n = 584, 74%), Black (n = 44, 6%), Chinese (n = 42, 5%), Filipino (n = 14, 1.8%), and First Nations (n = 3, 0.4%; Table 2). The rates of occurrence of the individual classification criteria attributes in the cases and control subjects are reported in Table 3.
Operating characteristics. The EULAR/ACR SLE classification criteria had a sensitivity of 98% (95% CI 97%-98%), a specificity of 99% (95% CI 98%-100%), a +LR of 95.5 (95% CI 48.0-190), and a −LR of 0.03 (95% CI 0.02-0.03). The ROC curve for the full cohort is illustrated in Figure 1, with an area under the curve (AUC) of 0.9964. In patients with early disease, defined as 5 years or less duration, the EULAR/ACR SLE classification criteria had a sensitivity of 97% (95% CI 96%-98%), a specificity of 99% (95% CI 98%-100%), a +LR of 174.40 (95% CI 43.78-694.64), and a −LR of 0.03 (95% CI 0.02-0.04). In women, the EULAR/ACR SLE classification criteria had a sensitivity of 98% (95% CI 97%-98%), a specificity of 99% (95% CI 98%-100%), a +LR of 99.1 (95% CI 44.7-219.8), and a −LR of 0.03 (95% CI 0.02-0.03). The ROC curves for the subgroups are illustrated in Supplementary Figures S1 to S5 (available from the authors upon request): patients with early disease (AUC 0.9974; Supplementary Figure S1), female patients (AUC 0.9963; Supplementary Figure S2), male patients (AUC 0.9961; Supplementary Figure S3), White patients (AUC 0.9955; Supplementary Figure S4), and Black patients (AUC 0.9997; Supplementary Figure S5). The sensitivity, specificity, +LR, and −LR results for the full cohort, for those with early disease, across sexes, and across ethnicities confirmed the accurate performance of the EULAR/ACR SLE criteria and are reported in Table 4.
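The article does not state how the AUC values were computed. One common approach, sketched below in Python under that caveat, uses the rank-based interpretation of the AUC: the probability that a randomly chosen case has a higher criteria score than a randomly chosen control subject, with ties counted as one half. This quantity equals the trapezoidal area under the empirical ROC curve. The function name `roc_auc` and the scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a randomly chosen case outscores a
    randomly chosen control subject, counting ties as one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical criteria scores for illustration only (not study data).
auc = roc_auc([12, 10, 15, 9], [2, 10, 6, 8])
print(auc)  # 0.90625
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for a cohort of a few thousand; a rank-based implementation would be preferable for much larger data sets.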
The LRs for threshold scores of 6 to 12 are reported in Table 5. By balancing both sensitivity and specificity through maximizing the +LR and minimizing the −LR, a threshold score of 10 maximizes the accurate classification of patients with SLE with early disease (+LR 174.40, 95% CI 43.78-694.64; −LR 0.03, 95% CI 0.02-0.04), women (+LR 99.13, 95% CI 44.71-219.80; −LR 0.03, 95% CI 0.02-0.03), and men (+LR 84.94, 95% CI 21.41-336.96; −LR 0.02, 95% CI 0.01-0.05). An increase in the threshold score from 10 to 11 for the full cohort does not result in a statistically significant improvement in the +LR (threshold score 10: +LR 95.57, 95% CI 47.96-190.44; vs threshold score 11: +LR 106.17, 95% CI 50.78-221.99). In contrast, an increase in the threshold score from 10 to 11 for the full cohort results in a statistically significant worsening in the −LR (threshold score 10: −LR 0.03, 95% CI 0.02-0.03; vs threshold score 11: −LR 0.05, 95% CI 0.04-0.06).
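The article does not name the statistical test behind these comparisons. One reading, offered here as an assumption, is that the comparison rests on whether the 95% CIs at the 2 thresholds overlap, which is a conservative heuristic rather than a formal test. The Python sketch below applies that check to the full-cohort intervals reported above; the function name `intervals_overlap` is illustrative.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Full-cohort 95% CIs as reported in the text for threshold scores 10 and 11.
plr_10, plr_11 = (47.96, 190.44), (50.78, 221.99)
nlr_10, nlr_11 = (0.02, 0.03), (0.04, 0.06)

print("+LR intervals overlap:", intervals_overlap(plr_10, plr_11))
print("-LR intervals overlap:", intervals_overlap(nlr_10, nlr_11))
```

Under this reading, the overlapping +LR intervals are consistent with no demonstrable gain at a threshold of 11, whereas the separated −LR intervals are consistent with a loss in the ability to rule out SLE.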
DISCUSSION
This study independently validates the operating characteristics of the EULAR/ACR classification criteria for SLE. We first confirmed the accuracy of the classification criteria in identifying patients with SLE compared to disease-based control subjects. Second, we demonstrated that these criteria work well in early disease, across sexes, and across ethnicities. Most importantly, we demonstrated that a threshold score of 10 is the optimal score for accurate classification, particularly in early disease. Increasing the threshold score would increase the risk of misclassification.
LRs are considered superior methods for statistical modeling and making inferences about test performance. LRs are more informative than sensitivity or specificity because they consider both sensitivity and specificity simultaneously. Unlike predictive values, LRs are not affected by disease prevalence. The magnitude of the LR gives intuitive meaning as to how strongly a test result will raise (rule in) or lower (rule out) the likelihood of a disease. Cui and colleagues17 inquired about the probability of illness in patients who have signs or symptoms suggestive of, but not diagnostic of, SLE and a classification criteria score of less than 10. Others may suggest a higher threshold score to increase specificity.16 Our reporting of LRs across a range of thresholds is informative and may address these concerns. The pretest odds of a diagnosis, multiplied by the LR, determine the posttest odds according to Bayes theorem.24 When the pretest probability lies between 30% and 70%, test results with very high +LRs (eg, above 10) rule in disease. A very low −LR (eg, below 0.1) rules out disease. Our findings of small −LRs indicate that, for a patient with signs or symptoms suggestive of, but not diagnostic of, SLE, failure to fulfill the classification criteria at a threshold of 10 makes the posttest probability of having SLE very small. The high +LR indicates that the classification criteria perform well for a population. Lower specificity estimates for the EULAR/ACR SLE criteria may be underestimates resulting from incomplete use of the attribution rule, under which items should be counted toward SLE only if there is no more likely alternative explanation.25,26 Readers, however, are cautioned: classification criteria are not designed for diagnosis and should not be used as justification to withhold appropriate treatment.3
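The odds form of Bayes theorem described above can be worked through numerically. The following Python sketch is illustrative only: the function name `posttest_probability` is hypothetical, the LRs are the full-cohort values reported for a threshold score of 10, and the pretest probabilities are the bounds of the 30% to 70% range mentioned in the text plus its midpoint. It is meant to aid interpretation of the reported LRs, not to serve as a diagnostic tool.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


POSITIVE_LR = 95.57  # full cohort, threshold score 10
NEGATIVE_LR = 0.03   # full cohort, threshold score 10

for pretest in (0.30, 0.50, 0.70):
    met = posttest_probability(pretest, POSITIVE_LR)
    not_met = posttest_probability(pretest, NEGATIVE_LR)
    print(f"pretest {pretest:.0%}: criteria met {met:.1%}, "
          f"criteria not met {not_met:.1%}")
```

For example, with a pretest probability of 50%, the pretest odds are 1, so a −LR of 0.03 gives posttest odds of 0.03 and a posttest probability of about 3%.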
A strength of this study is the large number of cases and controls. This allowed for good precision around our estimates. The criterion of “arthritis” has been noted to be problematic across all iterations of SLE classification criteria, as patients with SLE can have an erosive arthritis.27 Our control group contained several forms of arthritis, including rheumatoid arthritis, and the operating characteristics of the classification criteria remained strong. Similarly, our control group contained several dermatologic conditions, single dominant manifestations that can be seen in SLE cases and in nonrheumatic disease controls. A limitation of our study is the number of ethnicities represented. External validation is needed across other ethnicities, including South Asian, First Nations/Indigenous, and others. Another limitation to consider is the lack of a gold standard for SLE. Ideally, classification criteria should be compared against a gold standard. However, SLE is a disease without a single diagnostic test or gold standard. All SLE cases in this study fulfilled either the ACR or SLICC criteria. However, comparing one system of classification to another can be a limitation. For example, if a patient is classified as having SLE based on the EULAR/ACR criteria but not on the modified 1997 ACR classification, it remains uncertain whether the EULAR/ACR classification is a false positive or whether the 1997 ACR criteria missed a true SLE case. In the absence of a true gold standard, alternative strategies could include physician diagnosis or consensus methodology as the gold standard. Physician diagnosis can be a problematic gold standard, as the construct underlying SLE varies across individuals and centers.4,28 Using consensus methods for case ascertainment is labor intensive and is biased toward established, clear-cut disease.
Indicating that all SLE cases in this study fulfilled either the ACR or SLICC criteria facilitates comparison of our findings to other studies that use the same criteria.2 Comparison of the new criteria against the ACR and SLICC criteria was required by both the ACR and EULAR for consideration of endorsement.4,5
Operating characteristics are a critical feature of classification criteria. Our reporting of LRs demonstrates the value of the criteria across important patient subgroups and further validates the threshold score of 10. The operating characteristics of the EULAR/ACR classification criteria are robust, reducing the risk of misclassification.
Footnotes
This study was, in part, supported by the Schroeder Arthritis Institute, University Health Network, Toronto, Canada.
The authors declare no conflicts of interest relevant to this article.
- Accepted for publication October 13, 2022.
- Copyright © 2023 by the Journal of Rheumatology
This is an Open Access article, which permits use, distribution, and reproduction, without modification, provided the original article is correctly cited and is not used for commercial purposes.