Abstract
Objective. We examined agreement between the American College of Rheumatology (ACR), European League Against Rheumatism (EULAR), and Simplified Disease Activity Index (SDAI) response criteria in rheumatoid arthritis (RA) and tested whether discordant responses were associated with patients’ baseline characteristics or changes in RA activity encapsulated by the different criteria.
Methods. In a prospective longitudinal study, we examined responses of 243 patients with active RA to escalation of antirheumatic treatment. We computed agreement between pairs of response criteria using κ coefficients and identified patient characteristics associated with unique responses to individual criteria.
Results. We found that 110 patients (45.3%) had an ACR 20% improvement (ACR20) response, 135 (55.5%) had a EULAR moderate/good response, and 83 (34.1%) had an SDAI50 response. Agreement was moderate to substantial (ACR20/EULAR κ 0.57; ACR20/SDAI50 κ 0.64; EULAR/SDAI50 κ 0.59). All patients who had an SDAI50 response also had a EULAR response. Patient characteristics at baseline generally did not distinguish those who responded to both, 1, or neither criterion. Discordance was most often due to improvements in the erythrocyte sedimentation rate or C-reactive protein level among EULAR and SDAI50 responders, improvements that were less common among ACR20 responders. Based on receiver-operating characteristic curves, SDAI35 response had a better balance of sensitivity and specificity than SDAI50 relative to ACR20 and EULAR moderate/good responses.
Conclusion. Discordant responses to RA improvement criteria are most often attributable to differences in the responses of acute-phase reactants. SDAI35 response had higher sensitivity than SDAI50 response for improvement as reflected by the other response criteria.
Response criteria are key measures in the evaluation of medication efficacy because they provide benchmarks for patient improvement. In rheumatoid arthritis (RA) clinical trials, the American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) response criteria have been widely accepted1,2. The ACR20 response criteria require that a threshold of 20% improvement be met in both tender and swollen joint counts and in 3 of 5 additional RA activity measures. The ACR criteria are at once restrictive, because of the joint count requirements, and permissive, allowing some measures to remain unchanged or even worsen as long as 3 other measures meet the improvement threshold. EULAR response criteria are based on changes in the Disease Activity Score (DAS), and relate to both the degree of improvement in the DAS and the residual level of RA activity. Response criteria have also been proposed for the Simplified Disease Activity Index (SDAI)3. These criteria are based on percentage changes in the SDAI and were developed by mapping changes in the SDAI to ACR responses3.
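The ACR20 rule described above can be written as a short function. The sketch below is a minimal Python illustration, not the study's analysis code; the dictionary keys (`tjc`, `sjc`, `ptga`, `pga`, `pain`, `haq`, `apr`) are hypothetical names, `apr` stands for whichever acute-phase reactant the caller chooses, and all measures are assumed to be scored so that lower values indicate less active RA. Passing `level=50` gives the ACR50 variant.

```python
from typing import Mapping

# Hypothetical keys for the 5 ACR core measures beyond the joint counts;
# "apr" is whichever acute-phase reactant (ESR or CRP) is used.
CORE_MEASURES = ("ptga", "pga", "pain", "haq", "apr")


def pct_improvement(baseline: float, followup: float) -> float:
    """Percent improvement, where a decrease in the measure is an improvement."""
    if baseline == 0:
        return 0.0  # assumption: no improvement is possible from a zero baseline
    return 100.0 * (baseline - followup) / baseline


def acr_response(base: Mapping[str, float], follow: Mapping[str, float],
                 level: float = 20.0) -> bool:
    """ACR20 (level=20) or ACR50 (level=50) response."""
    # Both tender and swollen joint counts must improve by at least `level` percent.
    joints_ok = all(pct_improvement(base[k], follow[k]) >= level for k in ("tjc", "sjc"))
    # At least 3 of the 5 other core measures must also reach the threshold.
    n_core = sum(pct_improvement(base[k], follow[k]) >= level for k in CORE_MEASURES)
    return joints_ok and n_core >= 3
```

For example, a tender joint count falling from 10 to 8 is exactly a 20% improvement and satisfies the tender-joint requirement, whereas an unchanged swollen joint count fails the response regardless of how much the other measures improve.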
Given the different components, formats, and requirements of these response criteria, patients classified as responders by 1 criterion may not be classified as responders by another. For example, patients with large relative changes in RA activity may satisfy the ACR criteria but not the EULAR criteria if the posttreatment requirement of the latter is not met. Alternatively, patients with major changes in pain, joint tenderness, and acute-phase reactants may not meet the ACR criteria if joint swelling did not improve sufficiently. Each criterion set relies to some extent on coordinated changes among different RA activity measures. However, because the ACR criteria consider each activity measure individually, they may be affected more than the EULAR and SDAI criteria when changes are discordant among measures. Conversely, large improvements in some component measures of the DAS or SDAI can compensate for negligible changes or worsening in other components.
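The compensation described above can be illustrated with the SDAI, which is a simple sum of its components. The Python sketch below assumes the standard SDAI definition (28-joint tender and swollen counts, PtGA and PGA on 0–10 scales, CRP in mg/dL) and converts 0–100 mm VAS scores by dividing by 10; the function names and patient values are hypothetical.

```python
def sdai(tjc28: int, sjc28: int, ptga_mm: float, pga_mm: float, crp_mg_dl: float) -> float:
    """Simplified Disease Activity Index from its 5 components."""
    # Global assessments recorded on a 0-100 mm VAS; the SDAI uses 0-10 cm.
    return tjc28 + sjc28 + ptga_mm / 10 + pga_mm / 10 + crp_mg_dl


def sdai_pct_improvement(baseline: float, followup: float) -> float:
    """Percent decrease in SDAI from baseline (positive = improvement)."""
    return 100.0 * (baseline - followup) / baseline


def sdai_response(baseline: float, followup: float, threshold_pct: float = 50.0) -> bool:
    """SDAI50 by default; threshold_pct=70 gives SDAI70, 35 gives SDAI35."""
    return sdai_pct_improvement(baseline, followup) >= threshold_pct


# Hypothetical patient: swollen joints unchanged, other components improve.
before = sdai(tjc28=12, sjc28=8, ptga_mm=60, pga_mm=50, crp_mg_dl=3.0)  # 34.0
after = sdai(tjc28=4, sjc28=8, ptga_mm=20, pga_mm=20, crp_mg_dl=0.5)    # 16.5
print(round(sdai_pct_improvement(before, after), 1))  # 51.5
print(sdai_response(before, after))                   # True
```

In this hypothetical patient the swollen joint count does not change at all, which by itself precludes an ACR20 response, yet the SDAI falls by about 51%, enough for an SDAI50 response because the other components improved substantially.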
Discordant responses may result not only from differences in the structure of the response criteria, but also from differences in the sensitivity of the criteria to the pretreatment level of RA activity. For example, ACR responses may be more achievable for patients with lower pretreatment levels of RA activity, because small changes (for example, a decrease from 10 to 8 tender joints) might satisfy the percent improvement requirements.
In clinical trials, EULAR moderate/good responses have often been achieved more commonly than ACR20 responses4,5,6,7,8,9. Investigating discordant responses may help explain this pattern of results. Of 4 studies that examined the concordance of ACR and EULAR responses at the individual patient level, agreement as measured by κ coefficients was 0.26, 0.54, 0.61, and 0.63, ranging from fair to substantial10,11,12,13. Agreement between ACR and SDAI responses was similar11,12,13,14. Previous studies have not compared the composition of concordant and discordant groups to determine whether patients with particular characteristics account for the discrepant responses, or whether differences result solely from the way in which the response criteria are constructed. The aim of our study was to test the concordance of ACR, EULAR, and SDAI response criteria in patients with RA, and to determine whether pretreatment characteristics or the profile of responses to the component RA activity measures favored satisfaction of 1 set of criteria over another.
MATERIALS AND METHODS
Participants
We enrolled adults with active RA in a prospective observational study, with the main goal of determining clinically important changes in RA activity measures15. We defined active RA by the combined presence of physician judgment of active RA, ≥ 6 tender joints, and escalation of RA treatment with disease-modifying medications, prednisone, or biologics because of active synovitis at the baseline visit. The specific treatment was decided by the patient’s rheumatologist and not determined by the study. Of 262 patients enrolled, 250 patients completed the study. We excluded 7 patients who were missing values for the erythrocyte sedimentation rate (ESR) at any visit, leaving 243 patients for analysis. The study was approved by the US National Institute of Arthritis and Musculoskeletal and Skin Diseases institutional review board (03-AR-0133), and all participants provided written informed consent.
Study procedures
Participants completed a baseline visit and a followup visit either 4 months later (if treatment escalation involved a disease-modifying medication or biologic) or 1 month later (if escalation involved prednisone). The timing of the followup visits differed according to the expected time to response for these medications. At both visits, we performed joint counts (66 swollen, 68 tender) and scored the physician’s global assessment [PGA; on a 0–100 visual analog scale (VAS)]. The same rheumatologist examined the patient and performed the joint counts at both visits. We measured the ESR and C-reactive protein (CRP) level, and administered written questionnaires to obtain the patient’s global assessment (PtGA; 0–100 VAS, with higher scores indicating more active RA), pain score (0–100 VAS, with higher scores indicating greater pain severity), and Health Assessment Questionnaire–Disability Index (HAQ-DI; 0–3, with higher scores indicating more functional limitations).
Statistical analysis
We computed ACR, EULAR, and SDAI responses from changes in the relevant measures between the baseline and followup visits. Either acute-phase reactant (ESR or CRP) could be used in ACR responses. DAS28-ESR was used for EULAR responses. We examined the concordance between each pair of criteria at 2 levels of response, following previous studies3,11,13: ACR20, EULAR moderate/good (hereafter EULAR), and SDAI50 (i.e., 50% decrease in SDAI) as first-level responses; and ACR50, EULAR good, and SDAI70 (i.e., 70% decrease in SDAI) as second-level responses. We measured percent agreement as the percentage of patients classified identically by both criteria (i.e., as responders by both or as nonresponders by both). We measured chance-corrected concordance using κ coefficients. Conventional interpretation of the κ coefficient is < 0, poor agreement; 0–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, near perfect agreement16.
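Percent agreement and κ for a pair of binary response criteria depend only on the 4 cells of a 2 × 2 table. The Python sketch below (function names hypothetical) computes both, using Cohen's κ for 2 binary classifications, and maps κ to the interpretation bands listed above. As a check, the ACR20/EULAR counts reported in the Results (96 both, 14 ACR20 only, 39 EULAR only, and therefore 243 − 149 = 94 neither) reproduce the reported κ of 0.57.

```python
def agreement_2x2(both: int, only_a: int, only_b: int, neither: int) -> tuple[float, float]:
    """Return (percent agreement, Cohen's kappa) for two binary criteria A and B."""
    n = both + only_a + only_b + neither
    p_observed = (both + neither) / n  # classified identically by A and B
    p_a = (both + only_a) / n          # proportion of responders by A
    p_b = (both + only_b) / n          # proportion of responders by B
    # Agreement expected by chance if A and B were independent.
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return 100.0 * p_observed, kappa


def interpret_kappa(kappa: float) -> str:
    """Conventional interpretation bands for kappa."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "near perfect"


pct, k = agreement_2x2(both=96, only_a=14, only_b=39, neither=94)
print(f"{pct:.1f}% agreement, kappa = {k:.2f} ({interpret_kappa(k)})")
# 78.2% agreement, kappa = 0.57 (moderate)
```

Substituting the ACR20/SDAI50 counts (75, 35, 8, 125) and the EULAR/SDAI50 counts (83, 52, 0, 108) derived from the Results gives κ of about 0.64 and 0.59, also matching the reported values.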
To assess predictors of combined responses to pairs of criteria (e.g., ACR20 and EULAR), we categorized patients into 4 mutually exclusive groups: those who met neither criterion in the pair, only the first, only the second, or both. We tested whether demographic features and levels of arthritis activity at baseline differed among the 4 groups using ANOVA. This analysis indicated whether any baseline characteristics suggested a preferential subsequent response to 1 criterion versus another. To examine correlates of discordant responses, we repeated this analysis using changes in the arthritis activity measures during the study as the dependent variable. We used log-transformed values in the analysis of CRP, and converted these back to natural units for presentation. This analysis indicated whether differences in the composition of the response criteria were associated with preferential responses to 1 criterion versus another.
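The grouping and ANOVA steps described above can be sketched in standard-library Python. This is an illustration of the mechanics under stated assumptions, not the study's SAS or R code: `response_group` assigns the 4 categories, `one_way_anova_f` returns the F statistic and its degrees of freedom (a p-value would come from the F distribution, for example via `scipy.stats.f_oneway`), and `geometric_mean` shows one common way to back-transform a mean of log-transformed CRP values to natural units. All names are hypothetical.

```python
import math


def response_group(resp_a: bool, resp_b: bool) -> str:
    """Assign 1 of the 4 mutually exclusive groups for a pair of criteria A and B."""
    if resp_a and resp_b:
        return "both"
    if resp_a:
        return "A only"
    if resp_b:
        return "B only"
    return "neither"


def split_by_group(values, resp_a, resp_b):
    """Collect one variable (baseline level or change) by response group."""
    groups = {}
    for value, a, b in zip(values, resp_a, resp_b):
        groups.setdefault(response_group(a, b), []).append(value)
    return groups


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def geometric_mean(values):
    """Mean of log-transformed values, back-transformed to natural units (values > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

In use, `one_way_anova_f(list(split_by_group(values, resp_a, resp_b).values()))` tests one variable across the 4 groups; CRP would be passed through `math.log` first and summarized with `geometric_mean`.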
In a more direct assessment of the discordant groups, we computed relative effect sizes between changes among those who responded to only 1 criterion (e.g., ACR20) and those who responded only to the alternative criterion (e.g., EULAR) using Hedges’s g: the difference between the mean change in 1 group and the mean change in the other, standardized by the pooled SD of the 2 groups17. This measure is analogous to Cohen’s d but includes a correction for the upward bias of d in small samples. Hedges’s g indicates whether a particular feature changed selectively more in either discordant subgroup. Absolute values of Hedges’s g between 0.5 and 0.8 are considered medium effects, and those > 0.8 large effects18. We considered effects that were at least medium to be clinically meaningful.
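Hedges’s g as described above can be computed directly from the 2 samples of changes. The Python sketch below uses the usual approximate small-sample correction factor J = 1 − 3/(4(n1 + n2) − 9); whether the study's software applied exactly this approximation is an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_1, group_2):
    """Bias-corrected standardized mean difference (group_1 minus group_2)."""
    n1, n2 = len(group_1), len(group_2)
    # Pooled SD from the two sample variances (n - 1 denominators).
    pooled_sd = math.sqrt(((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2))
                          / (n1 + n2 - 2))
    d = (mean(group_1) - mean(group_2)) / pooled_sd  # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)         # small-sample correction J
    return d * correction


def effect_label(g):
    """Bands used in the text: 0.5-0.8 medium, > 0.8 large."""
    size = abs(g)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    return "below medium"
```

With hypothetical changes of [1, 2, 3] in one discordant group and [4, 5, 6] in the other, Cohen's d is −3 and the correction factor is 0.8, giving g = −2.4, a large effect favoring the second group.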
We also used receiver-operating characteristic (ROC) curves to test whether thresholds other than a 50% decrease in the SDAI were more closely associated with ACR20 and EULAR moderate/good responses. For each reference response (ACR20 or EULAR), we identified the percent change in SDAI with the best combination of sensitivity and specificity, defined as the point on the ROC curve closest to the upper-left corner of the plot (false-positive rate 0, sensitivity 1).
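The cutpoint search described above can be sketched as follows, assuming that each observed SDAI percent improvement is tried as a candidate threshold and that an improvement at or above the threshold counts as a predicted response. The Python functions and the 7-patient data set are hypothetical illustrations, not the study's data or code.

```python
import math


def roc_points(scores, labels, thresholds):
    """(threshold, sensitivity, specificity) when score >= threshold counts as positive.

    scores: percent improvement in SDAI; labels: True if the reference
    criterion (e.g., ACR20) was met.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and not y)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def closest_to_corner(points):
    """Point minimizing the Euclidean distance to (false-positive rate 0, sensitivity 1)."""
    return min(points, key=lambda p: math.hypot(1 - p[1], 1 - p[2]))


# Hypothetical data: SDAI percent improvement and ACR20 status for 7 patients.
scores = [10, 20, 30, 40, 50, 60, 70]
labels = [False, False, True, False, True, True, True]
best = closest_to_corner(roc_points(scores, labels, sorted(set(scores))))
print(best)  # (50, 0.75, 1.0)
```

Because the distance penalizes lost sensitivity and lost specificity equally, this rule favors a balanced cutpoint rather than one that maximizes either property alone.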
Analyses were performed using SAS version 9.3 (SAS Institute) and R.
RESULTS
Patient characteristics and treatment responses
Patients were predominantly middle-aged women with a mean DAS28 of 6.1 and mean SDAI of 38.5 at baseline (Supplementary Table 1, available from the authors upon request). Treatment escalation at the baseline visit included an increased dose of disease-modifying medication in 100 patients (41.1%), initiation of a new disease-modifying medication or biologic in 89 patients (36.6%; 60 with methotrexate, 20 with tumor necrosis factor-α inhibitors, and 9 with other medications), and initiation of prednisone in 54 patients (22.2%).
At the followup visit, 45.3% had an ACR20 response, 55.5% had a EULAR response, and 34.1% had an SDAI50 response. Additionally, 22.6% of patients had an ACR50 response, 11.1% had a EULAR good response, and 17.8% had an SDAI70 response. All patients were eligible to register a EULAR good or moderate response, based on their baseline DAS28 scores.
Concordance of responses
Of 149 patients who had either an ACR20 or EULAR response, 96 had both responses, 14 had only an ACR20 response, and 39 had only a EULAR response (Table 1). Agreement between ACR20 and EULAR responses was moderate (κ 0.57). Agreement was slightly higher between ACR20 and SDAI50 responses (κ 0.64). In this cohort, no patient had an SDAI50 response without also having a EULAR response. Isolated EULAR responses were present in higher proportions of patients than isolated ACR20 or SDAI50 responses, suggesting that the EULAR criterion was somewhat more permissive. Seventy-four patients (30.5%) met all 3 response criteria.
Concordance was moderate to substantial between second-level responses (Table 1). Because second-level responses were relatively infrequent, we did not analyze predictors of differential second-level responses.
Predictors and correlates of ACR20/EULAR discordance
Patients who met both criteria tended to be younger and have shorter durations of RA than nonresponders (Table 2). RA activity at baseline was comparable among response groups, except for slightly higher levels of pain among those who responded to both criteria. There were graded responses in all RA activity measures, with the largest responses in those who met both criteria, and intermediate responses among those who met only 1 criterion.
Those with EULAR-only responses had substantially larger improvements in both the ESR and CRP than those with ACR20-only responses, based on Hedges’s g measures of relative effect size (Table 2). Mean ESR and CRP worsened in the ACR20-only responders. In addition, improvement in the PtGA was somewhat larger among EULAR-only responders.
To further identify the role of individual measures in the discordance, we examined the proportion of EULAR responders with ≥ 20% improvement in each ACR core measure (Table 3). Among EULAR responders who also met ACR20 criteria (n = 96), 99% had ≥ 20% improvement in the PGA, and 91% had ≥ 20% improvement in pain. The measures least likely to be satisfied were improvements in the ESR (74%) and CRP (65%). In other words, 26% of these patients met ACR20 criteria despite lacking a 20% improvement in ESR, and 35% did so despite lacking a 20% improvement in CRP. Among EULAR responders who did not meet the ACR20 criteria (n = 39), the pain score, HAQ, and CRP were the measures with the lowest response frequencies (Table 3).
Eight of the 14 patients who had an ACR20 response but not a EULAR response had a decrease in DAS28 < 0.6, while 4 patients had a decrease in DAS28 > 0.6 but did not meet the final DAS28 state criterion. These 14 patients most commonly had improvement of 20% or more in the PGA (75%) and pain score (92%), and least commonly in the ESR (42%) or CRP (33%).
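The 2 routes to a EULAR non-response in the preceding paragraph (a DAS28 decrease that is too small, or an adequate decrease with too high a final DAS28) follow from the EULAR response table. The Python sketch below encodes the commonly published DAS28-ESR formula and EULAR cutoffs (improvement > 1.2 or > 0.6; final DAS28 ≤ 3.2 or ≤ 5.1); these values come from the EULAR definition rather than from this paper, and the function names are hypothetical.

```python
import math


def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, ptga_mm: float) -> float:
    """DAS28-ESR from 28-joint counts, ESR (mm/h), and PtGA (0-100 mm VAS)."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h) + 0.014 * ptga_mm)


def eular_response(das_baseline: float, das_followup: float) -> str:
    """EULAR response category: 'good', 'moderate', or 'none'."""
    improvement = das_baseline - das_followup
    if improvement > 1.2:
        # Large decrease: good only if the final state is low disease activity.
        return "good" if das_followup <= 3.2 else "moderate"
    if improvement > 0.6:
        # Intermediate decrease: no response if the final DAS28 remains > 5.1.
        return "moderate" if das_followup <= 5.1 else "none"
    return "none"  # decrease of 0.6 or less


print(eular_response(6.0, 5.3))  # none: decrease of 0.7, but final DAS28 > 5.1
print(eular_response(6.0, 4.5))  # moderate
```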
Predictors and correlates of ACR20/SDAI50 discordance
This comparison showed associations similar to those of the ACR20/EULAR comparison: shorter RA duration among patients who responded to both criteria, little difference among response groups in baseline RA activity (except for CRP), and graded changes in RA activity measures among those with neither, 1, or both responses (Table 4). Among discordant responders, those with SDAI50-only responses had substantially larger improvements in the tender joint count and CRP than ACR20-only responders, while ACR20-only responders had substantially larger improvements in pain scores, based on Hedges’s g.
Among SDAI50 responders who also met the ACR20 criterion (n = 75), all had ≥ 20% improvement in the PGA, while ≥ 20% improvements in the PtGA, pain score, and HAQ were seen in 88%, 91%, and 88%, respectively (Table 3). Twenty-seven percent of these patients met the ACR20 response criteria without achieving a 20% improvement in ESR, while 32% did so without achieving a 20% improvement in CRP. Among SDAI50 responders who did not meet the ACR20 criteria (n = 8), one-half or fewer had 20% responses in the pain score, HAQ, and acute-phase reactants (Table 3).
Among the 35 patients who had an ACR20 response but not an SDAI50 response, 23 had a decrease in SDAI of ≥ 10 points, and 5 had a decrease of more than 20 points. In these patients, the ACR20 criteria were met through ≥ 20% improvement in the PGA in 88% of patients, and in the PtGA, pain score, HAQ, ESR, and CRP in 70%, 91%, 64%, 64%, and 48%, respectively.
Predictors and correlates of EULAR/SDAI50 discordance
Because all patients with an SDAI50 response also had a EULAR response, there was no SDAI50-only group, and we therefore could not identify baseline characteristics or changes in RA activity measures that were unique to each criterion. Compared with nonresponders, patients who met both criteria were younger and had a shorter duration of RA, following the pattern of the other comparisons (Table 5). These patients also had larger mean improvements in RA activity measures than those who met neither criterion or only the EULAR criterion. These results indicate that the EULAR criterion was more permissive than SDAI50 in categorizing patients as improved. The SDAI improved > 40% in 23 of the 52 EULAR-only responders, and > 45% in 9 of them, indicating that a number of these patients were close to meeting the SDAI50 criterion.
Association of alternative SDAI improvements with ACR20 and EULAR responses
We tested whether improvements in the SDAI at levels other than 50% would have closer agreement with ACR20 and EULAR moderate/good responses, using ROC curves (Figure 1). In this analysis, the change in SDAI with the optimal combination of sensitivity and specificity for association with the ACR20 was 35%, with a sensitivity of 0.87 and specificity of 0.82 (compared to 0.68 and 0.94 for SDAI50). The corresponding change in SDAI for association with the EULAR moderate/good response was 34%, which had a sensitivity of 0.87 and specificity of 0.93 (compared to 0.61 and 1.00 for SDAI50). Areas under the ROC curves were high, indicating good discrimination. The κ between SDAI35 and ACR20 responses was 0.67, and between SDAI35 and EULAR moderate/good responses was 0.79.
DISCUSSION
Our study has 3 main findings. First, the pattern of discordant responses revealed a hierarchy of stringency among the 3 RA response criteria, with the EULAR moderate/good criterion being the most permissive, SDAI50 the most strict, and ACR20 intermediate. Second, patient characteristics at baseline were largely not associated with differential responses. Third, when EULAR and SDAI50 responses were present in the absence of ACR20 responses, it was often because of greater responses in the acute-phase reactants.
More patients had a EULAR moderate/good response than an ACR20 or SDAI50 response, corroborating previous studies that reported higher frequencies of EULAR responses than ACR20 responses4–13. Discordance most commonly arose from patients who had EULAR responses without an ACR20 or SDAI50 response. Sole SDAI50 responders were also uncommon compared with sole ACR20 responders (8 vs 35 patients), indicating that SDAI50 was stricter in categorizing improvement than the other criteria. These results indicate that the response criteria, at least for first-level responses, can be ranked in order of permissiveness as EULAR moderate/good, ACR20, and SDAI50.
Agreement between ACR20 and EULAR moderate/good responses in our study was comparable to that in prior studies10,11,13. The 1 study that reported poorer agreement (κ = 0.27) did not examine patients at the start of a new treatment and included worsening as a response category, which may have altered the estimate of agreement12. Our results on ACR20 and SDAI50 agreement were also comparable to those of prior studies, which reported κ values of 0.67, 0.37, and 0.54 in 3 studies11,12,13. Similarly, our agreement between EULAR and SDAI50 responses matched that of prior studies, in which κ ranged from 0.54 to 0.71 across studies11,12,13,14. It is natural to compare these particular responses because they represent the minimum definition of improvement for each criterion set. However, this does not necessarily imply that the same patients will be identified as improved by each criterion.
Our first consideration in investigating the origin of discordant responses was whether particular types of patients were more likely to register responses to 1 criterion versus another. Nonresponders were older and had longer RA duration than responders, perhaps because they received less aggressive treatment escalation19. However, baseline features generally did not distinguish patients who were more likely to respond to 1 criterion versus another.
The second, and a priori more likely, explanation for discordant responses was the structure of the response criteria. The main difference was related to acute-phase reactants, and secondarily to patient-reported outcomes. Improvement in acute-phase reactants was less likely than improvement in other measures among ACR20 responders. Responses of acute-phase reactants were the least commonly satisfied elements among EULAR and SDAI50 responders who were not also ACR20 responders. Lack of sufficient improvement in the PtGA, pain score, and HAQ was common among patients who were categorized as improved by the EULAR or SDAI50 criteria but not by the ACR20. Conversely, these were among the elements most highly satisfied in ACR20 responders. Together, these results suggest that some patients achieve an ACR20 response because responses on the patient-reported measures compensate for the lower likelihood of an acute-phase reactant response.
Evaluation of changes in the sole EULAR responders may provide some insight into whether the EULAR criterion is too permissive or the SDAI50 criterion is too strict. These patients had, on average, > 30% improvement in the tender and swollen joint counts, PGA, and PtGA. Forty-three percent of these patients had improvement in the SDAI of ≥ 40%. These findings suggest that many patients with substantial clinical improvement were not categorized as improved by the SDAI50. In our cohort, a 35% improvement in the SDAI was more comparable to both the ACR20 and EULAR moderate/good responses than a 50% improvement. Relative to the ACR20 and EULAR moderate/good criteria, SDAI35 had lower specificity for improvement than SDAI50, but much higher sensitivity.
The strengths of our study include examination of a cohort with active RA undergoing treatment escalation, detailed examination of discordant responses among the 3 improvement criteria, and testing concordance with different cutoff points of SDAI improvement. Our study is limited by the relatively small number of patients in the discordant groups. However, we were still able to detect features that differentiated the discordant groups.
These results can aid in the interpretation of RA clinical trials. ACR20 responses are somewhat more sensitive to improvement in patient-reported measures such as pain score and HAQ, while EULAR responses tend to be more broad-based, likely because improvement in some features of the DAS can compensate for other features that do not improve as much. SDAI50 was the most stringent response criterion. Modification of this criterion to a 35% improvement in SDAI aligned better with the ACR20 and EULAR moderate/good criteria, and this should be tested in other cohorts.
Footnotes
This study was supported by the Intramural Research Program, NIAMS, NIH (ZIA-AR041153), and the US Public Health Service grant R01-AR45177.
Accepted for publication December 27, 2017.