Abstract
Objective. Rapidly predicting future outcomes based on short-term clinical response would be helpful to optimize rheumatoid arthritis (RA) management in early disease. Our aim was to derive and validate a clinical prediction rule to predict low disease activity (LDA) at 1 year among patients participating in the Treatment of Early Aggressive Rheumatoid Arthritis (TEAR) trial escalating RA therapy by adding either etanercept or sulfasalazine + hydroxychloroquine [triple therapy (TT)] after 6 months of methotrexate (MTX) therapy.
Methods. Eligible subjects included in the derivation cohort (used for model building, n = 186) were participants with moderate or higher disease activity [28-joint Disease Activity Score-erythrocyte sedimentation rate (DAS28-ESR) > 3.2] despite 24 weeks of MTX monotherapy who added either etanercept or sulfasalazine + hydroxychloroquine. Clinical characteristics measured within the next 12 weeks were used to predict LDA 1 year later using multivariable logistic regression. Validation was performed in the cohort of TEAR patients randomized to initially receive either MTX + etanercept or TT.
Results. The derivation cohort yielded 3 prediction models of varying complexity that included age, DAS28 at various timepoints, body mass index, and ESR (area under the receiver-operator characteristic curve up to 0.83). Accuracy of the prediction models ranged between 80% and 95% in both derivation and validation cohorts, depending on the complexity of the model and the cutpoints chosen for response and nonresponse. About 80% of patients could be predicted to be responders or nonresponders at Week 12.
Conclusion. Clinical data collected early after starting or escalating disease-modifying antirheumatic drug/biologic treatment could accurately predict LDA at 1 year in patients with early RA. For patients predicted to be nonresponders, treatment could be changed at 12 weeks to optimize outcomes.
Predicting future clinical outcomes based upon baseline factors or early clinical response would be useful to help optimize management of rheumatoid arthritis (RA). The information might guide selection of specific biologic agents, or allow for rapid switching to more effective therapies based upon a patient’s early response. While factors measured at baseline would be best to predict future treatment response, there are currently no commonly used clinical, genetic, or biomarker-based predictors that can adequately predict future clinical or radiographic response for large numbers of heterogeneous patients with RA, to guide drug selection or meaningfully inform patient management1,2,3,4.
In the absence of baseline factors that can predict future response at an individual patient level, models have therefore focused on predicting remission or low disease activity at 1 year using data collected early in the course of treatment [e.g., within 12 weeks or earlier after initiating a new anti-tumor necrosis factor (TNF) agent]5,6. Other prediction models have focused mainly on the subgroup of patients predicted to be nonresponders later in time on the basis of a lack of early response7,8. However, most work related to prediction models has focused on patients with established RA, and there is no certainty that these prediction models would perform adequately in patients with early RA.
Our objectives therefore were to derive and validate a clinical prediction rule to predict low disease activity (LDA) at 1 year in a large U.S. cohort of patients with early RA who were participating in the Treatment of Early Aggressive Rheumatoid Arthritis (TEAR) trial9. These patients had moderate or higher disease activity despite 6 months of methotrexate (MTX) and were randomized to add either etanercept or sulfasalazine (SSZ) plus hydroxychloroquine (HCQ).
MATERIALS AND METHODS
Overall description
Detailed methods on the TEAR trial have been published9. Briefly, TEAR was an investigator-initiated, randomized double-blind study using a 2-by-2 factorial design resulting in 4 treatment arms: immediate treatment with (1) MTX + etanercept; (2) MTX + SSZ + HCQ [triple therapy (TT)]; or initial MTX, with step-up treatment if 28-joint Disease Activity Score-erythrocyte sedimentation rate (DAS28-ESR) was ≥ 3.2 at Week 24 to (3) MTX + etanercept; or (4) TT. For the purpose of this posthoc analysis, the 2 initial MTX arms who received step-up treatment at Week 24 were combined and used to derive the prediction model. The model was applied in a separate validation cohort composed of the 2 immediate treatment arms, treatment arms 1 and 2 above, to assess the robustness of the model in an independent sample and to ensure its generalizability to different RA treatment regimens.
Derivation cohort for prediction model
To derive the prediction model, we identified TEAR participants with moderate or higher disease activity (DAS28 > 3.2) despite 24 weeks of MTX monotherapy who were adding either etanercept or SSZ + HCQ at Week 24. Receipt of etanercept or TT was both randomized and blinded.
Among these individuals, described throughout as the derivation cohort (i.e., training dataset), data collected within the 12-week period from Week 24 through Week 36 of the TEAR trial were evaluated as predictors of low disease activity (LDA, defined as a 4-variable DAS28-ESR ≤ 3.2) measured about 1 year later (i.e., Week 72 of the trial). For the purposes of this analysis, Week 24 was considered the baseline, because that was when patients were considered to have failed MTX monotherapy and received step-up therapy. For the 11 participants with no outcome data 48 weeks later because of withdrawal from the trial, we conservatively imputed their outcome as nonresponse (i.e., they did not achieve LDA).
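The outcome definition above can be sketched in a few lines of Python. This is an illustration only, not the trial's analysis code: the function names are hypothetical, and the DAS28-ESR expression is the commonly published 4-variable formula (tender and swollen 28-joint counts, ESR in mm/h, and patient global health on a 0–100 mm scale), which the text cites but does not reproduce. A missing followup score is treated as nonresponse, mirroring the conservative imputation described.

```python
import math
from typing import Optional

LDA_THRESHOLD = 3.2  # LDA cutoff stated in the text (DAS28-ESR <= 3.2)


def das28_esr(tender28: int, swollen28: int, esr_mm_hr: float,
              global_health_mm: float) -> float:
    """Commonly published 4-variable DAS28-ESR (assumed here, not quoted
    from the article). ESR must be greater than 0 for the logarithm."""
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_hr)
            + 0.014 * global_health_mm)


def achieved_lda(das28_at_followup: Optional[float]) -> bool:
    """True if the followup score meets LDA; a missing score (None) is
    imputed as nonresponse, as done for withdrawn participants."""
    if das28_at_followup is None:
        return False
    return das28_at_followup <= LDA_THRESHOLD


# Made-up patient values, for illustration only.
example_score = das28_esr(tender28=4, swollen28=1, esr_mm_hr=20,
                          global_health_mm=30)
print(round(example_score, 2), achieved_lda(example_score), achieved_lda(None))
```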
Validation cohort of prediction model
Because of the potential that any prediction model derived using 1 set of data would not perform as well when applied to new data, validation of the prediction model created using the derivation cohort was assessed in an independent group of participants in the TEAR trial. This validation cohort (i.e., testing dataset) included TEAR participants randomized to start MTX + etanercept or TT at baseline. Their baseline characteristics have been described9. The outcome of interest for the validation cohort was LDA assessed 1 year after initiating these combination RA treatment regimens (i.e., at Week 48 in the trial). The treatment arms were not the same in the validation cohort as in the derivation cohort (derivation cohort had failed to reach LDA despite 6 months of MTX monotherapy and then escalated care with either etanercept or triple therapy; validation cohort initiated these same combination therapies together at the start of TEAR). However, the validation cohort allowed us to test the hypothesis that the prediction model would adequately predict clinical response about 1 year later among patients escalating their RA therapy, with less concern for the specific RA treatment regimen they were using.
As a separate validation step (although not a completely independent population), all TEAR patients originally randomized to MTX monotherapy in the 2 step-up arms were examined during the first 6 months of the trial (when they were receiving MTX monotherapy). The best-performing prediction model created in the derivation cohort was examined for its accuracy in predicting response (i.e., LDA) to MTX monotherapy at 6 months using predictors measured within the first 12 weeks.
Statistical analysis
Multivariable logistic regression was used to derive a clinical prediction rule that was scaled to approximate the likelihood of response ranging from 0–1 (i.e., 0–100%). Factors initially considered for this model were based on a priori clinical interest, review of the literature (e.g., body mass index10), availability of the data, and exploratory analyses that had already been conducted in TEAR11. Final model selection and associated variables within each model were chosen based upon Wald’s global statistic and goodness-of-fit tests, Akaike’s Information Criterion (AIC), and the c statistic, which for a binary outcome is equivalent to the area under the receiver-operator characteristic curve12,13. The weights from the logistic models were used in the standard logistic formula: predicted likelihood of response = 1 / (1 + exp[−(β0 + β1x1 + … + βkxk)]), where β0 is the model intercept and β1 through βk are the weights for predictor values x1 through xk.
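To make the scoring step concrete, the sketch below applies logistic regression weights to one patient's predictor values and returns a likelihood between 0 and 1. The intercept, weights, predictor names, and patient values are hypothetical placeholders chosen for illustration; they are not the estimates reported in Table 2.

```python
import math
from typing import Dict


def predicted_likelihood(intercept: float, weights: Dict[str, float],
                         values: Dict[str, float]) -> float:
    """Logistic transform of the weighted sum of predictor values."""
    linear = intercept + sum(w * values[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical weights for illustration only; NOT the published estimates.
hypothetical_intercept = 4.0
hypothetical_weights = {
    "age_years": -0.02,
    "das28_week24": 0.3,   # at step-up
    "das28_week30": -0.4,  # 6 weeks after step-up
    "das28_week36": -0.9,  # 12 weeks after step-up
}

# Made-up patient values.
patient = {"age_years": 50, "das28_week24": 5.0,
           "das28_week30": 4.0, "das28_week36": 3.5}

likelihood = predicted_likelihood(hypothetical_intercept,
                                  hypothetical_weights, patient)
print(f"Predicted likelihood of LDA: {likelihood:.2f}")
```

In practice the same function would be called with the intercept and weights of whichever model (A, B, or C) is being applied.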
RESULTS
Characteristics of the 186 patients with RA in the derivation cohort who received MTX monotherapy for 24 weeks but did not achieve LDA at Week 24 are shown in Table 1. Time-varying factors such as disease activity were represented at Week 24 when these patients added either etanercept or SSZ + HCQ. About 70% of patients were women, and most were seropositive for rheumatoid factor or anticitrullinated protein antibodies (one of the study inclusion criteria, along with presence of baseline radiographic erosions). Mean DAS28 was 4.9, mean Health Assessment Questionnaire (HAQ) score was 1.1, and mean ESR was at the upper limit of normal.
Table 1. Characteristics of the derivation cohort used for model-building, which included the 186 TEAR participants taking methotrexate for 24 weeks who added either etanercept or triple therapy (sulfasalazine + hydroxychloroquine added to methotrexate). All factors were measured at the time etanercept or triple therapy was added (i.e., 24 weeks into the TEAR trial) except for body mass index and seropositivity, which were measured at baseline. Proportions may not sum to 100% because of rounding.
After assessing multiple factors listed in Table 1 at this timepoint (Week 24 of the TEAR trial) and factors through the next 12 weeks (Week 36 of the trial) to predict LDA 48 weeks later (i.e., Week 72 of the trial), the following key predictors were identified using the Wald test statistic: age, body mass index (BMI), DAS28-ESR at various timepoints (at the time of receiving step-up therapy at Week 24, and 6 and 12 weeks later), and ESR. Using these variables in various combinations, 3 final prediction models were derived, and all satisfied goodness-of-fit tests. Model A included age and DAS28 at Week 24 (when etanercept or TT was added) and at 6 and 12 weeks later, and yielded a c statistic of 0.80. Model B included the same predictors as model A plus BMI and had a c statistic of 0.82. Model C was the best-performing model; it included the same predictors as model B plus ESR at Week 24 and had a c statistic of 0.83.
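For readers less familiar with the c statistic used to compare the models, the following sketch computes it directly from its definition: the proportion of responder/nonresponder pairs in which the responder received the higher predicted likelihood, with ties counted as one-half. The function name and the example data are illustrative assumptions, not trial data.

```python
from typing import Sequence


def c_statistic(predicted: Sequence[float], responded: Sequence[bool]) -> float:
    """Concordance over all responder/nonresponder pairs (ties count 0.5)."""
    responders = [p for p, r in zip(predicted, responded) if r]
    nonresponders = [p for p, r in zip(predicted, responded) if not r]
    if not responders or not nonresponders:
        raise ValueError("need at least one responder and one nonresponder")
    concordant = 0.0
    for p_resp in responders:
        for p_non in nonresponders:
            if p_resp > p_non:
                concordant += 1.0
            elif p_resp == p_non:
                concordant += 0.5
    return concordant / (len(responders) * len(nonresponders))


# Made-up predictions and outcomes for 4 patients.
example_c = c_statistic([0.1, 0.4, 0.35, 0.8], [False, False, True, True])
print(example_c)
```

A value of 0.5 corresponds to chance discrimination and 1.0 to perfect separation of responders from nonresponders.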
Weights for the predictor variables are shown in Table 2, and an example of how the weights could be used to derive a predicted likelihood of response is described. All other factors listed in Table 1 (e.g., sex, modified HAQ, seropositivity) were assessed but were not predictive of response. Interaction terms between the treatment arm (etanercept vs SSZ + HCQ) and the key predictor variables were not significant.
Table 2. Measurement estimates of models to predict low disease activity at 1 year from the derivation cohort.
The accuracy of the prediction models in relation to the cutpoint chosen from the prediction rule is shown in Table 3. Accuracy up to 95% could be obtained, depending on the cutpoints chosen for response and nonresponse. Using the derivation cohort, for example, and using a cutpoint for predicted nonresponse of < 0.2, model accuracy for patients predicted to be nonresponders was 93% and applied to 23% of the TEAR participants. Similarly, model accuracy for predicted responders varied according to the cutpoint chosen; accuracy was 87% for predicted responders using a cutpoint of > 0.7 and applied to 25% of the TEAR population. In general, the accuracy of the prediction model was greater for participants predicted to be nonresponders than for participants predicted to be responders. Overall, if 80% accuracy was considered satisfactory and therefore a cutpoint was chosen of < 0.4 for predicted nonresponders and > 0.6 for predicted responders, a total of 79% of patients could be predicted (i.e., 45% of the population would be predicted to be nonresponders, and an additional 34% of patients would be predicted to be responders). The remaining 21% of patients had an uncertain likelihood of response at 12 weeks, with predicted values in the range between 0.4 and 0.6. Results were similar in the validation group, with accuracy slightly lower for predicted nonresponders and slightly higher for predicted responders. Discrimination in the validation cohort was similar to the derivation cohort, yielding c statistics for models A, B, and C of 0.81, 0.81, and 0.79, respectively.
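The cutpoint logic described above can be expressed as a short routine that splits patients into predicted nonresponders, predicted responders, and an uncertain middle group, and then reports the accuracy and share of each predicted group. This is a sketch under stated assumptions: the function name, the returned field names, and the 10-patient example are hypothetical, and the default cutpoints of 0.4 and 0.6 are simply those used as the worked example in the text.

```python
from typing import Dict, Sequence


def evaluate_cutpoints(predicted: Sequence[float], responded: Sequence[bool],
                       nonresponse_cut: float = 0.4,
                       response_cut: float = 0.6) -> Dict[str, float]:
    """Share and accuracy of predicted nonresponders (< nonresponse_cut)
    and predicted responders (> response_cut); the rest are uncertain."""
    n = len(predicted)
    pred_non = [r for p, r in zip(predicted, responded) if p < nonresponse_cut]
    pred_resp = [r for p, r in zip(predicted, responded) if p > response_cut]
    nan = float("nan")
    return {
        "nonresponder_share": len(pred_non) / n,
        # Accuracy for predicted nonresponders: fraction who did NOT reach LDA.
        "nonresponder_accuracy":
            sum(1 for r in pred_non if not r) / len(pred_non) if pred_non else nan,
        "responder_share": len(pred_resp) / n,
        # Accuracy for predicted responders: fraction who DID reach LDA.
        "responder_accuracy":
            sum(1 for r in pred_resp if r) / len(pred_resp) if pred_resp else nan,
        "uncertain_share": (n - len(pred_non) - len(pred_resp)) / n,
    }


# Made-up data for 10 patients.
scores = [0.1, 0.2, 0.3, 0.35, 0.5, 0.55, 0.65, 0.7, 0.8, 0.9]
outcomes = [False, False, False, True, False, True, True, True, False, True]
summary = evaluate_cutpoints(scores, outcomes)
print(summary)
```

Tightening the cutpoints (e.g., < 0.2 and > 0.7) would be expected to raise accuracy in each predicted group while enlarging the uncertain group.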
Table 3. Cutpoints and accuracy for nonresponders and responders in the derivation and validation cohorts in TEAR using best prediction model (Model C). The derivation cohort was composed of patients taking methotrexate (MTX) monotherapy for 24 weeks who added either etanercept, or SSZ + HCQ because they had moderate or high disease activity at Week 24. The outcome (low disease activity) was assessed 48 weeks later (Week 72 of the TEAR trial). The validation cohort was composed of patients who initiated MTX + etanercept, or triple therapy (MTX + SSZ + HCQ) at the beginning of the TEAR trial. The outcome (low disease activity) was assessed 48 weeks later (Week 48 of the TEAR trial).
Given the interest in understanding how well the prediction model would perform to predict LDA among patients initiating MTX monotherapy, the prediction models were applied to all TEAR patients in those 2 arms (i.e., the original 2 TEAR MTX monotherapy arms, including the subset of patients who went on to need step-up therapy 6 months later), using the outcome of LDA at 6 months. The accuracy of the best-performing prediction model (Model C) was 93% (59/64 patients not achieving LDA at 6 months) for the 47% of patients predicted to be nonresponders by Week 12 using a prediction cutpoint of < 0.2. There were too few patients predicted to be responders to MTX (using any cutpoint for response) to assess the accuracy of this model for predicted responders.
Using the combined derivation and validation cohort, the tradeoff between greater prediction accuracy and the proportion of the TEAR participants for whom prediction was possible was depicted visually for nonresponders (Figure 1A) and responders (Figure 1B). As demonstrated, there was a clear, inverse relation between the accuracy of prediction and the proportion of the TEAR participants who could be predicted to be responders or nonresponders.
Figure 1. A. Performance of 3 prediction models using data collected within 12 weeks of escalating treatment for predicted nonresponders using combined derivation and validation cohorts (n = 313 total). Cum. NPV: cumulative negative predictive value. B. Performance of 3 prediction models using data collected within 12 weeks of escalating treatment for predicted responders using combined derivation and validation cohorts (n = 313 total). Cum. PPV: cumulative positive predictive value.
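The tradeoff between accuracy and the proportion of patients who can be predicted may be traced with a cumulative curve of the kind plotted for nonresponders. The sketch below is one plausible construction and is not taken from the article: patients are ranked from lowest to highest predicted likelihood, and at each successive cutpoint the share of patients flagged as nonresponders and the cumulative negative predictive value (NPV) among them are recorded. The function name and example data are hypothetical; the responder (cumulative PPV) curve would be analogous, ranking from highest to lowest.

```python
from typing import List, Sequence, Tuple


def cumulative_npv_curve(predicted: Sequence[float],
                         responded: Sequence[bool]
                         ) -> List[Tuple[float, float, float]]:
    """Return (cutpoint, share flagged, cumulative NPV) for each patient,
    flagging everyone with predicted likelihood <= cutpoint as a nonresponder.
    Tied predictions are not grouped in this simple sketch."""
    ranked = sorted(zip(predicted, responded), key=lambda pair: pair[0])
    n = len(ranked)
    curve = []
    true_nonresponders = 0
    for count, (cutpoint, did_respond) in enumerate(ranked, start=1):
        if not did_respond:
            true_nonresponders += 1
        curve.append((cutpoint, count / n, true_nonresponders / count))
    return curve


# Made-up data for 4 patients.
curve = cumulative_npv_curve([0.1, 0.2, 0.3, 0.4], [False, False, True, False])
for cutpoint, share, npv in curve:
    print(f"cutpoint <= {cutpoint:.1f}: share {share:.2f}, cum. NPV {npv:.2f}")
```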
DISCUSSION
In a clinical trial of patients with early RA, clinical data collected early after starting or escalating disease-modifying antirheumatic drug (DMARD) treatment could predict LDA at 1 year with high accuracy (up to 95%). The greater the accuracy demanded of the prediction model, the fewer the patients for whom a prediction could be made with that level of accuracy. At an accuracy of at least 85–90% in the response prediction by Week 12, about 60% of patients could be predicted to be responders or nonresponders. Allowing for somewhat lesser accuracy (e.g., 75–80%) of prediction, about 80% of patients could be predicted to be responders or nonresponders 1 year later. For patients predicted to be nonresponders with very high accuracy, treatment could be changed at 12 weeks to optimize outcomes. For the 20–25% of patients for whom an accurate prediction could not be made by Week 12, further research will be needed to identify other factors that could aid in predicting treatment response. It is possible, for example, that additional information such as synovial power Doppler signal using musculoskeletal ultrasound or a biomarker-based approach to disease activity assessment14 could aid in making early treatment decisions.
Several studies have analyzed the role of the patient’s baseline characteristics as predictors of response. Some of these have found that HAQ15, disease duration16, and baseline disease activity6 play an important role in achieving LDA or remission in the long term. The therapies received by the patients with RA in these studies varied; some populations received only nonbiologic DMARD and others received combination therapy with biologics and nonbiologic DMARD. Other baseline characteristics that seemed to relate to future treatment response included sex, age, and smoking15,17,18. In contrast, smoking was previously examined in TEAR and was not correlated with treatment response11. Like others10, we did find an association between BMI and treatment response. However, these factors are generally not informative in isolation to manage individual patients.
For that reason, we examined predictors of clinical response measured shortly after initiation of new RA treatments. Our results are consistent with past analyses that have examined early response as a predictor of later response. In a pooled analysis of treatment data with MTX, anti-TNF monotherapy, and the combination of MTX and anti-TNF therapy, disease activity during the first 3 months of treatment was significantly associated with disease activity at 1 year19. Keystone, et al7 showed that patients who responded by Weeks 6 and 12 had better clinical, radiographic, and patient-reported outcomes at Week 52 compared with Week 12 (but not Week 6) responders or Week 12 nonresponders. Additional studies also have shown that initial or short-term response to treatment is a strong predictor of future response and outcomes20,21. Early response to treatment has also been shown to be a predictor of longer-term response at 5 years in the CAMERA study22. In CAMERA, both disease activity and radiographic progression were significantly lower in the early European League Against Rheumatism good responders compared with early moderate or nonresponders. Limitations to most of these analyses have generally been that only single predictors measured at 1 timepoint were considered; in some analyses, only a small fraction of the overall population could be predicted with reasonable accuracy. Using methods similar to ours that allowed for examination of multiple predictors, Ma, et al23 developed a prediction model of RA remission (DAS28-ESR < 2.6) among patients with early RA receiving nonbiologic DMARD. As in our study, they found that age was independently associated with 24-month remission. Though they did not consider DAS28 as a predictor, they found that only tender joint count at baseline was associated with remission at 24 months. However, they did not take into consideration initial response to treatment, and discrimination of their prediction model (c statistic 0.70–0.71) was somewhat lower than ours.
The strengths of our study include reasonably large numbers of patients with very early RA participating in a randomized, blinded, investigator-initiated trial. We also examined our model’s validity using an independent set of participants from other arms of the TEAR trial. Our study may be limited in that the validation population from the other treatment arms in TEAR was not initiating the same treatment regimen as our derivation cohort, which likely lowered the performance of our prediction models in the validation sample. However, it did allow us to consider whether the prediction model performed adequately across 2 different (but frequently used) combination RA treatment regimens. This issue is important because a clinical prediction model would be most useful if it were not unique to only 1 RA treatment regimen but rather if it could be applied to multiple commonly used regimens that patients with early RA might receive. We had limited statistical power to detect whether a prediction model derived separately for the addition of etanercept rather than SSZ + HCQ might have performed better.
We recognize that despite use of relatively straightforward statistical methods (i.e., logistic regression) to create a prediction score, a calculator or computer program likely would be required to apply the prediction score in routine practice. Given that a calculator is used to find the DAS28-ESR, and the increasing use of health information technology tools in clinical medicine, this requirement may not be very burdensome. We selected our target outcome as LDA using the DAS28 at 1 year, a disease state that is consistent with treat-to-target guidelines24 commonly used in RA clinical trials, and the endpoint at 6 months in TEAR that allowed patients to step up their therapy. However, we recognize that despite its common use in RA clinical trials, the DAS28 may not be as easily calculated as other measures of RA disease activity such as the Clinical Disease Activity Index (CDAI)25, given the need for an acute-phase reactant, which may not be available in real time at the office visit. Considering the choice of this outcome, we anticipated that the performance of any prediction model that used the DAS28 as a predictor variable would be somewhat superior to other disease activity predictors. Additional work to consider other target outcomes such as the CDAI at 1 year and using other predictor variables including patient-reported outcomes may be useful. Finally, TEAR collected only limited information on specific comorbidities, which precluded us from including these in the prediction models.
This prediction model derived in patients with very early RA participating in TEAR predicted LDA at 1 year for a meaningful number of patients. Clinical and laboratory data included in the prediction rule yielded an accuracy in the 80–95% range for between two-thirds and three-quarters of the patients with early RA. Augmenting this type of model with genetic or other biomarker-based data in the future, especially for patients for whom clinical factors yield uncertainty in predicting response by Week 12, will likely allow for better prediction and optimize RA management.
Footnotes
- Supported by Amgen. A US National Institutes of Health (NIH) planning grant from the US National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) supported the initial phases of the TEAR study. Dr. Curtis receives support from the NIH (AR 053351) and the US Agency for Healthcare Research and Quality (R01 HS018517) and has received research grants and/or done consulting for Amgen, Abbott, BMS, Crescendo, CORRONA, Genentech, Janssen, Pfizer, and UCB. Dr. Ranganath also receives support from NIH (NIAMS K23 AR057818) and research grants and/or does consulting for BMS and UCB. Dr. Mikuls has received a research grant from Roche.
- Accepted for publication December 20, 2012.