Abstract
Objective. To evaluate and compare the utility of commonly used outcome measures for assessing disease exacerbation or flare in patients with rheumatoid arthritis (RA).
Methods. Data from the Dutch Potential Optimalisation of (Expediency) and Effectiveness of Tumor necrosis factor-α blockers (POET) study, in which 462 patients discontinued their tumor necrosis factor-α inhibitor, were used. The ability of different measures to discriminate between patients with and without physician-reported flare or medication escalation at the 3-month visit (T2) was evaluated by calculating effect size (ES) statistics. Responsiveness to increased disease activity was compared between measures using standardized change scores (SCS) from baseline to the 3-month visit. Finally, the incremental validity of individual outcome measures beyond the Simplified Disease Activity Index (SDAI) was evaluated using logistic regression analysis.
Results. The SCS were greater for disease activity indices than for any of the individual measures. The 28-joint Disease Activity Score, Clinical Disease Activity Index, and Simplified Disease Activity Index performed similarly. Pain and physician’s (PGA) and patient’s global assessment (PtGA) of disease activity were the most responsive individual measures. Similar results were obtained for discriminative ability, with greatest ES for disease activity indices followed by pain, PGA, and PtGA. Pain was the only measure to demonstrate incremental validity beyond SDAI in predicting 3-month flare status.
Conclusion. These results support the use of composite disease activity indices, patient-reported pain and disease activity, and physician-reported disease activity for measuring disease exacerbation or identifying flares of RA. Physical function, acute-phase response, and the auxiliary measures fatigue, participation, and emotional well-being performed poorly.
- RHEUMATOID ARTHRITIS
- TUMOR NECROSIS FACTOR INHIBITORS
- DISEASE ACTIVITY
- REMISSION
Disease activity cannot be directly measured in rheumatoid arthritis (RA) because there is no gold standard measure of the inflammatory process of the disease. Consequently, numerous standardized measures of the symptoms and signs of the disease and of its global effect have been proposed and validated over time to assess the beneficial effects of treatment. Early clinical trials in this field used many different outcome measures, and their results were frequently difficult to compare1. To address this, a set of 7 outcome domains was recommended independently by the American College of Rheumatology (ACR) and the World Health Organization/International League Against Rheumatism2,3. Other efforts to standardize outcomes across studies have led to the development and validation of several composite disease activity indices, each of which combines a number of the endorsed outcomes into an overall disease activity score4,5,6. Responsiveness and discriminative ability are key properties of such measures, and the performance of commonly used disease activity indices and individual core set measures in detecting treatment effects and discriminating between levels of achieved therapeutic response has been extensively described in previous studies7,8,9,10,11,12,13,14.
While these previous studies have provided a great deal of insight into which of the commonly applied instruments are most suitable for measuring treatment benefits in RA, little is known about their performance in assessing exacerbation of disease activity or flares. This matters, however, because the occurrence of flares or disease worsening is increasingly adopted as an endpoint in clinical trials15. Moreover, instruments are needed that can be used to monitor disease exacerbation in clinical practice16.
The Outcome Measures in Rheumatology flare working group has proposed that RA flare represents worsening of disease activity that would, if persistent, in most cases lead to initiation or change of therapy, and that a flare represents a cluster of symptoms of sufficient duration and intensity to require initiation, change, or increase in therapy17. Moreover, an extended set of outcomes for the assessment of flare was endorsed; it includes several domains in addition to the 7 core outcomes and is intended to cover the patient's experience comprehensively18,19. However, beyond the relevance of the outcome domains to experts and patients, flares or disease worsening should be assessed using measurement instruments that are optimally valid and reliable for this purpose, to minimize misclassification of flare status (i.e., false positives and false negatives) and suboptimal sensitivity to change caused by random and systematic measurement error.
In our present study, we compared the relative efficiency of a series of clinical and patient-reported measures using data from the Dutch Potential Optimalisation of (Expediency) and Effectiveness of Tumor necrosis factor-α blockers (POET) study to identify measurement instruments and outcome domains that most reliably assess disease exacerbation. The primary objective was to compare the performance of commonly used disease activity measures and indices, as well as recently endorsed patient-reported outcome (PRO) domains in assessing disease activity worsening. A secondary objective was to evaluate the incremental validity of individual outcomes over the traditional disease activity indices.
MATERIALS AND METHODS
POET study
POET was an open-label pragmatic randomized controlled trial conducted at 47 rheumatology departments in the Netherlands. To be eligible for the study, patients had to be 18 years of age or older, meet the 1987-revised ACR criteria for the classification of RA20, have been treated for at least 1 year with a concomitant tumor necrosis factor-α inhibitor (TNFi) and conventional synthetic disease-modifying antirheumatic drugs (DMARD), and be in remission or stable low disease activity for at least 6 months. Remission or stable low disease activity was defined as either a 28-joint Disease Activity Score (DAS28)21 < 3.2 measured at least twice, or the rheumatologist's clinical impression of remission or stable low disease activity in combination with at least 1 C-reactive protein (CRP) level < 10 mg/l in the 6 months prior to inclusion. There were no exclusion criteria. Study inclusion took place from March 2012 to March 2014. The study was approved by the Institutional Ethical Review Boards of all participating hospitals (grant/award number 40-00506-98-12001). Written informed consent was obtained from each patient upon study entry. The study was conducted in adherence to the Good Clinical Practice guidelines and with regulatory requirements consistent with the Declaration of Helsinki. The study is registered in the Netherlands Trial Register (number NTR3112).
Measurements
Baseline characteristics included age, sex, disease duration, medication use, and rheumatoid factor and anticyclic citrullinated peptide antibody status. Patients were evaluated at baseline and at least once every 3 months thereafter up to 1 year by the attending rheumatologist and rheumatology nurse in accordance with current Dutch treatment guidelines for RA. Clinical measurements, which are part of standard rheumatology care, were performed at every visit and included the tender joint count (TJC) in 28 joints, the swollen joint count (SJC) in 28 joints, the erythrocyte sedimentation rate (ESR), CRP, and the physician’s (PGA) and patient’s global assessment (PtGA) on a 0–100 visual analog scale (VAS). Additionally, physician-reported flares and changes in medication were recorded at each scheduled or unscheduled visit.
The following PRO were also administered at each study visit. Fatigue was assessed using the Bristol RA Fatigue Multi-Dimensional Questionnaire and 0–10 numerical rating scales assessing fatigue severity and the effect of fatigue on daily functioning. Patient-reported well-being, disease activity, and pain were assessed using 0–100 VAS. The Health Assessment Questionnaire-Disability Index (HAQ-DI) was administered to evaluate disability. Finally, the Medical Outcomes Study Short Form-36 (SF-36) was administered to assess health-related quality of life. We used the role physical, role emotional, and social functioning scales to assess participation. The bodily pain, physical functioning, and vitality scales were used to assess pain, disability, and fatigue, respectively. The Clinical Disease Activity Index (CDAI), Simplified Disease Activity Index (SDAI), and DAS28-ESR scores were calculated using their respective standard scoring algorithms4.
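The three composite indices combine the clinical measurements listed above in different ways. A minimal Python sketch of the standard published formulas follows, assuming 28-joint counts, ESR in mm/h, CRP recorded in mg/l (the unit used in the inclusion criteria), and global assessments on 0–100 VAS. The function and parameter names are hypothetical and are not taken from the POET analysis code. Note that the SDAI adds CRP in mg/dl, so a value recorded in mg/l is divided by 10.

```python
import math


def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, gh_vas: float) -> float:
    """DAS28-ESR = 0.56*sqrt(TJC28) + 0.28*sqrt(SJC28) + 0.70*ln(ESR) + 0.014*GH.

    gh_vas (hypothetical name) is the patient's global assessment of
    general health/well-being on a 0-100 VAS.
    """
    if esr_mm_h <= 0:
        raise ValueError("ESR must be positive because of the log term")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h)
            + 0.014 * gh_vas)


def cdai(tjc28: int, sjc28: int, ptga_vas: float, pga_vas: float) -> float:
    """CDAI = TJC28 + SJC28 + PtGA + PGA, with both globals rescaled from 0-100 to 0-10."""
    return tjc28 + sjc28 + ptga_vas / 10 + pga_vas / 10


def sdai(tjc28: int, sjc28: int, ptga_vas: float, pga_vas: float,
         crp_mg_l: float) -> float:
    """SDAI = CDAI + CRP in mg/dl; a CRP recorded in mg/l is divided by 10."""
    return cdai(tjc28, sjc28, ptga_vas, pga_vas) + crp_mg_l / 10


# Hypothetical patient: 2 tender joints, 1 swollen joint, ESR 20 mm/h,
# CRP 5 mg/l, patient VAS 30, physician VAS 20.
print(round(das28_esr(2, 1, 20, 30), 2))          # 3.59
print(cdai(2, 1, 30, 20), sdai(2, 1, 30, 20, 5))  # 8.0 8.5
```

The square roots and log-transformed ESR in the DAS28 make it harder to compute by hand than the simple sums used by the CDAI and SDAI.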
Statistical analysis
For our posthoc analysis, data were used from 462 patients randomized to stopping TNFi treatment. Standardized effect size (ES) statistics with pooled SD were obtained to compare the ability of various measures to discriminate between patients with and without flare at the 3-month assessment (T2). The first flare anchor we examined was physician-reported flare at T2. Medication escalation was also evaluated as an anchor of flare, defined as starting or increasing any biological or non-biological DMARD (including glucocorticoids) at T2.
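The discriminant ES described above is the difference in mean scores between the flare and no-flare groups divided by their pooled SD (Cohen's d). The sketch below assumes complete data for a single measure (e.g., one imputed dataset) and uses hypothetical function names. For measures on which higher scores indicate better health (e.g., the SF-36 scales), the sign of the ES is reversed.

```python
import math
from statistics import mean, variance


def pooled_sd(group_a: list[float], group_b: list[float]) -> float:
    """Pooled SD of two independent groups (each needs at least 2 values)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return math.sqrt(pooled_var)


def discriminant_es(flare: list[float], no_flare: list[float]) -> float:
    """Cohen's d: positive when the flare group has the higher mean score."""
    return (mean(flare) - mean(no_flare)) / pooled_sd(flare, no_flare)


# Hypothetical T2 pain VAS scores, not POET data.
flare_pain = [40.0, 55.0, 62.0, 48.0]
no_flare_pain = [20.0, 25.0, 35.0, 18.0, 30.0]
print(round(discriminant_es(flare_pain, no_flare_pain), 2))
```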
Responsiveness was evaluated by comparing standardized change scores (SCS) from baseline to T2, again standardized using the pooled SD. The magnitude of ES and SCS was interpreted according to the guidelines provided by Cohen (i.e., trivial < 0.20, small ≥ 0.20, moderate ≥ 0.50, and large ≥ 0.80)22. The utility of individual measures for predicting flare status at T2 was investigated using univariable logistic regression analysis. For each domain, the outcome measure with the largest ES in the analysis of discriminant validity was selected to represent that domain. Predictive strength was quantified using R² as an effect size estimator, and guidelines for interpreting R² were again derived from Cohen (small ≥ 0.01, moderate ≥ 0.09, and large ≥ 0.20)22. The incremental validity of outcome domains not included in the SDAI was evaluated using multivariable hierarchical logistic regression analysis with flare status at T2 as the dependent variable. The SDAI was entered as the first block and individual outcome measures as the second block, and incremental validity was evaluated using ΔR².
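A minimal sketch of the SCS and of the Cohen thresholds used to interpret ES and SCS follows. It assumes paired baseline and T2 scores from the same patients and pools the SD as the square root of the average of the baseline and T2 variances; this choice of pooling is an assumption of the sketch, not a detail stated in the methods. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_change(baseline: list[float], followup: list[float]) -> float:
    """Mean within-patient change divided by the pooled SD of the two visits."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and follow-up scores must be paired")
    changes = [after - before for before, after in zip(baseline, followup)]
    # Assumed pooling: root of the mean of the baseline and T2 variances.
    sd = math.sqrt((variance(baseline) + variance(followup)) / 2)
    return mean(changes) / sd


def cohen_label(effect: float) -> str:
    """Cohen's thresholds applied to the absolute ES or SCS."""
    size = abs(effect)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "trivial"


scs = standardized_change([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
print(scs, cohen_label(scs))  # 0.5 moderate
```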
Multiple imputation by chained equations was performed to replace missing values. The imputation model included the dependent variable (physician-reported flare status, yes/no), the evaluated outcome measures, age, followup time, and the total number of flares during the study period. Twenty datasets with imputed plausible values were generated, with 200 iterations between datasets. Rubin's rules were used to obtain pooled estimates and their associated standard errors, and pooled results are reported throughout.
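Rubin's rules combine an estimate obtained in each of the m imputed datasets (m = 20 here) into a single estimate whose variance reflects both within- and between-imputation uncertainty. A minimal sketch for one parameter follows; the function name and example values are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Pool one parameter across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least 2 imputations, each with one SE")
    q_bar = mean(estimates)                        # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # mean within-imputation variance
    between = variance(estimates)                  # between-imputation variance
    total = within + (1 + 1 / m) * between         # total variance
    return q_bar, math.sqrt(total)


# Hypothetical log-odds ratios from 3 imputations (illustration only).
estimate, se = pool_rubin([0.42, 0.47, 0.39], [0.10, 0.11, 0.10])
print(round(estimate, 3), round(se, 3))
```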
RESULTS
Baseline characteristics are presented in Table 1. The sample was characterized by established disease, with long disease duration, and 62.6% of patients with erosive disease. In accordance with the inclusion criteria, disease activity according to DAS28 was low at baseline, as was physical disability according to HAQ-DI.
Discriminant validity
While 39.4% of patients had a medication escalation at T2, only 18.7% of patients were reported to have a clinical flare at T2 by their rheumatologist. A consistently greater contrast in scores (i.e., ES of greater magnitude) was observed for clinical flare compared with medication escalation across all measures. The clinical composite scores performed best and interchangeably, followed by pain and PGA. Bodily pain assessed using the SF-36 was consistently the best performing individual measure with ES of moderate magnitude, followed by PGA and PtGA of disease activity and well-being. Measures of physical aspects of patient-reported health and fatigue had ES of small to moderate magnitude. Measures of emotional well-being consistently had ES of trivial magnitude (Figure 1).
Responsiveness
A moderate increase in disease activity according to all 3 disease activity indices (SCS ≈ 0.50) was observed in the stop group. Individual DAS28 components showed SCS of trivial (ESR) to small magnitude (TJC, SJC, well-being). PGA of disease activity was the only individual measure with an SCS of moderate magnitude. Patient assessment of disease activity and both measures of pain had SCS of small magnitude but outperformed each of the individual DAS28 components, whereas the SF-36 vitality and social functioning scales had SCS of small magnitude that were smaller than those of the DAS28 components. Each of the remaining measures had an SCS of trivial magnitude (Figure 2).
Incremental validity
In univariable analysis, all outcomes except emotional well-being were significantly associated with clinical flare status, and all outcomes except emotional well-being and fatigue were significantly associated with medication escalation at T2. Of the measures included in the SDAI, CRP was consistently the most weakly associated with flare status in univariable analysis, while PGA performed best. Of the measures not included in the SDAI, pain was consistently the most strongly associated with flare status at 3 months. Pain and PGA performed best overall. In binary logistic regression models containing only the SDAI, R² was of small magnitude for medication escalation (R² = 0.09) and of large magnitude for clinical flare (R² = 0.26). As shown in Table 2, only pain contributed significantly to the classification of flare status beyond the SDAI, although the magnitude of its independent effect was small.
DISCUSSION
In our present study, the performance for assessing disease exacerbation was compared between several validated disease activity indices and individual measures that either assessed 1 of the core set variables or 1 of the domains that have been proposed as outcomes for the assessment of flare in RA.
Overall, the composite disease activity indices outperformed individual measures across the various comparisons. This was unsurprising: standardized change scores of individual measures are more affected by measurement error than those of composite scores, because random error tends to cancel out when information from multiple measures is combined. Nevertheless, the disease activity indices proved to be noticeably more efficient indicators of disease exacerbation than most individual measures, which further supports the practice of using composite tools for assessing disease activity in RA.
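The error-cancellation argument can be illustrated with the Spearman-Brown formula. Under the idealized assumption that a composite sums k parallel components with uncorrelated random errors, each with reliability r, the reliability of the composite is k·r / (1 + (k − 1)·r). Real disease activity indices do not meet this assumption (their components differ in content, scale, and weighting), so the sketch below, with a hypothetical function name, only illustrates the direction of the effect and not its size for the DAS28, CDAI, or SDAI.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Reliability of a sum of k parallel components (idealized model)."""
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    r = single_reliability
    return k * r / (1 + (k - 1) * r)


# Composites of 1, 2, and 4 components, each with a reliability of 0.5.
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.5, k), 3))
# 1 0.5
# 2 0.667
# 4 0.8
```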
Interestingly, despite their different constituent measures and scoring protocols, the CDAI, SDAI, and DAS28 performed equivalently as measures of disease exacerbation in our present study. Similar sensitivity to improvement was observed for these 3 tools in 2 previous studies23,24, and no clear differences in other measurement properties were observed in a comparative systematic review25. When measuring disease activity exacerbations, the main advantages of the DAS28 appear to be that it has been the most frequently used measure in previous studies, providing a richer frame of reference for interpreting study outcomes against historical results, and that it is currently the only clinical composite score for which validated flare criteria are available26. However, scoring the DAS28 is more complex than scoring the SDAI and CDAI because of the differential weighting of its components. Further, acute-phase reactants were shown in our study to provide little additional information on disease activity exacerbation and were among the weakest univariable predictors of flare status. Because laboratory results are usually available only after a delay, including acute-phase reactants in both the SDAI and DAS28 may create a logistical barrier to obtaining disease activity scores in real time, in exchange for information that appears redundant from a statistical point of view.
PGA and PtGA of disease activity and pain assessed using either the VAS or the SF-36 bodily pain scale were the most sensitive individual measures of increased disease activity in the POET study. Notably, changes in patient-reported pain were more predictive of treatment decisions (medication escalation) and physician-reported flare than the other variables measured in our study. Pain intensity was also the only outcome domain that provided unique information beyond that provided by the SDAI in the analysis of incremental validity, even though pain is indirectly represented in both the TJC and, very likely, the PtGA, which were both controlled for in our analysis. Previous studies have repeatedly established that pain is the most important treatment priority for patients throughout the disease course27. These results suggest that a more comprehensive evaluation of a patient's overall disease status could be obtained using a clinical composite score that includes pain.
The overall performance ranking of specific measures in our study corresponds largely to that observed in previous studies that have compared the ability of different measures to assess disease improvement. Only the poor performance of physical function contrasts with previous studies, in which it has consistently been found to be one of the best performing indicators of treatment benefit7,8,9,10,11,12 and, in an observational study, of increased disease activity28. These contrasting results are likely explained by the long average disease duration and the high prevalence of erosive disease at baseline in POET, which previous studies have found to be associated with decreased responsiveness of physical function measures in patients with RA29. It is commonly believed that while inflammatory disease activity is the most important determinant of physical function in RA, structural damage increasingly contributes to disability later in the disease course, effectively lowering the ceiling on the amount of improvement that can be realized30. Our results provide further support for this notion and suggest that inferences regarding disease exacerbations from (lack of) change in physical function scores should be made with caution in populations with longstanding disease.
Little evidence was found in our present study to support an extended set of outcomes for measuring flares. PRO other than pain and PtGA were found to be weakly sensitive to disease flare and did not provide incremental information above and beyond the SDAI when predicting flare status. These results suggest that the addition of fatigue, emotional well-being, or participation to a composite score may not contribute much to its reliability or predictive validity for measuring flares or exacerbated disease. Previous studies also found limited responsiveness for measures of participation and emotional well-being, while the performance of fatigue has been mixed6,7,8,9,10,11. An explanation for this may be that participation restrictions and emotional well-being are complex, integrated domains that may be determined by disease activity as well as comorbidities, and personal and environmental factors.
Strengths of our study are that the POET study was designed to evaluate exacerbation of RA resulting from TNFi discontinuation in patients with longstanding stable low disease activity and that physician-rated flares were assessed per protocol, whereas previous studies were posthoc analyses of subsets of deteriorated patients from studies originally designed to evaluate patients expected to improve (e.g., clinical trials) or remain stable (observational studies)28,31,32,33,34. A limitation of our study is that there was relatively little variability of scores for some measures (e.g., SJC) because of the homogeneous, low level of disease activity at the start of our study. Different results might have been obtained in patients with more severe disease at baseline. However, practical measurement settings in which disease worsening is to be assessed most likely involve patients starting in a state of low disease activity (e.g., tapering of medication in daily clinical practice or clinical trials).
The results of our study support the use of traditional disease activity measures and composite indices for assessing flare or disease exacerbation. Pain, PGA, and PtGA of disease activity were the best performing individual measures, while the composite DAS28, CDAI, and SDAI performed equivalently as measures of exacerbation. Patient-reported measures assessing the domains of participation, fatigue, and emotional functioning performed worst. Based on these findings, we recommend that disease flares be assessed using core set measures, preferably combined into a composite index, with particular attention to pain intensity.
- Accepted for publication April 7, 2017.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.