Abstract
Objective. Patient-reported outcomes are used in clinical practice and trials. We studied a large clinical practice to determine the minimally important difference (MID) estimates for (1) the Health Assessment Questionnaire–Disability Index (HAQ-DI): improvement and worsening using a patient global assessment anchor; and (2) pain using a patient-reported pain anchor.
Methods. Patients with rheumatoid arthritis (RA; N = 225) had clinic visits at 2 timepoints within 1 year, completed the HAQ-DI and pain visual analog scale (VAS; 0–100 mm), and answered the question, “How would you describe your overall status/overall pain since the last visit?”, as much worsened, somewhat worsened, the same, somewhat improved, or much improved. Patients who rated themselves as somewhat improved or somewhat worsened were defined as the minimally changed subgroups.
Results. Eighty-three percent were women, mean age 60 years, with disease duration 11.7 ± 10.7 years. The baseline HAQ-DI was 0.97 ± 0.76 (SD), and at followup 1.0 ± 0.77 (mean change +0.03 ± 0.40). The baseline pain VAS was 42.3 ± 28.8, and at followup 38.5 ± 27.9 (mean change −2.8 ± 25.9). The mean (SD) HAQ-DI change score was −0.09 (0.42) for somewhat improved and 0.15 (0.33) for somewhat worsened. The HAQ-DI change for somewhat/much better was −0.20 ± 0.52, and for somewhat/much worse +0.21 ± 0.33. For pain, somewhat improved changed by −11.9 mm on the VAS, and somewhat worsened by 6.8 mm. Estimates for HAQ-DI and pain were larger than for the no-change group, 0.03 (0.32) and −3.2 (20.9).
Conclusion. The MID for HAQ-DI in clinical practice is smaller than it is in trials. This may have implications for observational studies and clinical care.
- MINIMALLY IMPORTANT DIFFERENCE
- HEALTH ASSESSMENT QUESTIONNAIRE-DISABILITY INDEX
- RHEUMATOID ARTHRITIS
- RANDOMIZED CONTROLLED TRIALS
The Stanford University Health Assessment Questionnaire-Disability Index (HAQ-DI) is a musculoskeletal-targeted self-report tool that assesses the functional status for performing activities of daily living; it is scored from 0 (no disability) to 3 (severe disability), representing the averaging of the worst score in each of 8 domains of daily function1. It is the most widely used functional outcome measurement in randomized controlled trials of rheumatoid arthritis (RA). The HAQ-DI is feasible, reliable, and valid for clinical trials in RA and is sensitive to clinically relevant changes in function2. Higher scores (> 1.0) predict worse outcomes3.
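The scoring rule described above can be sketched in a few lines. This is a minimal illustration, assuming every one of the 8 domains has at least one answered item scored 0 to 3; it ignores the adjustments for aids, devices, and missing domains that a full scoring procedure would need. The function name, the domain labels, and the example responses are hypothetical and are not study data.

```python
def haq_di_score(domain_items):
    """Return the mean of the worst (highest) item score across 8 domains.

    domain_items maps a domain label to its list of item scores (0-3).
    """
    if len(domain_items) != 8:
        raise ValueError("expected item scores for exactly 8 domains")
    worst_per_domain = []
    for domain, items in domain_items.items():
        if not items or any(score not in (0, 1, 2, 3) for score in items):
            raise ValueError(f"invalid item scores for domain {domain!r}")
        worst_per_domain.append(max(items))
    return sum(worst_per_domain) / len(worst_per_domain)


# Hypothetical responses for one patient (illustrative labels only).
example = {
    "dressing": [0, 1],
    "arising": [0, 0],
    "eating": [1, 2, 0],
    "walking": [1, 0],
    "hygiene": [0, 0, 0],
    "reach": [1, 1],
    "grip": [3, 0, 1],
    "activities": [0, 0, 0],
}
print(haq_di_score(example))  # 1.0
```

Because the score is an average of 8 integer domain scores, an individual patient's HAQ-DI can only take values in steps of 0.125.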
Minimally important difference (MID) is the smallest difference in a measure that patients perceive as a change4. It is the smallest change in a score that would be clinically relevant, and MID are useful for determining sample sizes for future studies. In order to determine the MID, an external anchor such as a physician or patient global assessment is used. The MID may reflect either an improvement or a worsening. The minimal clinically important improvement is defined as the smallest change in measurement that signifies an important improvement. A related concept is the Patient Acceptable Symptom State (PASS), which has been defined as the highest level of symptom beyond which patients consider themselves well5. Changes of 10%, or roughly 10 mm on a 100 mm visual analog scale (VAS), correspond to MID for patient-reported measures across many studies6.
In cross-sectional studies of patients with RA, the MID for HAQ-DI improvement was −0.19 to −0.23 [7,8], whereas it was +0.49 [7] for worsening. In RA clinical trials the MID in HAQ-DI improvement ranged from −0.22 to −0.24 [7,9,10]. In trials, patients perceive worsening sooner than they do improvement, causing the MID for deterioration to be smaller than that for improvement [11]. This could also occur in clinic patients hoping to “not deteriorate”; in other words, an expectation bias may be present.
We prospectively studied a large clinical practice at a university hospital to determine (1) the MID estimates for HAQ-DI for improvement and worsening (i.e., the minimally clinically important improvement and the minimally clinically important deterioration) using a patient global assessment anchor; and (2) the MID estimates for pain (for improvement and worsening) using a patient-reported pain anchor. We hypothesized (1) that patients with RA in a clinical practice would have lower MID scores than those seen in randomized controlled trials (RCT); (2) that MID scores would be different bidirectionally (improvement and worsening); and (3) that MID scores would vary according to baseline HAQ and possibly disease duration. We speculated that RA patients with higher HAQ-DI disability could require greater change to constitute MID. We did not include the PASS in this study as we did not ask patients if they were in an acceptable state at the time of their visits.
MATERIALS AND METHODS
Multiple data are collected routinely on patients seen at St. Joseph’s Hospital Rheumatology Clinic, affiliated with the University of Western Ontario, which services a referral region of roughly 1 million people. The data are from patients who were seen serially by one rheumatologist (JP) over a 6-month period (n = 347), had been diagnosed with RA meeting the American College of Rheumatology criteria12, and had at least 2 consecutive visits within 12 months. One hundred twenty-two patients did not have complete data and were excluded from the analysis, leaving a final sample of 225 patients. Data were extracted from medical charts by a trained data abstractor (DN) and entered into a database. Patients completed the HAQ-DI and a pain VAS (0–100 mm) anchored from 0 (no pain at all) to 100 (very severe pain). They also answered the question, “How would you describe your overall status since the last visit?” on a 5-point Likert scale labeled much worsened, somewhat worsened, the same, somewhat improved, much improved. Similarly, they answered the question, “How would you describe your overall pain since the last visit?” on a 5-point Likert scale with the same labels. Patients who reported somewhat improved or somewhat worsened were defined as the minimally changed subgroups. To estimate the MID, we determined the changes in the HAQ-DI and pain VAS scores at the next followup visit within 1 year for the groups that somewhat improved and somewhat worsened. These were compared to change scores for the groups that reported the same, much improved, and much worsened.
The HAQ-DI change scores were calculated (HAQ from most recent visit minus HAQ from previous visit). We also collapsed categories in an exploratory analysis where HAQ-DI change scores corresponding to becoming better/much better or worse/much worse were compared to reporting no change in overall disease status.
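The anchor-based calculation described above (change = most recent score minus previous score, summarized within each anchor response) might look like the following sketch. The function name, the tuple layout, and the sample values are assumptions made for illustration; the numbers are not study data.

```python
from collections import defaultdict
from statistics import mean, stdev


def change_by_anchor(visits):
    """Summarize change scores (recent minus previous) per anchor response.

    visits: iterable of (previous_score, recent_score, anchor_label).
    Returns {anchor_label: (n, mean_change, sd_change)}; SD is None when n < 2.
    """
    grouped = defaultdict(list)
    for previous, recent, anchor in visits:
        grouped[anchor].append(recent - previous)
    summary = {}
    for anchor, changes in grouped.items():
        sd = stdev(changes) if len(changes) > 1 else None
        summary[anchor] = (len(changes), mean(changes), sd)
    return summary


# Hypothetical HAQ-DI pairs (not study data).
visits = [
    (1.000, 0.875, "somewhat improved"),
    (1.500, 1.500, "somewhat improved"),
    (0.500, 0.750, "somewhat worsened"),
    (1.250, 1.375, "somewhat worsened"),
    (0.750, 0.750, "the same"),
]
summary = change_by_anchor(visits)
for anchor, (n, avg, sd) in summary.items():
    print(anchor, n, round(avg, 4))
```

Under this sketch, the mean change in the "somewhat improved" and "somewhat worsened" groups would serve as the MID estimates, with "the same" as the comparison group.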
To assess the usefulness of an anchor, change in the anchor and change in the health-related quality of life (HRQOL) scores should have a correlation coefficient > 0.37 [13]. A correlation coefficient of > 0.37 corresponds to an effect size of 0.80 (a large effect, as proposed by Cohen [14]). This was assessed using the Spearman correlation coefficient (as change in anchor is an ordinal variable) between the patient global and pain assessments (the anchors) and change in the HAQ-DI and pain scores. The MID was estimated by examining change in the HAQ-DI and pain VAS scores in subjects who were somewhat improved and somewhat worsened. These estimates were compared to those for subjects who were much improved or much worsened. Responsiveness to change was evaluated using the effect size (ES). ES is the mean change in the HAQ-DI or pain VAS from baseline to followup divided by the standard deviation at baseline (0.76 for HAQ-DI and 28.8 for pain VAS). Cohen’s rule of thumb for interpreting effect size is that a value of 0.20–0.49 represents a small change, 0.50–0.79 a medium change, and ≥ 0.80 a large change, and in general an ES of 0.2 to 0.5 is usually considered relevant for MID [14].
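As a worked illustration of the effect size definition above, the sketch below divides a mean change by the baseline SD and maps the result onto Cohen's bands. It assumes the ES is reported as a magnitude (absolute value) and treats the bands as contiguous thresholds at 0.20, 0.50, and 0.80; the function names and the "below small" label are hypothetical. The inputs are the mean changes given in the abstract for the somewhat improved groups.

```python
def effect_size(mean_change, baseline_sd):
    """Effect size as described in the text: |mean change| / baseline SD."""
    if baseline_sd <= 0:
        raise ValueError("baseline SD must be positive")
    return abs(mean_change) / baseline_sd


def cohen_label(es):
    """Map an effect size onto the rule-of-thumb bands quoted in the text."""
    if es >= 0.80:
        return "large"
    if es >= 0.50:
        return "medium"
    if es >= 0.20:
        return "small"
    return "below small"  # hypothetical label for ES under 0.20


HAQ_BASELINE_SD = 0.76   # baseline SD stated for the HAQ-DI
PAIN_BASELINE_SD = 28.8  # baseline SD stated for the pain VAS (mm)

print(round(effect_size(-0.09, HAQ_BASELINE_SD), 2))      # 0.12
print(round(effect_size(-11.9, PAIN_BASELINE_SD), 2))     # 0.41
print(cohen_label(effect_size(-11.9, PAIN_BASELINE_SD)))  # small
```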
As an exploratory analysis, the effect of baseline HAQ-DI score and pain VAS score on the MID estimates was assessed. In other words, people with different baseline HAQ-DI or VAS pain scores may require different amounts of improvement or worsening to consider a change as representing an MID. We divided the baseline HAQ-DI and VAS pain scores into 2 groups: a mild group (HAQ-DI < 1.0 and pain VAS ≤ 33.4) and a moderate to severe group (HAQ-DI ≥ 1.0 and pain VAS > 33.4).
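The stratified analysis can be sketched as follows, assuming each measure is split on its own cutoff (HAQ-DI ≥ 1.0, pain VAS > 33.4 for the moderate to severe group) and the mean change is then computed within one anchor response. The function names, record layout, and sample values are hypothetical and are not study data.

```python
from statistics import mean


def haq_group(baseline_haq_di):
    """Baseline severity group for the HAQ-DI (cutoff 1.0)."""
    return "moderate to severe" if baseline_haq_di >= 1.0 else "mild"


def pain_group(baseline_vas_mm):
    """Baseline severity group for the pain VAS (cutoff 33.4 mm)."""
    return "moderate to severe" if baseline_vas_mm > 33.4 else "mild"


def stratified_mean_change(records, group_fn, anchor):
    """Mean change for one anchor response, split by baseline severity.

    records: iterable of (baseline, followup, anchor_label).
    """
    buckets = {}
    for baseline, followup, label in records:
        if label == anchor:
            change = followup - baseline
            buckets.setdefault(group_fn(baseline), []).append(change)
    return {group: mean(changes) for group, changes in buckets.items()}


# Hypothetical HAQ-DI records (not study data).
records = [
    (0.500, 0.500, "somewhat improved"),
    (0.750, 0.625, "somewhat improved"),
    (1.500, 1.250, "somewhat improved"),
    (2.000, 2.000, "somewhat improved"),
    (1.000, 1.250, "somewhat worsened"),
]
result = stratified_mean_change(records, haq_group, "somewhat improved")
print(result)
```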
RESULTS
The 225 patients with RA had long disease duration (11.7 yrs) and an average age of 60 years and 83% were women. The majority of patients (92%) were using disease-modifying antirheumatic drug (DMARD) therapy, 16% prednisone, 50% nonsteroidal antiinflammatory drugs (NSAID), and 17% were prescribed biologic therapies. The average baseline HAQ-DI was 0.97 ± 0.76 and at followup it was 1.0 ± 0.77, with a mean change of +0.03 ± 0.40 (Table 1).
Of the 225 patients, 53% reported no change at followup visit, 16% reported being somewhat improved, and 5% reported being much improved. In contrast, 22% reported being somewhat worsened and 4% reported being much worsened on the global assessment anchor. The Spearman correlation coefficient between the patient assessment of global change and the change in the HAQ-DI was 0.36 (p < 0.001).
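For readers who want to reproduce this kind of anchor check, a Spearman coefficient can be computed as the Pearson correlation of average ranks, which handles the many ties in a 5-point anchor. This is a standard-library sketch, not the software used in the study. It assumes the anchor is coded from −2 (much improved) to +2 (much worsened) so that its sign matches the HAQ-DI change; the coding, function names, and sample values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation with tie handling via average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: anchor coded -2..+2, HAQ-DI change (not study data).
anchor = [-2, -1, 0, 0, 1, 2]
haq_change = [-0.5, -0.125, 0.0, 0.125, 0.25, 0.375]
rho = spearman(anchor, haq_change)
print(round(rho, 3))
```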
The HAQ-DI change score for the group that somewhat improved and somewhat worsened was mean −0.09 (SD 0.42) and 0.15 (SD 0.33), respectively (Table 2). This corresponds to an ES of 0.12 and 0.20. This difference was also statistically significant compared to the no-change group [HAQ-DI mean 0.03 (SD 0.32), ES 0.04; Table 2]. The HAQ-DI change for somewhat better/much better was −0.20 ± 0.52 (ES 0.27) and +0.21 ± 0.33 (ES 0.27) for somewhat worse/much worse, respectively, whereas the mean change in HAQ-DI for “the same” was +0.03 ± 0.32.
Pain VAS results
Two hundred twelve patients completed the pain VAS at 2 timepoints. The baseline average pain VAS was 42.3 ± 28.8 and at followup 38.5 ± 27.9, with a mean change of −2.8 ± 25.9 (Table 3). Of the 212 patients, 52% reported no change, 14% reported being somewhat improved, and 5% reported being much improved. In contrast, 24% reported being somewhat worse and 5% reported being much worse. This distribution was similar to that seen with the global assessment anchor. The Spearman correlation coefficient between the patient assessment of pain change and change in the pain VAS was 0.37 (p < 0.001).
On average, patients who answered “somewhat better” and “somewhat worse” on the pain anchor changed by −11.9 and 6.8 mm on the VAS, corresponding to an ES of 0.41 and 0.23, respectively (Table 3). These estimates were larger than for the no-change group [−3.2 (SD 20.9); ES 0.11].
Table 4 provides the mean MID scores for improvement in the HAQ-DI scores stratified by severity of the mild (HAQ-DI < 1.0) compared to the moderate/severe (HAQ-DI ≥1.0) groups at baseline. Although they are exploratory, the data suggest that patients with moderate to severe baseline HAQ-DI scores require a larger improvement to be minimally improved, as exemplified by the HAQ-DI (MID for the minimally improved group for the low-score group = −0.03, and for the high-score group = −0.14). In contrast, the change required for the slightly worsened group with moderate to severe HAQ-DI (0.09) was smaller than that for the mild HAQ-DI group (0.24). A similar pattern was seen with the pain VAS. We did not have enough early RA disease duration data to compare HAQ-DI changes between early and late disease.
We also investigated the proportion of changes in patient-reported global states that were concordant with change in HAQ (Table 5). As might be expected, agreement was greater for larger changes such as much better (73% concordance) or much worse (100% concordance). However, for the better, the same, and worse categories, the HAQ was concordant less often, as the usual HAQ-DI changes were small.
DISCUSSION
MID estimates provide a benchmark for future design of RA clinical and observational trials by helping researchers and clinicians understand whether HRQOL score differences between 2 treatment groups are meaningful, or if changes within one group over time are meaningful [15]. On average, our MID scores for improvement and worsening of HAQ-DI were −0.09 and 0.15, respectively, scores that are lower than those seen previously [7,9,10]. In RA studies the MID in HAQ-DI improvement has ranged from −0.22 to −0.24 [7,9,10]. Using the HAQ-DI > 1.0 at baseline, a representation of patients with moderate to severe disease in RA studies, the MID was 0.14 for improvement and 0.09 for worsening (as compared to 0.22 for improvement). After we combined the groups that improved “somewhat” and “a lot,” the average change in HAQ-DI was −0.20, a score similar to the MID HAQ-DI score used in other RA studies.
The difference in MID between clinical practice and RCT in RA has been alluded to by others16 and could be due to several reasons. First, the majority of patients in clinical trials have moderate to severe disease. Clinical trials usually recruit patients with active disease, and the patients are started on either active treatment or placebo to achieve better control of the disease. The improvement may be influenced in part by regression to the mean, a statistical phenomenon, and by placebo effect, so that a larger observed change in HAQ is needed. In contrast, our group of patients was followed in a rheumatology practice, where DMARD and other agents were added as deemed necessary; the majority of patients were stable taking their current DMARD. Second, the expectations of patients in RCT, in addition to taking active versus control treatment, differ from those in routine practice, which could give an added placebo effect. Third, the HAQ-DI is explained by both reversible and irreversible components (disease activity and damage) as recently demonstrated17. Thus, disease duration can affect the HAQ-DI: early in the disease course the reversibility is greater, as there is little or no accrued damage and RA is primarily manifested as inflammation with swollen joints; later, radiographic damage and deformities explain more of the HAQ-DI. Therefore, some residual HAQ cannot be altered (due to damage). The long mean disease duration in our patients may explain the slight overall mean worsening of the HAQ-DI in the interval that was studied. However, MID for worsening were still less than those for improvement in the groups with baseline HAQ > 1.0 and pain VAS > 33.4; thus, these patients with greater disease activity had results more consistent with populations enrolled in trials11.
Our data on pain using a 100 mm VAS demonstrate that scores of −11.9 and 6.8 are indicative of MID for improvement and worsening using the pain anchor. This result is consistent with data from Wells, et al [7], where a change of 10 mm on a 0–100 mm VAS was suggestive of the MID score. Wyrwich, et al have shown that MID anchored to patient global assessments changed by 0.5 SD [18–20]. In our study population identification of MID by the 0.5-SD rule works, with the exception of the “slightly improved” group in HAQ scores in both categories. We chose overall status as the anchor as it seemed most closely related to RA self-assessed disease. One would think the lack of specificity of the anchor (i.e., not an anchor of overall disability or function) would increase the MID due to other “noise” in self-rated overall status.
As previously noted13, the MID estimates may depend on the baseline scores. This trend was seen in our analysis (Table 4), where people with higher baseline scores (defined as baseline score of HAQ-DI ≥1.0 and pain VAS > 33.4) required a larger improvement in their HAQ-DI and pain VAS scores to be considered as minimally improved by the patients. Conversely, people with lower baseline scores (defined as baseline score of HAQ-DI < 1.0 and pain VAS ≤ 33.4) required a greater worsening in their HAQ-DI and pain VAS scores to be considered as minimally worsened by the patients. We are uncertain why this is the case. One would predict larger changes in both improved and worsened HAQ-DI and VAS pain at higher baseline values (and thus percentage changes would be similar). In general, at lower scores, RA patients may be more optimistic and require larger changes for worsening than improving.
Our study has some limitations. First, the patients were primarily Caucasian and had to be able to read English in order to complete their forms. Second, most patients had long disease duration (mean 11.7 yrs), so the results may not be generalizable to very early RA, as the HAQ-DI is related to both inflammation and damage. Third, the time reference for the HAQ-DI is “over the last week” and we asked for the Likert scale “since the last visit,” which was usually 6 to 8 months previously; the retrospective ratings are susceptible to recall biases15. In addition to forgetting, recall has been shown to be influenced by more salient events and by one’s current mood state15, and the long duration between visits may have been a limitation. Although patients often ignore time anchors on questions, delay could have increased recall bias, which could have resulted in increased random error and, therefore, increased or decreased MID. Thus, the MID is susceptible to recall bias, as are all repeated questionnaires in rheumatology. Fourth, the choice of anchors can influence the MID scores. We used anchors that had a moderate correlation with HRQOL measures, providing some confidence in our results. Fifth, although the patients were from the practice of a single rheumatologist, treatment for this study was not standardized, and future studies should confirm these results. We could also include the Patient Acceptable Symptom State in future studies. Finally, eliminating some patients with incomplete data in practice may have affected the results. For instance, the patients who did not complete the HAQ may not have been fully literate in English, which could bias the results.
One could question the methods we used to calculate MID, or if, looking at mean change of HAQ in each global health state (the same, better, worse, much better, and much worse), there should be a correction for the amount of change in HAQ that the “no change” group has. For instance, the HAQ-DI no-change group has a mean change of 0.03, so our better change of 0.09 could have a correction of 0.03 (absolute value) added, which would be 0.12 (still smaller than what has been reported in RCT), and similarly for worsening, 0.03 added to 0.15 would be 0.18. In the MID literature, it is important that an anchor is used that is relevant and that is correlated to the variable of interest, so our methodology, although novel, meets the MID criteria. We are investigating the MID in clinical practice further by asking patients if they think they are the same in the variable of interest or have changed a little or a lot.
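The correction discussed above can be written out explicitly. This sketch mirrors the arithmetic in the paragraph, which adds the absolute no-change value (0.03) to the magnitude of the change in each direction; it is an illustration of that arithmetic only, and the function name is hypothetical. A signed difference from the no-change group would be a different calculation and is not shown.

```python
def corrected_mid(group_change, no_change_mean):
    """Add the absolute no-change mean to the magnitude of a group's change."""
    return abs(group_change) + abs(no_change_mean)


NO_CHANGE_MEAN = 0.03  # mean HAQ-DI change in the no-change group

print(round(corrected_mid(-0.09, NO_CHANGE_MEAN), 2))  # 0.12 (improvement)
print(round(corrected_mid(0.15, NO_CHANGE_MEAN), 2))   # 0.18 (worsening)
```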
Often HAQ scores worsen over time while patients say they are stable or even improved; however, between one visit and the next there can be important changes in disease state, for better or worse, despite many patients being stable. Also, almost half of the patients who change by a HAQ score of 0.25 will rate themselves as unchanged. It is important to note that our observations are averages and that individual patients have changes in HAQ that are incremental (e.g., 0, 0.125, 0.25, 0.375, etc.), so the MID in individuals differ from group averages.
In summary, the MID scores for HAQ-DI in a clinical practice were smaller than those seen in previous clinical trials. The MID scores for pain VAS were similar to those observed in another study. The MID scores are influenced by baseline HRQOL scores, and may be influenced by disease duration, and our results from this study do not comment on very early RA. In addition, the MID changes are different for worsening (usually needing a larger value) than for improving. We observed that the MID for deterioration were much less than for improvement in patients with more pain and impairment in physical function.
Footnotes
- D. Norrie was supported by an unrestricted research grant from the Department of Medicine, the University of Western Ontario, London, Canada.
- Accepted for publication September 18, 2008.