Abstract
Objective. Patient global assessment (PGA) is commonly measured using a visual analog scale (VAS). The VAS asks patients to integrate many dimensions of rheumatoid arthritis (RA) activity, yet its scope is poorly defined and its endpoints are vague. We investigated whether a modified Rating Scale that used marker states and more defined endpoints would provide a more valid measure of PGA.
Methods. In our prospective longitudinal study, 164 patients with active RA rated their global arthritis activity using the VAS and Rating Scale before and after treatment. To compare construct validity, we correlated each score with 2 reference measures of RA activity, the 28-joint count Disease Activity Score (DAS28) and the physician global assessment, and examined how each measure was associated with different aspects of RA activity, including pain, functioning, and depressive symptoms, in multivariate regression analyses. We also examined sensitivity to change.
Results. Both measures were correlated with the DAS28 (r = 0.39 for VAS; r = 0.35 for Rating Scale) and physician global assessment (r = 0.41 for VAS; r = 0.26 for Rating Scale) at the baseline visit. Pain and depressive symptoms had the strongest association with the VAS, while functional limitations and depressive symptoms had the strongest association with the Rating Scale. Residual analysis showed no differences in heterogeneity of patients’ ratings. VAS was more sensitive to change than the Rating Scale (standardized response means of 0.55 and 0.45).
Conclusion. As measures of PGA, the VAS and Rating Scale had comparable construct validity, but differed in which aspects of arthritis activity influenced scores. VAS was more sensitive to change.
Patient global assessment (PGA) is an important patient-reported outcome in rheumatoid arthritis (RA). As a core set measure, PGA is commonly used in clinical trials, individually and in composite indices, to evaluate RA activity and response to treatment1–4.
PGA is usually measured as a self-reported rating of global arthritis status using a visual analog scale (VAS) with endpoints of “very poor” and “very well”1. Although this method of measuring PGA is widely used, researchers have noted some of its limitations5,6. It has been argued that because the endpoints are not clearly defined, assessments may vary considerably among patients. One patient’s perception of “very poor” may differ from that of another patient, a limitation of any self-reported measure. In addition, the loosely defined anchors may not convey the full range of RA activity to be considered. Patients are likely to rate themselves in reference to their personal experience and not relative to the overall range of possible arthritis activity. Also, they may consider only selected aspects of RA, such as pain or joint swelling, in their rating, and overlook other important aspects such as function, or consider irrelevant factors. The multidimensional yet loosely defined character of the VAS may undermine its validity and contribute to heterogeneity among patients’ ratings. These problems arise because the content validity (item relevance and comprehensiveness) of the PGA cannot be tested. Despite these concerns, alternatives to the VAS have rarely been examined.
We compared the construct validity, consistency, and sensitivity to change of the VAS and a modified Rating Scale as measures of arthritis activity. Rating scales are generic preference measures, sometimes represented as feeling thermometers, on which respondents rate their health on a scale from “perfect health” to “death” or “worst imaginable health.”7–9 Rating scales often use marker states, hypothetical scenarios describing mild to severe health states that may help in alerting patients to various aspects of health and the broad range of activity of the disease to consider when making their self-rating. Patients are usually asked to rate these marker states and then rate their own health. Although rating scales are typically used to measure quality of life, in our study we modified the Rating Scale to focus specifically on RA activity. We modified the prompts, anchors, and marker state descriptions to ask patients to rate how their arthritis affected their health. We hypothesized that the modified Rating Scale, with its more defined endpoints and marker states, would minimize ambiguity in PGA responses and would be a more valid and responsive measure of PGA than the VAS.
MATERIALS AND METHODS
Patients were participants in a prospective longitudinal study of changes in RA activity. They were recruited from the practices of the investigators or local rheumatologists, and from the community by advertisement. To be enrolled, patients were required to fulfill the revised American College of Rheumatology criteria for the classification of RA10, to be age 18 years or older, able to read English, and able to provide informed consent. In addition, they were required to have active RA, with at least 6 tender joints on examination and judgment of active synovitis by the study rheumatologist, and have plans for a change in their antirheumatic treatment (either escalation of doses of current medications, addition or change of disease-modifying medication or anti-tumor necrosis factor-α medication, or addition of prednisone). The study protocol was approved by the institutional review board, and all patients provided written informed consent.
Evaluation
At study entry, the patients completed a clinical evaluation that included a paper questionnaire, a computer-administered assessment with the modified Rating Scale, a musculoskeletal examination, and laboratory testing for erythrocyte sedimentation rate (ESR) and C-reactive protein. The questionnaire covered the patients’ demographic characteristics, medical history, symptoms, and function, and included the VAS. A rheumatologist performed the musculoskeletal examination, which included tender and swollen joint counts of 68/66 joints. We used data from the baseline visit and a followup visit, which was 1 month later for those patients treated with the addition of prednisone, and 4 months later for those treated by a change in disease-modifying medications. Timing of the second visit differed because of an expected difference in the time course of the treatment effect between these classes of medications.
Study measures
The PGA was rated by patients using 2 different methods: the VAS and the modified Rating Scale. Patients completed the VAS before completing the Rating Scale. The VAS was a 15-cm horizontal line with marked anchors: 0 = very well and 100 = very poor. With these anchors, patients were asked to respond to the following: “Considering all the ways that rheumatoid arthritis affects you, rate how you are doing on the following scale by placing a mark on the line.”
The modified Rating Scale was a computer-administered vertical scale with a top anchor of 100 = “perfect health” and a bottom anchor of 0 = “worst imaginable health.” Horizontal and vertical scales were shown to be comparable in measuring PGA7. The direction of scoring was reversed to make it comparable to the VAS. Before marking the Rating Scale, the patients were first presented and asked to consider 3 case scenarios of mild, moderate, and severe RA activity. The scenarios, derived from the McMaster Utility Measurement Questionnaire11, described the ways in which one’s health, functional status, and social activities can be affected by RA.
The mild case scenario read as follows: Think what it would be like to live in the following way: You are able to perform all of your daily activities, like work, shopping, and driving. You are completely able to take care of your personal needs, like eating and bathing. You have some difficulty participating in leisure activities, like sports and hobbies. You have occasional pain. You normally do not have any worry or stress, but sometimes you are concerned about the future course of your arthritis. You have some mild stomach upset from some medication you take.
The moderate case scenario read as follows: Think what it would be like to live in the following way: On most days you are able to run errands and work around the house, but fatigue and joint pain prevent you from working. You are completely able to take care of your personal needs, such as eating and bathing. Joint pain is mild on most days, but is never gone and is sometimes quite severe. You rarely have enough energy for leisure activities. At times you are frustrated with dealing with your arthritis. The medication you take sometimes causes diarrhea.
The severe case scenario read as follows: Think what it would be like to live in the following way: You are unable to work, shop, or drive. You have much difficulty getting around outside the house. Sometimes you need help to bathe. You are unable to participate in any leisure activities. You are depressed and frustrated. You have severe pain on most days. The medications you take cause you painful sores in your mouth and difficulty thinking.
Participants were asked to rate each scenario using the Rating Scale under the prompt, “Move the marker to indicate where you would rate your health if this was what your arthritis was like.” The scenario ratings were used to provide patients with context for the range of severity of RA activity and the various health domains that RA can affect. Lastly, they were asked to rate their current health, with the prompt, “Think about how YOUR ARTHRITIS affects you CURRENTLY. Move the marker to indicate where you would rate your current health.” Only the patients’ rating of their current status was used in the analysis.
In addition to the VAS and the Rating Scale, we used these reference measures for RA activity: the 3-variable DAS28 (including the tender joint count, swollen joint count, and ESR but excluding the global rating) and the physician global assessment, rated on a VAS with anchors of 0 = “none” and 100 = “extremely active”2,12. The joint examination and physician global assessment were performed blinded to the patient-reported measures, so that they could be used as reference measures. The other patient-reported measures included the pain scale, measured using a 15-cm VAS with 0 representing “no pain” and 100 representing “severe pain”; severity of joint stiffness, measured using a 15-cm VAS with 0 representing “none” and 100 representing “severe”; and the 20-item Health Assessment Questionnaire (HAQ) Disability Index, on which respondents rated the difficulty in performing tasks in 8 functional areas13,14. Responses to each question ranged from 0 (no difficulty) to 3 (unable to do), and the highest scores in each functional area were averaged to compute the Disability Index (possible range 0–3). Another measure was the 20-item Center for Epidemiological Studies-Depression Scale (CESD), which asked about the frequency of depressive thoughts and feelings in the last week15. For analysis, we excluded 4 questions of the CESD that RA activity can influence16. The possible range was 0 to 48, with higher scores indicating more depressive symptoms. We included questions about age, sex, race, duration of RA, and education level.
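The Disability Index scoring described above (take the highest item score in each of the 8 functional areas, then average those 8 values) can be sketched as follows. This is a minimal illustration, not the scoring code used in the study: the function name, the area labels, and the item groupings in the example are placeholders, and any adjustment for aids or devices is not modeled.

```python
from statistics import mean


def haq_disability_index(area_scores):
    """Compute a Disability Index from item scores grouped by functional area.

    area_scores maps an area label (placeholder names) to a list of item
    scores, each an integer from 0 (no difficulty) to 3 (unable to do).
    """
    if len(area_scores) != 8:
        raise ValueError("expected exactly 8 functional areas")
    area_maxima = []
    for area, items in area_scores.items():
        if not items or any(score not in (0, 1, 2, 3) for score in items):
            raise ValueError(f"invalid item scores for area {area!r}")
        # The highest (worst) item score represents the whole area.
        area_maxima.append(max(items))
    # The index is the average of the 8 area maxima, so it ranges from 0 to 3.
    return mean(area_maxima)


# Hypothetical responses for one patient (20 items in 8 areas).
example = {
    "area_1": [0, 1],
    "area_2": [1, 1],
    "area_3": [0, 0, 2],
    "area_4": [0, 0],
    "area_5": [1, 0, 0],
    "area_6": [3, 1],
    "area_7": [0, 0, 0],
    "area_8": [1, 2, 0],
}
index = haq_disability_index(example)
```

Because each area contributes only its worst item, a single severe limitation within an area raises the index as much as several would.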
Statistical analysis
We tested the construct validity of the VAS and Rating Scale by correlating each with the 2 reference measures (DAS28 and physician global assessment) at each visit using Spearman’s correlations.
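For readers unfamiliar with the statistic, Spearman’s correlation is the Pearson correlation computed on ranks, with tied values given their average rank. The sketch below uses only the Python standard library; the function names and example scores are illustrative and are not taken from the study, which used SAS.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical PGA scores and reference-measure values for five patients.
pga = [20, 35, 50, 65, 90]
reference = [2.9, 3.4, 4.1, 5.0, 6.2]
rho = spearman(pga, reference)
```

Because the statistic depends only on rank order, it is unaffected by the different numeric ranges of the PGA scales and the reference measures.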
We next used multivariate linear regression models to assess the association of VAS and Rating Scale with a broader set of measures that may influence self-report of RA activity. We examined the extent to which factors such as pain, depression, functional disability, and duration of RA influenced self-reporting of RA activity beyond the more direct association of reference measures. We tested separate models using either the VAS or Rating Scale as the dependent variable, and either the DAS28 or physician global assessment as the reference measure of RA activity. Additional covariates in each model were age, sex, education level, pain scale, severity of joint stiffness, modified CESD, and HAQ. This analysis was performed only on data from the baseline visit.
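As a rough illustration of this kind of model, the sketch below fits an ordinary least squares regression by solving the normal equations, using only the Python standard library. It is not the SAS procedure used in the study; the function names and the toy data (two predictors rather than the full covariate set) are assumptions made for the example, and no standard errors or p values are computed.

```python
def ols_fit(X, y):
    """Least squares fit; returns [intercept, b1, b2, ...].

    X is a list of predictor rows (without an intercept column) and y is the
    dependent variable, for example a PGA score.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Build the augmented normal equations [X'X | X'y].
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coefs, row):
    """Predicted value of the dependent variable for one predictor row."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Toy data: y is constructed as 2 + 3*x1 - 1*x2, so the fit should recover it.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
coefs = ols_fit(X, y)
```

In the setting described above, each row of `X` would hold one patient’s reference measure and covariates, and `y` would hold that patient’s VAS or Rating Scale score.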
We compared the heterogeneity among patients’ ratings of the VAS and Rating Scale using residual analysis. A residual is the difference between the observed value of the dependent variable for each patient (in this case, VAS and Rating Scale), and its predicted value based on the multivariate model. We compared the absolute values of the residuals for each patient in models predicting VAS and Rating Scale using the paired t-test. When 2 models are based on the same patients and use the same independent variables, the model with smaller residuals indicates less heterogeneity among patients in how they rate PGA, relative to their values for the independent variables in the model.
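The comparison of absolute residuals can be sketched as a paired t statistic on the per-patient differences. The snippet below assumes the residuals from the two models are already available as equal-length lists in the same patient order; the function name and the numbers are hypothetical, and only the t statistic and degrees of freedom are returned, since a p value would require a t distribution that the standard library does not provide.

```python
import math
from statistics import mean, stdev


def paired_t_abs_residuals(resid_a, resid_b):
    """Paired t statistic comparing |residuals| from two models.

    resid_a and resid_b hold one residual per patient, in the same order.
    Returns (t, degrees_of_freedom).
    """
    if len(resid_a) != len(resid_b) or len(resid_a) < 2:
        raise ValueError("need two residual lists of equal length >= 2")
    # Per-patient difference in the size of the prediction error.
    diffs = [abs(a) - abs(b) for a, b in zip(resid_a, resid_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical residuals for four patients from two models.
resid_model_a = [5, -3, 8, -6]
resid_model_b = [-2, 4, -5, 1]
t_stat, df = paired_t_abs_residuals(resid_model_a, resid_model_b)
```

A t statistic near zero would indicate that the two models miss patients’ ratings by similar amounts on average.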
Using the change from the baseline visit to the followup visit, we also examined sensitivity to change of both measures. We calculated the standardized response mean (SRM) as the mean change/SD of the change. The 95% CI of the SRM was based on 100 bootstrapped samples of 75 patients each. We used SAS version 9.1 (SAS Institute, Cary, NC, USA) for all statistical analyses.
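The SRM calculation and a bootstrap interval can be sketched as below. The resampling settings (100 samples of 75 patients) follow the text, but the way the interval is read off the bootstrap distribution is not stated there; the percentile method used here, the function names, the seed, and the example data are all assumptions.

```python
import random
from statistics import mean, stdev


def srm(changes):
    """Standardized response mean: mean change divided by SD of the change."""
    return mean(changes) / stdev(changes)


def bootstrap_srm_ci(changes, n_boot=100, sample_size=75, seed=0):
    """Approximate 95% CI for the SRM by the percentile method (an assumption).

    Draws n_boot resamples of sample_size patients each, with replacement.
    A resample whose values are all identical would make the SD zero; that is
    not handled here because it is very unlikely with varied data.
    """
    rng = random.Random(seed)
    estimates = sorted(
        srm(rng.choices(changes, k=sample_size)) for _ in range(n_boot)
    )
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[min(n_boot - 1, int(0.975 * n_boot))]
    return lower, upper


# Hypothetical change scores (baseline minus followup), one per patient.
changes = list(range(-20, 60))
point_estimate = srm(changes)
ci_low, ci_high = bootstrap_srm_ci(changes)
```

Here a positive change means the score decreased from baseline to followup, so a larger SRM indicates greater sensitivity to improvement.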
RESULTS
Patient characteristics
Of 175 patients enrolled, we excluded 11 because of missing data on either the Rating Scale or VAS, leaving 164 patients for analysis. The cohort predominantly consisted of well educated white women with a mean age of 54.4 years (Table 1). Based on DAS28 scores, patients had moderate to high RA activity at the baseline visit, and their HAQ scores indicated a moderate level of functional limitation. Ninety-one percent of patients reported RA to be their most important health problem. The VAS and Rating Scale were highly correlated (r = 0.59, p < 0.0001). These measures also showed high agreement by intraclass correlation (r = 0.56; 95% CI 0.48–0.63).
Table 1. Demographic and rating data of the patients (n = 164). Values in parentheses after the measures are the possible ranges. All variables at baseline visit have n = 164. All variables at followup visit have n = 156, except DAS28 (n = 149).
Correlations with reference measures
Both the VAS and the Rating Scale were moderately correlated with the DAS28 (Table 2). The VAS had a stronger association with physician global assessment than did the Rating Scale at the baseline visit. At the followup visit, the 2 PGA measures had similar associations with both reference measures. Correlations with both reference measures were somewhat higher at the followup visit, when RA activity was lower.
Table 2. Correlations of the visual analog scale and Rating Scale with reference measures of RA activity.
Prediction analysis
We used multivariate regression analyses to examine the association of the VAS and Rating Scale with clinical measures in addition to the DAS28 (Table 3). In these models, depressive symptoms, based on the modified CESD, were strongly associated with both the VAS and the Rating Scale. The pain scale was significantly associated with the VAS but only marginally associated with the Rating Scale. Stiffness was associated only with the VAS. In contrast, the HAQ was very strongly associated with the Rating Scale but had no significant association with the VAS. After adjustment for these other measures of RA activity, the DAS28 was no longer associated with either the VAS or the Rating Scale.
Table 3. Association of demographic characteristics and clinical measures with patient global assessment by the visual analog scale or the Rating Scale, by multivariate regression.
Results were similar in additional models that used the physician global assessment instead of the DAS28 as the reference measure of RA activity (Table 3). The only notable difference was the stronger association of pain with the Rating Scale in this model compared to the DAS28 model. There was no evidence for multicollinearity in either model.
We also examined the interindividual variability of patients’ ratings of the VAS and the Rating Scale using the residuals of the multivariate models. In models that used the DAS28 as the reference measure, the mean absolute value of the residuals in the models predicting VAS was 12.5, and the mean absolute value of the residuals in the models predicting the Rating Scale was 14.1. The close similarity of these values indicates a similar degree of heterogeneity among patients in their rating of PGA using each measure. A formal comparison using paired t-test demonstrated no significant difference between residuals of models predicting the VAS and the Rating Scale (p = 0.19). Similar results were found for residuals in models that used the physician global assessment as the reference measure (mean absolute value of the residuals for VAS and Rating Scale were 12.7 and 14.0, respectively; p = 0.28).
Sensitivity to change
At the followup visit, the VAS decreased on average (± SD) by 13.7 ± 24.2, and the modified Rating Scale decreased on average by 9.8 ± 21.8. The SRM for VAS was 0.55 (95% CI 0.53–0.57) and the SRM for the Rating Scale was 0.45 (95% CI 0.43–0.47; p < 0.0001). Hence the VAS was more sensitive to change.
DISCUSSION
We compared the validity of the VAS and a modified Rating Scale as measures of PGA. The Rating Scale included marker states and more clearly defined anchors in an effort to help guide patients’ judgments of the domains to consider when making their ratings, which could potentially improve the validity of the PGA and reduce heterogeneity in scoring among patients. However, we found no difference in validity between the VAS and the Rating Scale. Both measures were similarly associated with the reference measure of DAS28 at baseline and followup, and with the physician global assessment at followup. In addition, the Rating Scale and VAS demonstrated similar heterogeneity when residual analysis was used to evaluate the consistency of ratings among patients. Also, the Rating Scale and VAS were similarly influenced by mood. The marker states likely prompted patients to consider functional limitations in their scoring of PGA using the Rating Scale, while this aspect of RA activity was not a factor in PGA rated by the VAS.
The literature investigating whether the use of marker states improves the construct validity and responsiveness of patient global ratings has shown mixed results. Schunemann and colleagues assessed the validity of the feeling thermometer with and without marker states in several studies. In a study of 86 patients with chronic respiratory disease, the use of marker states was not associated with statistically significant differences in the validity or responsiveness of ratings on a feeling thermometer17. A second, larger study of patients with chronic respiratory disease and a study of patients with gastroesophageal reflux showed that the use of marker states modestly improved the responsiveness of the feeling thermometer, as well as the strength of correlations with a number of disease-specific health measures18,19. Bremner and colleagues found small improvements in the validity of the rating scale when marker states were used in a study of patients with prostate cancer20.
Although PGA is an integral measure in RA clinical research, few studies have examined which factors influence patients’ global rating of their arthritis activity. We previously reported that pain and functional limitations had similar importance in patients’ ratings, but the association of these factors with PGA varied among subgroups based on age, sex, education, and ethnicity21. Fries and Ramey showed that pain and functioning were strong predictors of PGA as assessed by both the VAS and the feeling thermometer7. Our results differ, in that functioning was associated with the Rating Scale (analogous to the feeling thermometer) and not the VAS. This disparity might be due to differences in the demographic characteristics of the patient samples. Patients in Fries and Ramey’s study were older and had a longer duration of RA. Among patients with more advanced RA, functional limitations may have a more prominent influence on how patients rate the PGA than in a younger sample. Fries and Ramey’s study also did not include depression as a potential predictor of PGA. The importance of depressive symptoms was demonstrated by Smedstad and colleagues, who showed that the association of functional limitations with PGA was no longer significant after accounting for the influence of depression22. Similar to our results, Smedstad, et al found that pain and depressive symptoms were the predominant predictors of PGA.
In addition to examining the validity of the VAS and Rating Scale, we explored the consistency of ratings of these measures. The use of more clearly defined anchors and the marker states in the Rating Scale might have served to reduce the heterogeneity of responses among patients. We tested this through residual analysis and found that heterogeneity was not lower in the Rating Scale than the VAS. This result suggests either that the cues (anchors and marker states) were not strong enough or realistic enough to guide patients’ ratings, or that the ratings were influenced by factors not included in the models.
We found that the VAS was more sensitive to change than the Rating Scale. Sensitivity to change depends on the balance between signal and noise. The marker states and more clearly defined anchors of the Rating Scale may have confined the range of responses, decreasing the signal relatively more than the noise, and blunting somewhat the sensitivity to change of the Rating Scale. Without the cues and guides provided in the Rating Scale, patients might have felt more freedom to record larger changes on the VAS. These changes in the signal were not outweighed by greater heterogeneity in responses among patients (i.e., noise), resulting in a higher SRM for the VAS than for the Rating Scale.
The strengths of our study include the large sample of patients with active RA examined at 2 visits, comprehensive assessment of multiple measures, and analysis of their influence on the measures being studied. In addition, we used residual analysis as a novel approach to assess the degree of heterogeneity in patient responses. A single examiner performed the joint counts and the physician global rating for all patients, a factor that provided for more reliable evaluations. The format, direction of scoring, and method of administration of the VAS and Rating Scale were different, helping to ensure that patients did not simply transfer ratings from one scale to the other. However, the presence of multiple differences between the 2 measures did not allow us to identify which component contributed most to differences in the performance of the measures. Another limitation is that we used marker states only with the Rating Scale and not with the VAS. A different study design would have been needed to specifically test the effect of marker states alone. The marker states we used were modified from the McMaster Utility Measurement Questionnaire, which, while not specific to RA, included many relevant domains of health11.
Although the Rating Scale with marker states was designed to be a more descriptive instrument than the VAS, both instruments had comparable validity and consistency among patients in our study. These results provide reassurance that the VAS can accurately identify patients’ assessments of their RA activity. Given that the VAS is easier to administer than the Rating Scale, and is more sensitive to change, our study supports its use as the preferred method for measuring PGA in patients with RA.
Footnotes
- Supported by the Intramural Research Program of the NIAMS/NIH.
- Accepted for publication November 24, 2009.