Abstract
Objective. Patient assessments of disease activity (PtGA) and general health (GH) measured by visual analog scale (VAS) are widely used in rheumatoid arthritis (RA) clinical practice and research. These require comprehension of the question’s wording and translation of disease activity onto a written VAS, which is problematic for patients with limited health literacy (HL) or difficulty completing forms. This study’s objective was to validate verbally administered versions of patient assessments and identify factors that might explain discrepancies between verbal and written measures.
Methods. We enrolled patients with RA at the Denver Health rheumatology clinic (n = 300). Subjects were randomized to complete the traditional written PtGA and GH and one of the verbal assessments. Subjects provided a verbal numeric response after reading the question, having the question read to them in person, or hearing the question over the phone. Spearman and Lin correlations comparing written and verbal assessments were determined. Multivariate logistic regression was performed to explain any discrepancies.
Results. The instruments administered verbally in person showed good, but not excellent, correlation with traditional written VAS forms (Spearman coefficients 0.59 to 0.70; p < 0.001 for all correlations). Twenty-three percent of subjects were unable to complete at least 1 of the written VAS assessments without assistance. HL predicted missing written data and discrepancies between verbal and written assessments (p < 0.05 for all associations).
Conclusion. Providers should use verbal versions of the PtGA and GH with caution when caring for patients unable to complete the traditional written versions. Limited HL is highly prevalent and is a barrier to obtaining patient-oriented data.
Patient global assessments of disease activity (PtGA) in rheumatoid arthritis (RA) and general health (GH) as assessed by self-administered visual analog scales (VAS) are widely used in RA clinical practice and research trials1. These patient-reported outcomes have assumed critical importance in recent years owing to the prevailing movement to guide RA therapy based on disease activity (or “treat-to-target”) as measured by composite indices, all of which include either the GH-VAS or the PtGA-VAS2. The profile of these measures has increased even further with remission emerging as a realistic goal for RA treatment and the recent publication of remission criteria that mandate low PtGA-VAS or GH-VAS levels3.
The PtGA-VAS and GH-VAS have received additional scrutiny because of evidence showing discrepancies with evaluator global assessment of disease activity (EGA) and other disease activity indices4,5,6. Recent research also demonstrates that these patient assessments alone may prevent patients from being classified as in remission7,8. This emphasis on the PtGA-VAS and GH-VAS is likely to continue with the potential adoption of patient-based remission criteria9. It will also intensify if RA medication efficacy trials increasingly align with the treat-to-target model, rather than the current use of the American College of Rheumatology (ACR) 20, 50, and 70 criteria, which allow for an improvement without a mandated low level of the PtGA-VAS or GH-VAS10,11.
Despite this recent emphasis on the PtGA-VAS and GH-VAS, completion of PtGA in the VAS format might be problematic for many patients. This includes patients with poor vision, limited literacy, limited numeracy, patients unable to write because of hand deformity, neurologic disorders or muscular diseases, and patients unable to keep in-person appointments. These conditions and circumstances are common: one-third of patients in the United States have basic or below basic health literacy5,12 and about 3% of elderly adults have functional visual impairment13. There is no published literature regarding how assessments of PtGA and GH are gathered in these patients who are unable to complete the standard written form, but in all likelihood, the information is gathered verbally.
Some existing literature has established the correlation of verbal rating scales (VRS) for pain (rather than PtGA and GH) with VAS measures in patients with inflammatory arthritis14. These verbal measures perform suboptimally in patients with less education or limited literacy15,16. Interestingly, patients with less educational attainment often prefer the Pain-VRS and find it easier to understand than the traditional written Pain-VAS17. One pilot study gathered verbally administered GH in patients with RA18, but there are no published data regarding the validity of verbally obtained PtGA or GH.
To investigate the validity of verbally administered PtGA and GH in English-speaking and Spanish-speaking patients with RA, we performed a cross-sectional observational study at an academic, safety-net rheumatology clinic. We hypothesized that verbally obtained responses would strongly correlate with the traditional VAS versions. In addition, we anticipated that sociodemographic factors such as education may be associated with discrepancies between verbally obtained and traditional written patient VAS assessments. We also hypothesized that sociodemographic factors may be associated with missing and noninterpretable written VAS patient assessments. Finally, to explore the potential use of verbally obtained information by telephone in phase IV clinical studies and between-visit care, we examined various methods of administering a verbally obtained patient assessment (in person, by telephone) to determine whether these methods affected the correlation of verbal and written patient assessments.
MATERIALS AND METHODS
Patients who met the 1987 ACR criteria19 for RA were eligible for recruitment. We excluded subjects younger than 18 years, prisoners, persons with uncontrolled psychiatric illness, and patients with vision worse than 20/100, as measured by a Snellen eye chart. Patients who received an intramuscular or intraarticular corticosteroid injection during their visit were also excluded, because the injections could potentially confound the comparisons of self-assessments performed at the time of the study visit. Subjects received $10 to participate in the study. The subjects for this study were recruited on the day of a regularly scheduled patient visit from the Denver Health rheumatology clinic between November 2011 and January 2013. Denver Health and Hospital Authority is an urban safety-net system that serves 150,000 patients annually, of whom 78% are minorities and 50% are uninsured20. A bilingual research assistant identified eligible subjects by reviewing the daily clinic schedule and medical charts for patients who self-identified either English or Spanish as their primary language. We recruited subjects at the end of their rheumatology clinic visits by first providing patients with a verbal description of the study prior to the use of any written materials. The recruitment process was conducted in a confidential workroom with sensitivity and respect to ensure the dignity of persons with limited literacy21 and to mitigate response bias. Subjects were randomly assigned to complete written assessments at either the beginning or end of the study visit. Subjects completed all written assessments in their primary language without assistance from the research assistant. The complete study packet was 11 pages. The written VAS assessments were placed either first or last in the study packet to minimize the risk of them being skipped. Figure 1 is a flowchart of the study visit.
Figure 1. Participant study visit flow. HL: health literacy; PtGA: patient global assessment from the Disease Activity Score with 28-joint count; VAS: visual analog scale; GH: global health assessment from the Multidimensional Health Assessment Questionnaire; VO: verbally obtained; VO-PtGA1/VO-GH1: patient reads the global assessment questions and provides verbal numeric responses to the research assistant in person; VO-PtGA2/VO-GH2: research assistant reads the global assessment questions to the patient and the patient provides verbal numeric responses to the research assistant in person; VO-PtGA3/VO-GH3: research assistant reads the global assessment questions to the patient and the patient provides verbal numeric responses by telephone in a quiet room; VO-PtGA4/VO-GH4: research assistant reads the global assessment questions to the patient and the patient provides verbal numeric responses by telephone after the visit.
We used the following definitions, measures, and assessments.
Written patient global assessments
Subjects were asked to complete written assessments based on terminology for the PtGA from the Disease Activity Score 28-joint count (DAS28)22 and GH from the Multidimensional Health Assessment Questionnaire (MDHAQ)23.
For the PtGA-VAS, subjects provided a written global assessment of their disease severity by recording their responses as marks on a horizontal visual analog scale, 100 mm in length, in response to the question, “Please mark an ‘X’ on the line below to show how active has your rheumatoid arthritis been during the past seven days.” The left anchor was “no disease activity” and the right anchor was “high disease activity”22.
For the GH-VAS, subjects provided a written global health assessment on a horizontal VAS 100 mm in length based on the standard terminology used in MDHAQ: “Considering all the ways in which illness and health conditions may affect you at this time, please make an ‘X’ on the line below to show how you are doing.” The left anchor was “very well” and the right anchor was “very poorly”23.
Verbal patient global assessments
Subjects were randomly assigned to provide a verbal numeric response between 0 and 100 for both versions of the patient assessments at either the beginning or end of the study visit. Subjects were randomized to provide 1 of 4 different forms of the verbally obtained (VO) patient assessments.
VO-PtGA1 and VO-GH1 were obtained by the patient reading the global assessments in their primary language and providing a verbal numeric response to the research assistant in person.
VO-PtGA2 and VO-GH2 were obtained by the research assistant reading the global assessments to the patient in their primary language and the patient providing a verbal numeric response to the research assistant in person.
VO-PtGA3 and VO-GH3 were obtained by the research assistant reading the global assessments to the patient in their primary language and the patient providing a verbal numeric response by telephone in a research visit room at the start of the study visit.
VO-PtGA4 and VO-GH4 were obtained by the research assistant reading the global assessments to the patient in their primary language and the patient providing a verbal numeric response by telephone within 3 h after their clinic visit on their mobile or home telephones.
Twenty-eight-joint tender joint count (28TJC) and swollen joint count (28SJC), evaluator global assessment of disease activity (EGA)
The provider who performed a history and physical examination during the clinical visit recorded the results of a 28TJC and 28SJC and EGA using a horizontal 100 mm VAS. A score of 100 signified “very poorly” and a score of 0 signified “very well”23. The provider was blinded to all patient global assessments.
Demographic information
We obtained the following from the subjects by written self-report: primary language, age, sex, ethnicity, race, marital status, highest level of education completed, tobacco use history, annual household income, and employment status.
Tests of functional health literacy (HL)
We used the modified Short Test of Functional Health Literacy in Adults (S-TOFHLA), a validated English and Spanish language instrument to assess functional HL24. The large-print version of the modified S-TOFHLA is a 12-min, 14-point font cloze procedure that contains 2 reading comprehension sections about medical subjects and 4 questions assessing numeracy24. A cloze procedure is an evaluation in which words are deleted from written passages regarding medical subject matter, and respondents attempt to select the correct missing words. Inadequate HL was defined as a total score ≤ 53. Marginal HL was defined as a total score of 54 to 66.
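The cutoffs above can be expressed as a small lookup. The following Python sketch is illustrative only and was not part of the study analysis. The function name is hypothetical, and two details are assumptions that the text does not state: that total scores fall in the range 0 to 100, and that scores above 66 are labeled "adequate".

```python
def classify_health_literacy(total_score: int) -> str:
    """Map a modified S-TOFHLA total score to a functional HL category.

    Cutoffs for "inadequate" and "marginal" come from the text.
    The 0-100 range check and the "adequate" label are assumptions.
    """
    if not 0 <= total_score <= 100:
        raise ValueError("total score expected in the range 0 to 100")
    if total_score <= 53:
        return "inadequate"
    if total_score <= 66:
        return "marginal"
    return "adequate"  # assumed label for scores above 66


if __name__ == "__main__":
    for score in (40, 53, 54, 66, 67):
        print(score, classify_health_literacy(score))
```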
Laboratory studies
We performed chart review to determine results of prior anti-cyclic citrullinated peptide antibody (anti-CCP) tests. Erythrocyte sedimentation rate (ESR, mm/h) and C-reactive protein (mg/dl) levels were obtained as part of clinical care following the completion of the study paperwork. All laboratory analyses were performed at the Denver Health Medical Center clinical laboratory.
Prior patient experience
We performed chart review to determine whether subjects had completed the PtGA-VAS previously, and their history of biologic medication use and current prednisone use/dose.
MDHAQ-pain
The subjects circled a number between 0 and 10 along a horizontal scale in response to the question “How much pain have you had because of your condition over the past week?”23.
MDHAQ-fatigue
The subjects provided an assessment of fatigue by circling a number between 0 and 10 using a horizontal scale in response to the question “How much of a problem has unusual fatigue or tiredness been for you over the past week?”23.
Depression
Subjects completed the 2-question patient health questionnaire (PHQ-2) depression screen25.
Outcome measures
The correlation coefficients of the VO-PtGA and VO-GH with the PtGA-VAS and GH-VAS were the primary outcome measures. Written assessments were classified as noninterpretable if the subject attempted to complete the PtGA or GH but provided a response that could not be scored. Examples of such responses include writing along the VAS line or circling an anchor. Written assessments were classified as missing if the PtGA or GH was left blank by the subject on the form after (1) the research assistant explained the need to complete the forms, (2) the subject was given as much time as required to complete the form, and (3) the subject returned the form to the research assistant. Verbal responses were considered missing if the patient failed to return telephone calls.
Statistical methods
Prior to the study, we determined that a sample size of 100 (representing each examined VO and written VAS pair) would achieve 87% power to detect a difference in correlation of 0.3. To determine whether we could aggregate verbal self-assessment scores obtained by telephone in a quiet room and scores obtained by telephone after the visit, we used a t-test to compare mean scores. To determine whether the order of written assessments was associated with missing data, we performed the chi-square test and compared the presence of missing data between subjects who completed the written assessments at the end of the protocol and those who did so at the onset of the study.
We imputed missing data using a linear regression that included all clinical and demographic characteristics from Table 1. Agreement of the VO-PtGA and VO-GH with the comparison measures was assessed using Spearman coefficients, Bland-Altman 95% limits of agreement, Bland-Altman plots, and Lin’s concordance correlation coefficients. Comparisons were made to written VAS assessments and the EGA. We modeled all global assessments as continuous variables. HL, estimated by the S-TOFHLA, was also modeled as a continuous score, as has been done previously26. We defined a disagreement/discrepancy between a verbal and a written assessment as a difference in scores of at least 2 SD of the written version or EGA. We performed an initial univariate logistic regression to evaluate which variables predicted disagreement. We then performed a multivariate logistic regression including all terms with a p-value of 0.5. In the final model, p-values < 0.05 were considered significant. Regression analyses to determine predictors of missing or noninterpretable patient assessments followed identical procedures. All analyses were performed using Stata software version 12 (StataCorp).
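For readers unfamiliar with the agreement statistics named above, the following Python sketch shows how they are conventionally computed. It is not the Stata code used in the study. The function names are hypothetical, the concordance coefficient uses population (1/n) moments, the limits of agreement use the sample SD of the paired differences, and the discrepancy rule assumes that "2 SD" refers to the cohort standard deviation of the written scores.

```python
from math import sqrt
from statistics import mean, stdev


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient using 1/n moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and a shift in means.
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def bland_altman_limits(x, y):
    """Return (lower, upper) 95% limits of agreement for paired scores."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return bias - 1.96 * sd, bias + 1.96 * sd


def is_discrepant(verbal, written, written_sd):
    """Flag a pair whose absolute difference is at least 2 SD.

    Assumption: written_sd is the SD of the written scores in the cohort.
    """
    return abs(verbal - written) >= 2 * written_sd


if __name__ == "__main__":
    # Invented scores on a 0-100 scale, for illustration only.
    verbal = [10, 20, 30, 55, 80]
    written = [12, 18, 33, 50, 70]
    print(round(lin_ccc(verbal, written), 3))
    print(bland_altman_limits(verbal, written))
    print(is_discrepant(80, 30, stdev(written)))
```

Unlike a rank correlation, the concordance coefficient falls below 1 when one measure is systematically higher than the other, which is why both kinds of statistic are informative when comparing two administration methods.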
Table 1. Demographics of study cohort.
The Colorado Multiple Institutional Review Board approved our study.
RESULTS
Three hundred thirty-seven patients were screened for the study, which represents about 90% of the eligible subjects in our clinic. Fourteen persons declined to participate in the study for unspecified reasons. Fifteen patients were excluded because of complete functional illiteracy. Six patients were excluded owing to uncontrolled psychiatric disease. Two patients were excluded because of severe visual, hearing, or speech impairments, producing a study cohort of 300 subjects.
The subjects were predominantly female, unmarried, English-speaking, and currently unemployed. Sixty-three percent had completed the PtGA-VAS and none had completed the GH-VAS previously as part of their clinical care at Denver Health. This cohort represents a diverse population with about half being nonwhite and 58% Hispanic. Only a third had attended college. Eighty-two percent were anti-CCP-positive and the mean DAS28 was 4.0. Twenty-eight percent of patients had either inadequate or marginal functional HL. Table 1 summarizes the demographic characteristics of the study population.
Results of the criterion validity testing for the verbally obtained global assessments appear in Table 2. Any scoreable response was included in the criterion validity analyses. Sixty-nine subjects (23% of the cohort) had at least 1 missing or noninterpretable VAS response that was imputed. Such responses were received on 25 of both the GH-VAS and PtGA-VAS. Eleven of 50 patients randomized to perform the VO-PtGA4 and VO-GH4 at the conclusion of the study were unreachable by telephone. Patients were 22% less likely to respond to calls after the visit, and there was a statistical trend toward a difference in scores (p = 0.07) between VO-PtGA3 (self-assessment by telephone in a quiet room) and VO-PtGA4 (response by telephone after visit). Further, a visual inspection of score distributions for VO-PtGA3 and VO-PtGA4 suggested that the nonresponders likely constituted those subjects who would have provided high self-assessment scores. This finding was corroborated by imputation of the missing scores, which predicted high self-assessment scores for subjects who failed to respond to calls after the visit, whether the patient assessment adopted the DAS or MDHAQ formatted questions. For these reasons, we did not combine VO-PtGA3 and VO-PtGA4 scores in the analysis.
Table 2. Correlations of various verbally administered patient assessments, standard written assessments, and evaluator global assessments.
Results of additional assessments using Lin’s concordance correlation, Bland-Altman 95% Limits of Agreement, and Bland-Altman plots (supplementary material available from the author on request) were consistent with the findings of Table 2. The correlation of the in-person verbally administered patient assessments with the traditional written/VAS patient assessments ranged from Spearman coefficients of 0.59 to 0.70, which is generally considered moderate to good correlation. The results were nonsignificant (p > 0.05) for verbal assessments obtained by phone at the end of the study visit.
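The Spearman coefficients reported here are rank correlations: each set of scores is converted to ranks, tied scores share their average rank, and a Pearson correlation is computed on the ranks. The Python sketch below illustrates that procedure with hypothetical function names and invented scores. It is not the study's analysis code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Invented paired scores on a 0-100 scale.
    verbal = [10, 40, 40, 65, 90]
    written = [15, 30, 55, 50, 85]
    print(round(spearman(verbal, written), 3))
```

Because only the ordering of scores matters, a high Spearman coefficient does not rule out a systematic offset between verbal and written responses.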
About one-third of patients provided VO-PtGA responses that were discrepant by more than 2 SD with their PtGA-VAS. Similar findings were noted for the VO-GH responses (Figure 2). These discrepancies affected the classification of subjects’ disease activity: 17% of subjects’ disease activity classification changed if a verbally obtained PtGA replaced the PtGA-VAS in the DAS28-ESR calculation (Supplemental Table 2 available from the author on request).
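To illustrate how a discrepant patient assessment can shift the disease activity category, the Python sketch below applies the standard published DAS28-ESR formula and its conventional cutoffs, neither of which is restated in this article. The function names and the patient values are invented for illustration and do not come from the study data.

```python
from math import log, sqrt


def das28_esr(tjc28, sjc28, esr_mm_h, ptga_0_100):
    """Standard DAS28-ESR formula; ptga_0_100 is the 0-100 patient assessment."""
    return (0.56 * sqrt(tjc28)
            + 0.28 * sqrt(sjc28)
            + 0.70 * log(esr_mm_h)   # natural logarithm of ESR in mm/h
            + 0.014 * ptga_0_100)


def das28_category(score):
    """Conventional DAS28 disease activity categories."""
    if score < 2.6:
        return "remission"
    if score <= 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical patient: 1 tender joint, 0 swollen joints, ESR 10 mm/h.
    written = das28_esr(1, 0, 10, 20)  # written PtGA-VAS of 20
    verbal = das28_esr(1, 0, 10, 60)   # verbally obtained PtGA of 60
    print(round(written, 2), das28_category(written))
    print(round(verbal, 2), das28_category(verbal))
```

In this invented case, a 40-point difference in the patient assessment adds 0.56 to the composite score, which is enough to move the patient out of the remission category.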
Figure 2. Distributions of the absolute differences in traditional written patient assessments and verbally administered patient assessments for the study cohort. GH: global health assessment; MDHAQ: Multidimensional Health Assessment Questionnaire; PtGA: patient global assessment; DAS28: Disease Activity Score with 28-joint count.
With the exception of the VO-PtGA4 and VO-GH4, correlations of the verbally obtained global assessments with the EGA demonstrated coefficients that ranged from 0.38 to 0.54, which is considered fair to moderate correlation. Similar results were noted, however, between the written patient global assessments and EGA. The correlations all demonstrated statistically significant relationships (p < 0.001 for all correlations).
We performed univariate and multivariate logistic regression analysis to determine which clinical and demographic features predicted a greater than 2 SD difference between the verbally administered patient assessments and traditional VAS assessments (Table 3; univariate results available from the author on request). In multivariate analysis, discrepancy between the VO-PtGA and PtGA-VAS was only consistently predicted by the S-TOFHLA comprehension score (greater comprehension decreased the odds of discrepancy). The S-TOFHLA numeracy score was the sole statistically significant predictor of discrepancy between the VO-GH and GH-VAS.
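As a reminder of how a logistic regression coefficient translates into odds, the Python sketch below fits a single-predictor model by Newton's method and exponentiates the slope to obtain an odds ratio. It is a toy illustration only: the study used multivariate models in Stata with continuous S-TOFHLA scores, whereas this example uses one invented binary predictor (a hypothetical "limited HL" flag), invented outcomes, and a hypothetical function name.

```python
from math import exp


def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton's method; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    # Invented data: x = 1 marks a hypothetical limited-HL flag,
    # y = 1 marks a discrepant verbal/written pair.
    limited_hl = [0, 0, 0, 0, 1, 1, 1, 1]
    discrepant = [0, 0, 0, 1, 0, 1, 1, 1]
    b0, b1 = fit_logistic(limited_hl, discrepant)
    print(round(exp(b1), 2))  # odds ratio for the flag
```

With a continuous predictor such as a comprehension score, the exponentiated slope is instead the multiplicative change in odds per 1-point increase, so a value below 1 corresponds to the finding that greater comprehension decreased the odds of discrepancy.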
Table 3. Final multivariate logistic regression models predicting odds of a discrepancy between verbally administered and written patient assessment and odds of a missing patient assessment score.
We also performed univariate and multivariate logistic regression analysis to determine the odds of missing written PtGA or GH values and noninterpretable VAS responses (univariate results are available from the author on request). In multivariate regression, missing and noninterpretable PtGA data were predicted only by the S-TOFHLA comprehension score and younger age (Table 3). Numeracy as defined by the S-TOFHLA numeracy score was the only variable that predicted missing and noninterpretable GH data (greater numeracy was associated with fewer missing values).
DISCUSSION
We were able to demonstrate a moderate correlation between in-person verbally obtained PtGA and GH and traditionally obtained written VAS versions of these measures. This correlation was not as strong as we had hypothesized. We were also able to confirm that sociodemographic factors are associated with the presence of a discrepancy between verbally and traditionally obtained patient assessments. Specifically, we demonstrated that limited HL predicts the presence of a clinically relevant discrepancy. An additional important finding of our study was that a large number of subjects with limited HL did not complete the traditional written VAS versions of the PtGA and GH without assistance. The relationship between limited HL and missing VAS data suggests that some patients are facing significant challenges completing this instrument. Given the association of limited HL and missing data, our estimate of the association of HL with discrepant assessments likely underestimates the true effect size (compared with complete data).
The correlation coefficients were similar between written VAS forms and verbally obtained assessments, regardless of whether the patients read the question themselves or the research assistant read the question to the patient in person. This finding supports the future use of in-person verbally administered patient assessments for population-based studies and randomized controlled trials when subjects are unable to complete the written forms. The modest correlation of verbally administered patient assessments with traditional written assessment, however, suggests that clinical decision making for the individual patient should not be made based on verbal patient assessments alone. In instances in which providers are unable to use the traditional written instruments, additional weight should be given to other indicators (laboratory tests, etc.).
Our results expand the science of patient-reported outcomes in 3 key areas. First, our study is the first, to our knowledge, to investigate the validity of verbally administered versions of the PtGA and GH. Second, we have enriched the understanding of the literacy burdens of the instruments used to collect patient-reported outcomes. And third, we established the relationship between limited HL and incomplete or inaccurately completed patient assessments in subjects with RA. We are unaware of previous research documenting how frequently patients need assistance with the GH-VAS or PtGA-VAS. Our finding that more than 1 in 5 patients in our clinic provided blank or “unscoreable” VAS responses raises concern that critical patient-reported outcomes are functioning as a hidden barrier to assessing disease activity in research trials and clinical care. The link between limited HL and incomplete or inaccurately completed PtGA-VAS or GH-VAS responses has not been previously reported by other groups. We are performing ongoing research to determine the role of patient and system factors that are hindering proper completion of the GH-VAS and PtGA-VAS.
Previous research has investigated techniques to facilitate the capture and measurement of the PtGA and GH. Pincus, et al studied modifications to the GH-VAS and other VAS metrics including the use of 21 circles at 5 mm intervals26. Studies have also attempted to capture the GH or PtGA electronically with Web-based portals, computer touch screens, and personal digital assistant-based formats; these investigations, in general, demonstrate modest associations with traditional VAS techniques27,28,29. Verbal rating scales of pain have been studied in patients with inflammatory arthritis15, but our study is the first, to our knowledge, to systematically assess the criterion validity of verbally obtained PtGA and GH. This is important because these measures are obtained frequently in research trials and increasingly in clinical practice, given the numerous conditions and circumstances in which the traditional written assessments are not available.
This study has several limitations. Its cross-sectional design does not permit us to determine whether limited HL was causally associated with the missing data or discrepancy between verbal and written assessments. It is possible that limited HL is a proxy for a factor inadequately controlled for in our regression analysis, such as education and income level, which we only captured as categorical variables. We did, however, examine a large number of variables that may influence HL, including race, ethnicity, age, employment, income, and language. The relatively low number of Spanish-speaking patients may have limited our power to detect the relationship of primary language with our various outcomes. We also did not perform 2 written GH-VAS and PtGA-VAS on each subject, so we are unable to ascertain the baseline test-retest variability of these instruments. Because we did not want to influence patient responses through subtle prompts, we deliberately gave no guidance to patients with regard to completion of the instruments, apart from the actual assessment question itself. This contributed to missing GH-VAS and PtGA-VAS data in about 20% of our patients. Imputation of missing data, however, had no effect on our findings.
It is possible that minor assistance may have improved the performance of the verbal assessments and reduced the rates of missing data; however, this approach would also have the potential to bias our results. Nearly two-thirds of our subjects had completed PtGA-VAS previously, but none had completed a GH-VAS at Denver Health. This was not a likely source of bias, because similar discrepancies were seen with the written and verbal forms of both instruments, and prior completion of a PtGA-VAS did not predict discrepant scores. Subjects who completed the written VAS first were less likely to have missing values for their self-assessments; however, the relationships between HL scores on the S-TOFHLA and odds of a missing PtGA or GH (Table 3) remained after stratifying the analysis by the order in which the written self-assessment was administered.
With regard to strengths, our study included a broad range of patients in terms of sex, age, race, ethnicity, and literacy skills, typical of an urban public health hospital in the United States. The results likely have external validity when applied to other diverse populations, but may not apply to more homogeneous subpopulations.
Our findings are relevant to both rheumatology practice and research. Providers and researchers should be cognizant of limited HL and appreciate that some of our patient-reported outcomes, although laudable in their attempt to honor the patient’s perspective, may not truly be reflective of the patient’s disease state. Additionally, accurate completion of the PtGA-VAS and GH-VAS is challenging for patients with limited HL and introduces additional complexity to the delivery of care to these patients30. Our group is engaged in efforts to understand these instruments better from the patient perspective, with a focus on which domains (anchors, wording of question, scale, etc.) engender the most confusion. National quality measures for the care of patients with RA need to reflect the challenges that patients with impaired HL face in attempting to complete patient-reported measures. There has been increased awareness regarding the myriad differently worded versions of the PtGA31, but greater focus and standardization need to be applied to the instructions and explanations given to patients when they require assistance completing these instruments. Additional research is needed on how best to capture the PtGA and GH in patients with low HL or an inability to complete written testing for other reasons.
Footnotes
- Supported by a UCB pharmaceuticals Investigator Initiated Studies grant to Denver Health. Dr. Caplan is supported by VA Health Services Research and Development Career Development Award 07-221. Dr. Davis is supported by the Rheumatology Research Foundation Scientist Development Award. The views expressed do not represent the views of University of Colorado SOM, the Department of Veterans Affairs, or the Denver Health and Hospital Authority.
- Accepted for publication October 25, 2013.