Abstract
Objective. Patient self-report outcomes and physician-performed joint counts are important measures of disease activity and treatment response. This metaanalysis examines the degree of concordance in joint counts between trained assessors and patients with rheumatoid arthritis (RA).
Methods. Studies eligible for inclusion met the following criteria: English language; compared patient with trained assessor joint counts; peer-reviewed; and RA diagnosis determined by board-certified or board-eligible specialist or met 1987 American College of Rheumatology criteria. We searched PubMed and Embase to identify articles between 1966 and January 1, 2008. We compared measures of correlation between patients and assessors for either tender/painful or swollen joint counts. We used metaanalysis methods to calculate summary correlation estimates.
Results. We retrieved 462 articles and 18 were included. Self-report joint counts were obtained by a text and/or mannequin (picture) format. The summary estimates for the Pearson correlation coefficients for tender joint counts were 0.61 (0.47 lower, 0.75 upper) and for swollen joint counts 0.44 (0.15, 0.73). Summary results for the Spearman correlation coefficients were 0.60 (0.30, 0.90) for tender joint counts and 0.54 (0.35, 0.73) for swollen joint counts.
Conclusion. Self-report tender joint counts have moderate to marked correlation with those performed by a trained assessor. In contrast, swollen joint counts demonstrate lower levels of correlation. Future research should explore whether integrating self-report tender joint counts into routine care can improve efficiency and quality of care, while directly involving patients in assessment of RA disease activity.
Self-reported outcomes are important measures of disease activity and treatment response in clinical trials and practice in rheumatoid arthritis (RA). Systematic assessment of swelling and tenderness in joints, or joint counts, by physicians has been cited as the most specific measure of disease activity in RA and has been shown to be predictive of mortality1. Joint counts are included in standard composite measures of disease activity routinely used in clinical trials [Disease Activity Score-28 (DAS28) and the American College of Rheumatology (ACR) Core Data Set]. Formal joint counts, however, are inconsistently performed in clinical practice2.
Since the early 1990s, multiple studies have assessed the reliability, validity, and sensitivity to change of self-report joint counts. There are many potential advantages of a self-report joint count. Involving patients in disease activity assessment may enhance self-management behavior, and ultimately improve health outcomes. Self-management programs in arthritis have been shown to improve health status, reduce pain and fatigue, and increase self-efficacy3,4. A self-report joint count in RA, if shown to be reliable, could serve as an important marker of disease activity over the course of RA, since, in the collaborative model of chronic care, it is ultimately the patient, rather than any one clinician, who is responsible for continuing self-care and decision-making. Active engagement with one’s chronic disease has been shown to be associated with health improvements5–7. Last, as the opportunities for Web-based self-management support and clinical decision-making across chronic diseases increase, there is an urgent need to explore reliable means of eliciting symptoms from patients with RA so that advances in disease management that are being applied in diseases such as congestive heart failure, diabetes, and asthma can also be adapted to RA.
Self-report in RA
Self-reported outcomes in RA have become central to the measurement of response to treatment both in clinical trials and in routine practice. Over 10 countries have published guidelines or consensus statements on the use of biologic therapies in RA that include a composite score, such as the DAS28, to measure disease activity8. New ACR recommendations for use of biologics in the US also promote the use of disease activity scores, many of which include a formal joint count9. Choy, et al have developed and validated a patient-based disease activity score (PDAS) for RA that incorporates a self-report tender and swollen joint count10. However, this tool has yet to be widely adopted or tested in clinical practice.
Can patients reliably report on joint tenderness and inflammation? While some studies suggest that self-report tender joint counts are reliable, others do not, and no systematic review of the research has been performed. Because of the discrepancy between physician and patient joint counts observed in some studies, and the potential positive influence of direct patient involvement through the use of self-report of disease activity, we undertook a systematic review of the literature and performed a metaanalysis to examine the degree of concordance between patients and trained assessors around this important clinical measure of disease activity in RA.
MATERIALS AND METHODS
Search strategy
Three search strategies were used to identify appropriate articles between 1966 and January 1, 2008. We searched PubMed, Embase, and the Cochrane databases by combining the following 3 searches with AND to identify all definitions of joint counts and self-report, as well as terms for RA: (1) rheumatoid arthritis; (2) joint OR joints OR articular OR disease activity; and (3) self-report* OR self-assess* OR self-monitor* OR self-administ* OR self-evalua* OR self-perce* OR self-examin* OR self-rate OR self-rating.
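The way the 3 term groups are combined can be illustrated with a short sketch. It joins the terms within each group with OR and the groups with AND, as described above. The function names and the quoting of multiword phrases are illustrative assumptions; the exact syntax submitted to each database is not stated here.

```python
# Sketch of the Boolean search described above: OR within a group, AND between groups.
# Quoting of multiword phrases is an assumption, not a documented database syntax.

DISEASE_TERMS = ["rheumatoid arthritis"]
JOINT_TERMS = ["joint", "joints", "articular", "disease activity"]
SELF_REPORT_TERMS = [
    "self-report*", "self-assess*", "self-monitor*", "self-administ*",
    "self-evalua*", "self-perce*", "self-examin*", "self-rate", "self-rating",
]


def or_group(terms):
    """Join one group of terms with OR, quoting any multiword phrase."""
    parts = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(parts) + ")"


def build_query(groups):
    """Combine the OR-groups with AND so every group must match."""
    return " AND ".join(or_group(g) for g in groups)


if __name__ == "__main__":
    print(build_query([DISEASE_TERMS, JOINT_TERMS, SELF_REPORT_TERMS]))
```

Keeping each concept in its own list makes it straightforward to add a synonym to one group without changing how the groups are combined.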
Bibliographies of all included studies were searched as well. A single investigator (JLB) performed the PubMed, Embase, and Cochrane Library searches and reviewed the titles and abstracts identified by the search.
Literature search limits and article selection criteria. Inclusion and exclusion criteria
Studies eligible for inclusion met the following criteria: (1) objective was to compare patient with trained assessor counts of joint swelling and/or tenderness and pain; (2) appeared in peer-reviewed publication; (3) included subjects with RA as determined by a board-certified or board-eligible specialist or as defined by 1987 ACR criteria11; and (4) were English language. Studies were excluded if they were letters to the editor, case reports, or review articles. Studies were also excluded based on abstract review if they did not meet inclusion criteria.
Abstraction of articles
Data abstraction was performed by 2 authors (JLB, RK). In situations where there was disagreement, an adjudicator (LAC) intervened. The following information was abstracted systematically from each article: author, year, country, setting (type of outpatient clinic or inpatient), study design, duration of study, ACR criteria for RA met, number of subjects, covariates included in study (age, sex, disease duration, inflammatory markers, medications, ethnicity, etc.), assessor type (physician, nurse, research assistant), format of joint count (mannequin or text), main outcome (correlation coefficient), association between correlation coefficient and other covariates, and predictors of concordance.
Quality assessment
Quality of included articles was assessed by ensuring that studies met the inclusion criteria listed above. These qualitative measures were chosen instead of a quality score because such scales have been shown to be incomplete and unreliable due to heterogeneity between studies12.
Statistical methods
The primary outcome measure was a summary correlation coefficient for both tender/painful joint counts and swollen joint counts. Given that studies used varying measures of correlation, we chose to calculate a summary Pearson correlation coefficient for tender/painful joint counts as well as one for swollen joint counts for all studies that reported a Pearson correlation coefficient. For those studies that reported Spearman correlation coefficients, we separately calculated the summary estimate for both types of joint counts as well. We were unable to include 3 studies in the metaanalyses that reported an intraclass correlation coefficient (ICC) as the measure of correlation13–15. This precluded calculation of a summary ICC since, to our knowledge, there is no established method for performing metaanalysis using the ICC. Metaanalysis of correlation coefficients was conducted via methods described by Hunter and Schmidt16. Summary estimates were computed by weighting each study's correlation by its sample size. We did not correct for any “design artifacts,” which are defined by Hunter and Schmidt as factors that may affect the size of the correlation coefficient, such as sampling or measurement error16. We produced, for each analysis, a summary correlation and confidence interval for the estimate. All analyses were conducted using R17. Interpretation of correlation coefficients was based on published thresholds18 as follows: r of 0 to 0.2, no correlation; 0.2 to 0.4, low correlation; 0.4 to 0.6, moderate correlation; 0.6 to 0.8, marked correlation; and 0.8 to 1.0, high correlation.
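A sample-size-weighted summary correlation of this kind can be sketched as follows. The analyses reported here were run in R; this Python version is only an illustration. The confidence interval uses the standard error sqrt(var_r / k), a form often associated with the Hunter and Schmidt approach when studies are heterogeneous; whether this exact form was used in the present analysis is an assumption. The input values in the example are hypothetical and are not taken from the included studies.

```python
import math


def summary_correlation(rs, ns, z=1.96):
    """Sample-size-weighted mean correlation with an approximate 95% CI.

    rs: per-study correlation coefficients
    ns: per-study sample sizes (used as weights)
    No correction for design artifacts is applied.
    """
    total_n = sum(ns)
    # Weighted mean correlation: each study counts in proportion to its size.
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Weighted observed variance of the correlations around the mean.
    var_r = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Assumed standard error of the mean correlation across k studies.
    se = math.sqrt(var_r / len(rs))
    return r_bar, r_bar - z * se, r_bar + z * se


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    r_bar, lower, upper = summary_correlation([0.5, 0.7], [100, 100])
    print(f"{r_bar:.2f} ({lower:.2f} lower, {upper:.2f} upper)")
```

Because the weights are the sample sizes, a large study moves the summary estimate more than a small one reporting the same correlation.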
Because reliability of self-report joint counts could be influenced by mode of elicitation of joint symptoms, we performed a subgroup analysis by separately exploring correlations when the self-report was obtained using a mannequin format (Figure 1) from those correlations when self-report was obtained using a text format. We assessed for the presence of publication bias using the Begg’s test, which assesses the significance of the correlation between the ranks of effect estimates and the ranks of their variances; p value < 0.1 was considered significant19,20.
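The rank correlation idea behind Begg's test can be sketched in a few lines. The version below standardizes each effect estimate against the inverse-variance pooled estimate, then computes Kendall's tau between the standardized effects and their variances, with a normal approximation for the p value. It omits the continuity and tie corrections, and the function name and inputs are illustrative assumptions rather than the routine used in the analysis.

```python
import math


def begg_rank_test(effects, variances):
    """Kendall rank correlation between standardized effects and variances.

    Returns (tau, two_sided_p). Assumes no ties and at least 2 studies;
    uses a normal approximation without continuity correction.
    """
    k = len(effects)
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Variance of each deviation from the pooled estimate.
    v_star = [v - 1.0 / sum(weights) for v in variances]
    std_effects = [(e - pooled) / math.sqrt(vs) for e, vs in zip(effects, v_star)]

    # S = concordant pairs minus discordant pairs.
    s = 0
    for i in range(k):
        for j in range(i + 1, k):
            prod = (std_effects[i] - std_effects[j]) * (variances[i] - variances[j])
            s += (prod > 0) - (prod < 0)

    tau = s / (k * (k - 1) / 2)
    z_stat = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return tau, p_value


if __name__ == "__main__":
    # Hypothetical effects and variances, chosen so that less precise
    # studies report larger effects (the pattern the test looks for).
    tau, p = begg_rank_test([0.1, 0.3, 0.5, 0.7, 0.9], [0.01, 0.02, 0.03, 0.04, 0.05])
    print(f"tau = {tau:.2f}, p = {p:.3f}, flagged = {p < 0.1}")
```

A p value below the 0.1 threshold stated above would suggest that smaller, less precise studies report systematically different effects.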
RESULTS
The PubMed search identified 364 articles and the Embase search identified an additional 98 for a total of 462 articles. The Cochrane Library search yielded no additional articles. Four hundred forty articles were excluded based on a review of the title and abstract, leaving 22 articles for full review of the text. After full review, 18 articles remained in the final analysis (Table 1). The remaining 4 studies were excluded for the following reasons: (1) no direct comparison of patient with assessor joint count (n = 2); (2) duplicate patient population (n = 1); (3) lack of assessor joint count (n = 1).
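The study selection counts can be checked with simple arithmetic. The sketch below uses only the numbers reported in this paragraph; the variable names are illustrative.

```python
# Study flow tally using the counts reported in the text.
pubmed = 364
embase_additional = 98
cochrane_additional = 0

identified = pubmed + embase_additional + cochrane_additional

excluded_on_title_abstract = 440
full_text_reviewed = identified - excluded_on_title_abstract

excluded_on_full_text = {
    "no direct comparison of patient with assessor joint count": 2,
    "duplicate patient population": 1,
    "lack of assessor joint count": 1,
}
included = full_text_reviewed - sum(excluded_on_full_text.values())

if __name__ == "__main__":
    print(identified, full_text_reviewed, included)
```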
Methodologic qualities of the studies
Methods across the 18 studies were heterogeneous. We abstracted data on blinding of assessors, type of joint counts performed (tender or swollen), format of self-report (mannequin or text), and measure of correlation between assessor and patient (Pearson, Spearman, or ICC). Of the 18 studies, 9 reported blinding of assessors, 15 reported tender joint counts, and 10 reported swollen joint counts. Three studies reported a combined tender/swollen joint count21–23. Ten elicited self-report using text format and 8 using the mannequin. Three studies directly compared the 2 self-report methods13,24,25.
A minority of the studies described some features of validity of the self-report joint counts that included assessments of reliability, sensitivity to change, and correlation with acute-phase reactants. Seven studies reported on intrarater reliability (test and retest self-report)14,22,23,25–28 and all demonstrated moderate to high correlation. In terms of sensitivity to change, only 5 reports provided results, with the majority of these noting sensitivity to change for the self-report tender joint count, but not for the swollen joint count over time14,15,22,26,29. The range of time elapsed between joint assessments was 0.5 to 10 months, with 3–6 months being the average. Ten studies evaluated the correlation between self-report and assessor joint counts with an acute-phase reactant14,21–23,27–32. In the majority of cases, these studies concluded that self-report joint counts do not correlate with acute-phase reactants and that a self-report swollen joint count is not an adequate substitute for a physician examination to detect swollen or inflamed joints.
Summary correlation coefficients
The correlation coefficients between patient and assessor tender and swollen joint counts of the included articles are shown in Tables 2 (tender) and 3 (swollen). The 3 studies that reported on a combined tender and swollen joint count were not included in the summary results21–23. The summary estimate for Pearson correlation coefficients for tender joint counts was 0.61 (0.47 lower, 0.75 upper) and for swollen joint counts, 0.44 (0.15 lower, 0.73 upper). The Spearman summary estimate for tender joint counts was 0.60 (0.30 lower, 0.90 upper) and for swollen joint counts, 0.54 (0.35 lower, 0.73 upper). Figure 2 displays forest plots for these summary estimates. These plots indicate moderate to marked correlation for tender joint counts, but only moderate correlation for swollen joint counts.
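The interpretation of these summary estimates follows from the thresholds given under Statistical methods, which can be written as a small lookup. The published bands share their boundary values (for example, 0.6 ends one band and begins the next), so the sketch below assumes that each lower bound is inclusive; that choice and the function name are illustrative assumptions.

```python
def interpret_correlation(r):
    """Map a correlation coefficient to the descriptive bands used in this review.

    Assumes lower bounds are inclusive, so r = 0.6 is labeled "marked".
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("expected a coefficient between 0 and 1")
    if r >= 0.8:
        return "high"
    if r >= 0.6:
        return "marked"
    if r >= 0.4:
        return "moderate"
    if r >= 0.2:
        return "low"
    return "no correlation"


if __name__ == "__main__":
    # Summary estimates reported in the text above.
    summary_estimates = {
        "Pearson, tender": 0.61,
        "Pearson, swollen": 0.44,
        "Spearman, tender": 0.60,
        "Spearman, swollen": 0.54,
    }
    for label, r in summary_estimates.items():
        print(f"{label}: r = {r:.2f} ({interpret_correlation(r)})")
```

Under this boundary convention both tender joint count estimates fall at the lower edge of the marked band and both swollen joint count estimates fall in the moderate band, which is consistent with the description above once the confidence intervals are taken into account.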
There was no evidence of publication bias for any of the 4 summary estimates using Begg’s test.
Overall, the range of correlation coefficients for tender joint counts fell into the moderate to high range, 0.45 to 0.92. The mannequin format yielded higher scores (0.54 to 0.92) compared to the text format for reporting tender joint counts (0.45 to 0.89). Overall, the swollen joint count correlation coefficients were lower, with a range of 0.16 to 0.67 (no to marked correlation). The mannequin format (range of 0.31 to 0.67) again fared better than text (0.16 to 0.63) for swollen joint counts. Overall, the mannequin format had higher correlation coefficients than the text format when the 2 were compared within and across studies.
Patient characteristics and predictors of patient-assessor differences
Only one of the 18 studies reported on predictors of patient-physician differences in joint counts25. Using regression analysis, the authors found that greater age, worse patient global disease rating, and poorer function (but not educational attainment) were significantly associated with discordance in the mannequin self-report format of tender joint counts. There were no important variables associated with differences in the mannequin swollen joint counts. With respect to the text format, longer disease duration predicted greater physician-patient differences in tender joint counts, and education level for swollen joint counts. For example, for patients with less than complete high school education the average percentage difference in swollen joint count scores was 21% and for those with more than a high school education, there was 0% difference25.
Six studies collected data on patient education level13,14,24,25,28,33. Three of these reported less than high school education in at least half of the subjects13,14,24. Two of these studies that included only tender joint counts using both mannequin and text formats reported moderate to high correlation (0.55 to 0.77) between self-report and assessors for both methods, and the third reported a marked degree of correlation for tender joint counts using a text method (ICC = 0.78), but low correlation for swollen joint counts (ICC = 0.31)14. Race/ethnicity data were collected in only 5 studies, with 4 of the 5 including 70% or more Caucasian subjects13,14,28,29,33. Nearly all the studies reported a majority (> 70%) of female subjects. Only one report highlighted the importance of literacy14. In that study, the mannequin format (ICC = 0.64) performed better than text (ICC = 0.55) for self-report of joint tenderness or pain.
DISCUSSION
Our systematic review and metaanalysis demonstrates that patient self-report tender joint counts have moderate to marked correlation with those of a trained assessor or physician. Swollen joint counts demonstrate lower levels of correlation.
Since the 1950s, joint counts have served as an essential component of disease activity measurement in clinical trials, and less consistently in clinical practice, despite the fact that a joint count is part of the ACR Quality Indicator Set34. A self-report articular index could provide an efficient means to directly involve patients in the assessment of disease activity in RA and could benefit both patient and practitioner. Benefits include savings of time and expense, simplicity of use by the patient, and a reliable self-report measure to assess disease activity and response to treatment in clinical trials15,26.
While physician interpretation of tender or swollen joints may differ from that of the patient, one could argue that valuable information is derived from both observers and that one perspective is not “better” than the other. Reliable data on swelling and inflammation may best be gathered from the physician, but a more meaningful and accurate description of pain or tenderness can arguably be derived from the patient. As evidence that patient-reported outcomes are just as important as, if not more important than, other indicators of response to therapy, Ward35 assessed 14 different clinical measures (physician-reported, patient-reported, or both, but no joint counts) in a group of 24 subjects with RA to determine which were most accurate and responsive to change. Of the 14 measures tested, patient and physician global assessments, patient pain score, and a patient self-report disability index were more sensitive to change over time than other measures (including physician-reported and laboratory measures). Three of these are obtained via patient self-report35.
While the effects of greater patient involvement in symptom reporting with respect to health outcomes have not been fully investigated in RA, self-management programs that target self-efficacy, informed decision-making, and communication in arthritis have been shown to reduce pain, fatigue, and health distress3. Another potential benefit of self-report joint counts in RA is the elimination of interobserver variation by removal of different physicians or nurses in the assessment of joints over time. And while our study demonstrated lower correlations on swollen joint counts, the study by Levy, et al demonstrated that a brief training session (5 minutes) helped patients discriminate between a truly swollen joint and a chronically enlarged joint and greatly enhanced the accuracy of a self-report swollen joint count36. It is also possible that a Web-based, touch-screen questionnaire could lead to the elimination of data entry, provide immediate access to results, and be an acceptable or even preferable option for patients to report joint counts as well as other measures32.
We found some evidence that education level, a potent marker for poor health communication, may not pose a barrier to accurately reporting tender joint counts, especially if the format used is pictorial (mannequin)13,24. When low-literacy and non-English-speaking subjects have been asked to recall medication regimens, such as warfarin doses, reporting was more accurate using a visual representation of their dosages, providing support for the notion that a mannequin or pictorial representation of joints may be more accurate across different language and literacy levels37. Athale, et al studied a Web-compatible instrument to collect self-report measures in RA (including painful and swollen joint counts) and found moderate to high correlation between paper and computer-reported measures using a mannequin format38.
The studies included in this review presented a number of weaknesses. As a whole, there was a lack of uniformity in several areas. The selection of a measurement tool to formally assess the joint count varied significantly among studies, including variability of number of joints counted, format of report, different counts between assessor and patient, and the combination of pain and swelling on some indices. The measure of correlation between assessor and self-report counts (Pearson, Spearman, or ICC) was not consistent across studies. The level of disease activity and the number of joints involved were for the most part low, but not uniform. And last, patient characteristics such as education level, disease duration, and level of knowledge about RA were rarely reported. The lack of consistency among studies was a limiting factor for the metaanalysis.
Our study has several limitations. First, the lack of conformity among studies with regard to patient characteristics did not allow a comprehensive determination of how they correlate with differences between patient self-report and assessor tender or swollen joint counts. It is possible that self-report joint counts of patients with more severe disease, longer disease duration, or greater disability may be systematically different from assessor or physician joint counts.
Second, because of a paucity of information from studies and the relative lack of diversity of enrolled subjects, we were not able to establish whether the reliability estimates are robust across education, age, race, language, or literacy levels. Future studies should include a diverse population to allow a broader and more generalizable application of a self-report joint count.
This systematic review and metaanalysis demonstrates that self-report tender joint counts in RA have moderate to marked correlation with those of a trained assessor or physician. Self-report joint counts have been incorporated into a patient-based disease activity score (the PDAS) that has been proven to be valid and sensitive, but has yet to be widely adopted10. Use of a mannequin self-report form (as in the PDAS) appears to enhance correlation. Swollen joint counts have at most moderate correlation between patients and assessors. Based on these results, a self-report tender joint count using visual aids, as proposed in the PDAS, may provide an efficient means to directly involve patients in assessment of RA activity. Studies are needed to assess the feasibility and effectiveness of incorporating a self-report disease activity score into an interactive, self-management program in clinical practice.
Acknowledgment
The authors thank librarian Gloria Won, University of California, San Francisco, for her contribution to the literature search.
Footnotes
Drs. Barton, Kaiser and Criswell’s work was supported by the Rosalind Russell Medical Research Center for Arthritis, Department of Medicine, University of California, San Francisco. Dr. Barton’s work was also supported by a Physician Scientist Development Award from the American College of Rheumatology Research and Education Foundation, and the NIH (T32 AR007304-28). Dr. Criswell’s work was also supported by the NIH (K24 AR02175). Dr. Schillinger was supported by an NIH Clinical and Translational Science Award UL1 RR024131.
- Accepted for publication July 8, 2009.