Abstract
Objective. To determine the content validity, the construct validity, and the responsiveness of the Dutch McMaster Toronto Arthritis Patient Preference Questionnaire (MACTAR) in patients with osteoarthritis (OA) of the hip or knee.
Methods. The MACTAR comprises 2 parts: a transitional part and a status part. Content validity was investigated by comparing patient-elicited activities to items on the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Medical Outcomes Study Short-Form 36 (SF-36). Construct validity was determined by correlating MACTAR outcomes with WOMAC/SF-36 outcomes. Responsiveness was investigated by correlating MACTAR, WOMAC, and SF-36 change scores with patient global assessment (PGA) scores and plotting a receiver-operating characteristics (ROC) curve.
Results. Eleven percent of the 894 impaired activities, identified by 192 patients, were not represented in either the WOMAC or the SF-36. The correlations (rs) investigated for the MACTAR transitional part varied between 0.27 and −0.40; the status part correlated moderately with the general health scale of the SF-36 (rs = 0.44). MACTAR change scores correlated better with PGA than with WOMAC/SF-36 change scores. The area under the ROC curve amounted to 0.90.
Conclusion. Our results suggest that the MACTAR exhibits moderate construct validity and good responsiveness in a population of patients with OA of the hip or knee. The MACTAR is potentially better able to detect changes over time in activities that are important to individual patients compared to other tools measuring physical function (WOMAC, SF-36). Clinicians could use the MACTAR to evaluate clinically relevant changes over time in patient-specific physical functioning.
Osteoarthritis (OA) is a common chronic musculoskeletal disorder1, which can result in moderate to severe limitations in physical functioning2. Limited physical functioning can lead to a diminished quality of life3,4,5. OA treatment guidelines recommend exercise therapy to reduce impairments in physical function due to OA6,7. Exercise therapy can thus enable individuals to better meet the demands of daily living8,9,10,11.
A number of tools are available to clinicians to evaluate the effect of exercise therapy on physical function. General, disease-specific, and patient-specific tools can be applied as either (self-reported) questionnaires or performance-based tests. Systematic reviews of the psychometric quality of both questionnaires and performance-based tests in patients with OA of the hip or knee have been published12,13. These reviews recommended the application of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)14, the Medical Outcomes Study Short-Form 36 (SF-36)15,16,17, and multiactivity tests when evaluating physical function in patients with OA12,13.
Standardized tools, applied to all patients in an identical manner, are recommended for the evaluation of physical functioning. Data produced by these tools may be conveniently and relatively easily categorized and compared between patients and across settings18. However, standardized tools are often difficult to interpret at the individual level and fail to take account of individual preferences and variation in the performance of particular activities18. Patient-specific tools measuring physical function have been developed based on the need for a more patient-centered approach as set out in healthcare policies and to enable clinicians to measure changes in activities that really matter to individual patients19. In contrast with standardized tools, patient-specific instruments can identify the relevant issues at an individual level and allow evaluation to focus on what is important to each individual patient18. Although the possibilities to compare statistical data between patients are minimal, the application of patient-specific tools may improve the validity and responsiveness for the assessment of physical function18,19.
The McMaster Toronto Arthritis Patient Preference Disability Questionnaire (MACTAR) is one example of a patient-specific scale measuring physical function20,21 (Appendix). The objective of the MACTAR is to identify individual disabilities due to the disease and their relative importance to the patient, complemented by questions on general health status20. The MACTAR has been described as a highly responsive and valid tool for the evaluation of physical function in patients with rheumatoid arthritis (RA)21. Recent psychometric evaluations of the questionnaire in patients with chronic low back pain and in patients with systemic sclerosis (SSc) showed moderate correlations with general and disease-specific tools that measure physical function22,23.
To enable clinicians to use the MACTAR when evaluating physical function in patients with OA of the hip or knee, the psychometric properties of the questionnaire in this specific population must be determined. Therefore, our objective was to determine the content validity, construct validity, and responsiveness of the MACTAR in patients with OA of the hip or knee.
MATERIALS AND METHODS
Study design
Data reported in this study were collected in a cluster-randomized controlled trial of 200 patients with OA of the hip or knee that compared, over a 12-week period (maximum 18 sessions), behavioral graded activity with usual care in accordance with the Dutch physical therapy guidelines24. The content of the interventions has been described elsewhere24. The Medical Ethics Committee of the VU University Medical Center, Amsterdam, approved the study. For the purposes of this validation study, data on “physical function” were used, as well as descriptive data on the study population.
Study population
Participants were recruited between November 2001 and May 2003 through participating physical therapists and local newspapers. Dutch-speaking patients with OA of the hip or knee (based on the criteria of the American College of Rheumatology25,26) aged between 50 and 80 years who experienced diminished physical function were included in the study24. Participants who completed both baseline and followup (Week 13) measurements were eligible for inclusion in the present psychometric evaluation.
Measurements
Demographic and clinical data
Demographic and clinical data, including age, sex, duration of symptoms, OA location, and OA grade according to Kellgren and Lawrence27, were collected from participating patients.
Physical function
Dutch MACTAR: The objective of this interview-based measurement tool is to evaluate changes in patient-specific physical function over time. It comprises 2 parts. The baseline interview starts with a transitional part. In this part, a trained interviewer asks the patient to identify up to 10 activities in which he/she experiences difficulties because of OA, such as activities in domestic care, professional life, and social interaction. The identified activities are ranked by the patient from 1 to 10 in order of importance: 1 for the activity the patient most wishes to be able to do without pain or discomfort due to OA, 2 for the next most important activity and so on. The top 5 prioritized activities are evaluated at followup. The second part of the MACTAR (status) collects information on health status. Perceived overall health, as well as psychological, emotional, and social well-being is measured by 5 questions (Likert-type rating scale); when a question obtains a less than optimal score, a followup question probes whether this is due to OA.
At the followup interview (Week 13), changes in physical function are investigated. Patients evaluate progress on their 5 most important activities as indicated in the transitional part of the baseline interview, by evaluating each activity as “less of a problem” (3 points), “the same” (2 points), or “more of a problem” (1 point). Patients also rate the perceived change in their OA on a 7-point Likert scale. The status part reassesses patients’ health status.
It is difficult to allocate a single total score to the MACTAR, because its 2 parts measure different domains and employ different scoring methods. While the transitional part measures change in physical function between baseline and followup, the status part investigates current health status, at both baseline and followup. The scoring method is presented in Table 1. Because of the differences between the transitional and status parts, scores were not added together, but presented separately. The MACTAR was translated into Dutch by Verhoeven, et al and validated in a population with RA21.
Dutch WOMAC: The physical function subscale of the WOMAC contains 17 items that represent common activities in daily living14,28. Patients are asked how much difficulty they have had performing the activities mentioned. Each item is scored on a categorical scale, from “no difficulty” (score 0) to “extreme difficulty” (score 4). The total score varies from 0 (no difficulties) to 68 (extreme difficulties). Change scores on the WOMAC physical function subscale can vary from −68 (maximum improvement) to +68 (maximum deterioration). The WOMAC has been shown to be reliable and valid in patients with OA of the hip or knee14,28, and the Dutch WOMAC permits valid international Dutch-English comparisons after accounting for differential item functioning28.
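The WOMAC scoring arithmetic described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes complete item data (the handling of missing WOMAC items is not described here). The change score is taken as followup minus baseline, which matches the stated direction of −68 for maximum improvement.

```python
from typing import Sequence

WOMAC_PF_ITEMS = 17   # items in the physical function subscale
WOMAC_ITEM_MAX = 4    # 0 = no difficulty ... 4 = extreme difficulty


def womac_pf_total(items: Sequence[int]) -> int:
    """Sum the 17 item scores into a subscale total between 0 and 68."""
    if len(items) != WOMAC_PF_ITEMS:
        raise ValueError(f"expected {WOMAC_PF_ITEMS} items, got {len(items)}")
    if any(not 0 <= score <= WOMAC_ITEM_MAX for score in items):
        raise ValueError("each item must be scored from 0 to 4")
    return sum(items)


def womac_pf_change(baseline: Sequence[int], followup: Sequence[int]) -> int:
    """Followup minus baseline: negative means improvement (assumed convention)."""
    return womac_pf_total(followup) - womac_pf_total(baseline)
```

A patient who moves from extreme difficulty on every item to no difficulty on any item would obtain the maximum improvement of −68.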
Dutch SF-36: The SF-36 investigates quality of life15,16,17,29. It comprises 8 subscales, 3 of which were used in this validation study: physical functioning, role-physical, and general health. Scores on each subscale range from 0 to 100; higher scores reflect better health status. The SF-36 has been validated for patients with various diagnoses, including OA30; the Dutch language version has proved to be practical, reliable, and valid for use in general population surveys29.
Self-perceived change: At the followup evaluation, self-perceived change in physical function was assessed by a patient global assessment (PGA) score. Patients were asked to rate their overall perception of improvement since the start of the intervention on a scale ranging from 1 (vastly deteriorated) to 8 (completely recovered). PGA scores provide reliable assessments of health transition in people with musculoskeletal disorders31.
Statistical analyses
Descriptive statistics were applied to describe the study population. PGA ratings were dichotomized as “improved” (PGA score 5, 6, 7, or 8) versus “not improved” (PGA score 1, 2, 3, or 4). For continuous data, independent t tests were used to calculate differences at baseline between those patients who improved and those who did not. For categorical data, Mann-Whitney U tests were used to compare between groups.
Content validity
Content validity examines the extent to which the domain in question is comprehensively represented by the items in the questionnaire32,33. To determine whether the items in the MACTAR refer to relevant aspects of the construct and are relevant to the purpose of the instrument, the impaired activities mentioned by patients were compared with items on the WOMAC and the SF-36 physical functioning subscale34.
Construct validity
There is currently no “gold standard” for attributes such as disability and functional status35,36,37,38. Therefore, construct validity rather than criterion validity was assessed. Construct validity refers to the extent to which scores on a particular instrument relate to other assessment tools in a manner that is consistent with theoretically derived hypotheses39.
To investigate the construct validity of the MACTAR in patients with OA, change scores on the transitional part of the MACTAR were correlated with change scores on both the WOMAC and the SF-36 physical function subscales, as well as the SF-36 role-physical subscale. Further, followup scores on the status part of the MACTAR were correlated with followup scores on the SF-36 general health subscale. For normally distributed data, Pearson correlation coefficients (r) were used to express these correlations40. Spearman’s rank correlation coefficients (rs) were applied when data were not distributed normally. The following hypotheses were tested: (1) The change score on the physical function subscale of the WOMAC is negatively correlated (rs ≤ −0.5; p < 0.05)41 with the change score on the transitional part of the MACTAR; the correlation was expected to be negative, because the WOMAC and the MACTAR use reverse scales. (2) Change scores on the physical functioning and role-physical subscales of the SF-36 are positively correlated (rs ≥ 0.5; p < 0.05)41 with the change scores on the transitional part of the MACTAR. (3) Followup scores on the general health subscale of the SF-36 are positively correlated (rs ≥ 0.4; p < 0.05)41 with the followup scores on the status part of the MACTAR.
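As an illustration of how such hypotheses could be checked, the following standard-library Python sketch computes Spearman's rs as the Pearson correlation of average ranks and compares it with a predefined threshold. It is not the PASW procedure used in the study; the function names are hypothetical, and the significance test (p < 0.05) is deliberately left out.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rs, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (sx * sy)


def meets_threshold(rs: float, threshold: float) -> bool:
    """Size criterion only: a negative threshold means rs must be at or below it."""
    return rs <= threshold if threshold < 0 else rs >= threshold
```

For example, `meets_threshold(-0.40, -0.5)` is false, whereas `meets_threshold(0.44, 0.4)` is true.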
Responsiveness
Responsiveness can be assessed in many different ways, but its definitions fall into 2 groups42. The first describes responsiveness as “the ability to detect clinically important change”42,43. Under this definition, an instrument is considered highly responsive if it is able to distinguish real change from measurement error. Responsiveness is then quantified as the magnitude of a treatment effect, for which the standardized response mean (SRM) and the effect size are useful statistics42,43. The second group defines responsiveness as “the ability to detect changes over time in the construct to be measured”34,42. In this case, responsiveness is independent of any treatment effect and is interpreted as longitudinal validity; it should be assessed by analogy with construct validity34. Therefore, predefined hypotheses concerning change scores on the MACTAR, WOMAC, and SF-36 in relation to PGA scores were tested. In the case of normally distributed change scores, parametric statistics were applied; nonparametric variants were applied for data that were not distributed normally. It was hypothesized that (1) the correlation between change scores on the MACTAR (transitional part) and the PGA would be stronger than that between the PGA and change scores on the WOMAC physical function subscale, the SF-36 physical functioning subscale, and the SF-36 role-physical subscale, respectively (p < 0.05); and (2) change scores on the MACTAR (transitional part) for patients who had improved according to PGA would differ significantly (p < 0.05) from change scores for those who had not improved according to PGA.
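For readers unfamiliar with the statistics named under the first definition, the sketch below uses their conventional formulas: the SRM divides the mean change by the standard deviation of the change scores, and the effect size divides the mean change by the standard deviation of the baseline scores. These formulas are the standard textbook forms and are not taken from this study, which did not compute either statistic; the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def change_scores(baseline: Sequence[float], followup: Sequence[float]) -> list:
    """Per-patient change, taken as followup minus baseline."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and followup must have the same length")
    return [f - b for b, f in zip(baseline, followup)]


def standardized_response_mean(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Mean change divided by the sample SD of the change scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(changes)


def effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Mean change divided by the sample SD of the baseline scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(baseline)
```

Both statistics grow with the size of the treatment effect, which is why they fit the first definition of responsiveness rather than the longitudinal-validity definition adopted here.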
Second, responsiveness was determined by plotting a receiver-operating characteristics (ROC) curve. The first step in this construction was to calculate sensitivity and specificity statistics for MACTAR change scores in patients identified as improved (PGA score > 4) and patients identified as nonimproved (PGA score ≤ 4). Next, the true-positive rate (sensitivity) was plotted as a function of the false-positive rate (1 – specificity) for different cutoff points. The best possible cutoff point would yield a point in the upper left corner of the ROC space, representing 100% sensitivity and 100% specificity; a curve passing through this point has an area under the curve (AUC) of 1.0. An instrument is considered highly responsive if the AUC is > 0.90, moderately responsive if the AUC is between 0.70 and 0.90, and of low responsiveness if the AUC is between 0.50 and 0.7044.
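The ROC construction can be sketched in a few lines of standard-library Python. The sketch assumes that higher change scores indicate more improvement and that a cutoff classifies a patient as improved when the change score is at or above it; the function names and the trapezoidal integration are illustrative choices, not a description of the PASW output used in the study.

```python
from typing import List, Sequence, Tuple


def pga_improved(pga_score: int) -> bool:
    """Dichotomize the 8-point PGA: scores above 4 count as improved."""
    return pga_score > 4


def roc_points(change: Sequence[float], improved: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per distinct cutoff."""
    n_pos = sum(1 for flag in improved if flag)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both improved and nonimproved patients are required")
    points = [(0.0, 0.0)]
    # Lowering the cutoff step by step classifies more patients as improved.
    for cutoff in sorted(set(change), reverse=True):
        tp = sum(1 for s, flag in zip(change, improved) if flag and s >= cutoff)
        fp = sum(1 for s, flag in zip(change, improved) if not flag and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC points by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With hypothetical change scores `[15, 13, 12, 10, 9, 8]` and PGA scores `[7, 6, 5, 4, 5, 2]`, 7 of the 8 improved/nonimproved pairs are ordered correctly, so the AUC is 0.875.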
All analyses were performed using PASW Statistics 18.0. If patients were unable to identify at least 5 impaired activities on the transitional part of the MACTAR, missing activity scores were filled with a score indicating a “no-change situation” (2 points); data from patients who mentioned fewer than 3 impaired activities were excluded from the responsiveness analyses. Further, in cases of just 1 missing followup item for the status part of the MACTAR, the score obtained on the equivalent question in the baseline interview was also used for the followup.
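The two missing-data rules can be made concrete with a short Python sketch. The point values for the transitional ratings follow the scoring described for the followup interview; the function names and the integer coding of the status items are hypothetical, and the full scoring method (Table 1) is not reproduced here.

```python
from typing import List, Optional, Sequence

TRANSITION_POINTS = {
    "less of a problem": 3,
    "the same": 2,
    "more of a problem": 1,
}
NO_CHANGE_POINTS = 2        # value imputed for activities a patient could not name
ACTIVITIES_EVALUATED = 5    # top 5 prioritized activities


def transitional_points(ratings: Sequence[str]) -> List[int]:
    """Map followup ratings to points and pad missing activities with 'no change'."""
    if len(ratings) > ACTIVITIES_EVALUATED:
        raise ValueError("at most 5 activities are evaluated at followup")
    points = [TRANSITION_POINTS[rating] for rating in ratings]
    return points + [NO_CHANGE_POINTS] * (ACTIVITIES_EVALUATED - len(points))


def carry_forward_status(
    baseline: Sequence[Optional[int]], followup: Sequence[Optional[int]]
) -> List[Optional[int]]:
    """If exactly 1 followup status item is missing, reuse the baseline answer."""
    missing = [i for i, value in enumerate(followup) if value is None]
    filled = list(followup)
    if len(missing) == 1:
        filled[missing[0]] = baseline[missing[0]]
    return filled
```

A patient who named only 3 activities and rated them “less of a problem,” “the same,” and “more of a problem” would thus be scored `[3, 2, 1, 2, 2]`. Padding with the no-change value pulls such patients toward the middle of the scale, a design consequence worth keeping in mind when interpreting the transitional scores.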
Following the initial analyses, a sensitivity analysis was performed on various cutoff points of the dichotomized PGA score, the aim of which was to determine whether the chosen cutoff point was the optimal point to dichotomize.
RESULTS
Study population
A total of 192 patients participated in both the baseline and the first followup assessment and were included for content and construct validity analyses. The median PGA score of these 148 women and 44 men was 5, representing “slightly improved.” Baseline characteristics of the study population are presented in Table 2.
Outcomes
Table 3 shows absolute scores on the MACTAR, WOMAC, SF-36, and PGA at baseline and followup for both the total population and improved/nonimproved patients. At baseline, there were no differences on any of the outcome measures between patients who indicated that they had improved and patients who indicated that they had not improved. At followup, MACTAR scores (both transitional and status parts), WOMAC physical function scores, and SF-36 physical functioning scores differed significantly between improved and nonimproved patients. The measurement variation was higher in the WOMAC and SF-36 compared with the MACTAR, at both baseline and followup (Table 3).
Content validity
The study population (n = 192) identified a total of 894 impaired activities, a mean of 4.7 impaired activities per patient. Seventy-one patients (37%) were unable to identify at least 5 impaired activities; 1 patient could name only 1 impaired activity; 10 patients identified 2 impaired activities; and 33 patients were able to name a maximum of 3 impaired activities. Walking was most frequently mentioned as the most impaired activity (43%). Overall, 72% of the impaired activities that were identified comprised activities in the category of mobility. Table 4 summarizes all the activities mentioned, ranked by category.
All items from both the WOMAC and the SF-36 physical function subscales were represented in the impaired activities list based on the MACTAR questionnaire. However, 27% of the activities mentioned by patients during the MACTAR interview were not represented in the WOMAC, and 41% were not represented in the SF-36. Eleven percent of the impaired activities mentioned were not covered by items of the WOMAC or the SF-36: examples of these include gardening and activities related to professional life.
Construct validity
Correlations (rs) between change scores on the transitional part of the MACTAR and change scores on the physical function subscales of the WOMAC and the SF-36 were −0.40 (p < 0.01) and 0.27 (p < 0.01), respectively. Change scores on the transitional part of the MACTAR and the role-physical subscale of the SF-36 were also moderately correlated (rs = 0.27, p < 0.01). Spearman’s rs between followup score of the MACTAR status part and the general health subscale of the SF-36 was 0.44 (p < 0.01; Table 5).
Responsiveness
Data from 133 patients (82% women, mean age 64.0 ± 8.1 yrs) were used in the responsiveness analyses. Seventy-seven percent of these patients indicated that they had improved following treatment (PGA score > 4), while 23% reported that they had not improved (PGA score ≤ 4). With the exception of age, the improved and nonimproved groups had similar baseline characteristics. Absolute change scores on physical function outcomes are presented in Table 6. Change scores for patients who indicated that they had improved differed significantly from patients who indicated that they had not improved on all outcome measures (Table 6).
Correlations between change scores on the physical function outcomes and the PGA score are also presented in Table 6. As hypothesized, change scores on the MACTAR correlate better with PGA (rs = 0.69) than change scores on the WOMAC (rs = −0.39) and SF-36 (rs = 0.26 and 0.25, respectively; Table 6).
Figure 1 presents the ROC curve of the change scores for the MACTAR (transitional part), in which sensitivity is plotted against 1 – specificity. The AUC was 0.90 (95% CI 0.84–0.96) with a standard error of 0.03.
The sensitivity analysis showed that the cutoff point for the PGA score dichotomization (> 4) was chosen correctly. Higher and lower cutoff points resulted in less optimal responsiveness values.
DISCUSSION
Our aim was to investigate the content validity, construct validity, and responsiveness of the MACTAR in patients with OA of the hip or knee.
The content validity of the MACTAR seems to be good. Specifically, the majority of the impaired activities identified correspond to items on the WOMAC and/or SF-36, which also aim to assess physical function. However, the MACTAR fits better with the WOMAC questionnaire than with the SF-36. This is not surprising, since the WOMAC is aimed specifically at patients with OA, whereas the SF-36 has a more general purpose. Eleven percent of the activities are captured only by the MACTAR and are not represented in either the WOMAC or the SF-36. These comprised activities in the areas of leisure, professional life, and social interaction. Indeed, participation in these fields varies widely among individuals. Disease-specific and general instruments do not take account of individual limitations, but patient-specific measures such as the MACTAR allow clinicians to evaluate physical functioning at the individual level.
The majority of the activities identified by the MACTAR questionnaire comprised activities in the mobility domain, which corresponds with the majority of activities in daily life. Recent validation studies on the MACTAR questionnaire in patients with chronic low back pain and RA showed comparable results21,23. The most frequently mentioned impaired activity in patients with chronic low back pain was taking part in sports activities23; in patients with hip/knee OA, walking was the most commonly cited impaired activity.
Although the content validity of the MACTAR seems to be good in patients with OA, the construct validity is less convincing. The moderate associations between the transitional part of the MACTAR and presumed comparable outcomes (|rs| ≤ 0.40) might be explained by an unbalanced distribution of impaired activities across the various activity categories. Specifically, the mobility category comprised almost 72% of all reported impaired activities, whereas the mobility domain accounts for only 58% of the items in the WOMAC and 60% in the SF-36. Thus, the transitional part of the MACTAR covers one specific part of the physical function domain extensively, whereas disease-specific and general tools account for a broader spectrum of this domain. Another explanation for the moderate construct validity could be the narrow variance around the mean on the MACTAR, compared with a wide variance in WOMAC and SF-36 scores. This narrow variance arises because patients tend to assign the same disability score to very different impaired activities. The difference in variance impedes a comparison between a patient-specific instrument on the one hand and a disease-specific/generic instrument on the other.
As hypothesized, the status part of the MACTAR was moderately correlated with the general health subscale of the SF-36 (rs = 0.44). Previous studies identified comparable correlation coefficients between the MACTAR and other physical function measures. Sanchez, et al23 found a correlation (rs) of 0.40 between the MACTAR and the Quebec Back Pain Disability Scale37 in patients with chronic low back pain, and a correlation (rs) of 0.38 (p = 0.002) was found between the MACTAR and the Health Assessment Questionnaire (HAQ)45 in patients with SSc22. Verhoeven, et al21 showed a correlation coefficient (r) of 0.73 (p < 0.0003) between the MACTAR and the HAQ in patients with RA.
The MACTAR was developed to evaluate patient-specific physical function over time. With this goal in mind, responsiveness is the most important psychometric property. For that reason, we evaluated the responsiveness of the questionnaire. As hypothesized, change scores for the MACTAR correlated better with the PGA score than change scores on the WOMAC and SF-36 did, suggesting that the MACTAR is better able to detect changes over time in patients with hip/knee OA than the WOMAC or SF-36. The MACTAR was also capable of distinguishing patients who reported an improvement from those who reported no improvement. An AUC of 0.90 confirms the high responsiveness of the MACTAR in patients with hip/knee OA. Verhoeven, et al21 also investigated the responsiveness of the MACTAR, concluding that it showed a high degree of responsiveness, based on an SRM of 3.5. However, an SRM is not an appropriate measure for assessing responsiveness defined as the ability to detect changes over time in the construct to be measured, although it can be used to detect clinically important change34,42.
One limitation of our study is the use of PGA scores as an external criterion to distinguish patients who improved from those who did not. Guyatt, et al46 showed that patients are unduly influenced by their current health status when they complete transition ratings such as PGA scores. Moreover, the reproducibility of a single-item transitional scale is probably lower than that for a more extended measurement tool47. Finally, “a little better” is not, as a matter of course, equivalent to an important change48. However, better external criteria to discriminate between improved and nonimproved patients have not yet been elaborated.
Although the MACTAR appears to have some advantages over the WOMAC and the SF-36 in assessing physical function in individual patients, it also has some limitations. The need for a trained interviewer to apply the MACTAR, as well as its complicated scoring method, may reduce the likelihood that the MACTAR will become the instrument of first choice in clinical practice. Further, patient-specific measures, including the MACTAR, do not take account of shifts in patient priorities that can occur over time when disease status changes. Therefore, further studies should examine the application of patient-specific measures at longterm followup.
Our results suggest that the MACTAR exhibits moderate construct validity and good responsiveness in a population of patients with OA of the hip or knee. Further, the MACTAR is potentially better able to detect changes over time in activities that are important to individual patients compared to other tools measuring physical function (WOMAC and SF-36). Therefore, clinicians could use the MACTAR to evaluate clinically relevant changes over time in patient-specific physical functioning.
APPENDIX. McMaster Toronto Arthritis Patient Preference Questionnaire (MACTAR). From J Rheumatol 1987;14:446–50, with permission
Accepted for publication January 12, 2012.