Abstract
Objective. To test the hypothesis that interventions targeting the relief of pain and disability in musculoskeletal diseases may have differential effects on activity limitation and participation restriction as defined in the International Classification of Functioning, Disability and Health (ICF).
Methods. Full data were obtained for 3 randomized controlled trials that used the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), the Medical Outcomes Study Short-form 36 (SF-36), or the Oswestry Disability Questionnaire as their primary outcome measures. The trial outcomes were reanalyzed using only items previously designated as assessing pure activity limitation (A), pure participation restriction (P), or a mixture of the 2 (A/P), and the results were compared with the outcomes found using the full scales, which assess a mixture of outcome domains.
Results. The results did not refute the hypothesis. An exercise-based intervention and injection therapies both appeared to have more effect on participation restriction (P) than on activity limitation (A), while a drug-based intervention had more effect on A than on P.
Conclusion. Different interventions used to treat musculoskeletal disorders may have differential effects on impairment, activity limitation, and restricted participation. The use of outcome measures that do not differentiate these 3 domains may obscure the true value of an intervention.
- INTERNATIONAL CLASSIFICATION OF FUNCTIONING, DISABILITY AND HEALTH
- OSTEOARTHRITIS
- LOW BACK PAIN
- DISABILITY
The World Health Organization (WHO) International Classification of Functioning, Disability and Health (ICF)1 identified 2 main parts. The first includes Impairment, e.g., pain; Activity, e.g., climbing stairs; and Participation, e.g., socializing. The second part includes contextual factors, both environmental and personal. Although Activity and Participation were viewed as conceptually different in earlier versions of the model, in the final version they were combined because of difficulty in differentiating these concepts1,2. The ICF model is widely used in exploring the consequences of prevalent health conditions such as osteoarthritis (OA)3,4, low back pain5, and sciatica6,7. Many different instruments are used for the assessment of health outcomes, and rules that link their items to the ICF categories are established and widely used8,9. Categories of the ICF have also been used in the development of comprehensive and brief core sets for many health conditions10,11,12, and in the identification of commonalities among core sets of related diseases13.
Instruments used for the assessment of OA and low back pain, like many other instruments, may be measuring the same or different aspects of health outcomes. Even a single question may measure more than 1 domain of the ICF model. For example, in the Western Ontario and McMaster Universities Osteoarthritis (WOMAC) instrument14, the question “How much pain do you have standing upright?” may measure a mixture of Impairment and Activity Limitation, while another question, “What degree of difficulty do you have standing?”, may measure pure activity limitation and none of the other domains. These possibilities were investigated using discriminant content validation methods in a recent study15. The study examined a range of instruments that are commonly used to assess health, including WOMAC14 and the Medical Outcomes Study Short-form 36 (SF-36) health questionnaire16. A further measure, the low back pain Oswestry Disability Questionnaire (ODQ)17,18, was also investigated by the same authors using the same methodology (unpublished data). In these studies, expert judges classified individual items (questions) within the instruments as assessing 1 or more of the main ICF domains, i.e., each item was classified as measuring a pure domain (Impairment, Activity Limitation, or Participation Restriction) or a combination such as Activity Limitation and Participation Restriction. The agreement of judges on the classification was assessed by the intraclass correlation coefficient and found to be 0.95 for WOMAC, and 0.94 for both SF-36 and ODQ, suggesting very high agreement15.
It is important to use a reliable and valid instrument in trials, to accurately measure the effectiveness of an intervention. We aimed to investigate whether measures that assess the individual domains of the ICF model are more sensitive than the conventional measures that assess a mixture of the different domains. We aimed to test the hypothesis that different types of interventions might have different effects on impairment (I), activity limitation (A), and participation restriction (P). If true, then using global measures that include items from different domains may be inappropriate, and inadequate in inferring the true efficacy of different interventions.
MATERIALS AND METHODS
This is a secondary analysis of data from 3 previously published randomized controlled trials (RCT) conducted in the UK. Two were done on patients with knee OA and 1 on patients with sciatica. The 3 studies were chosen on the basis of convenience and availability of raw data. In each trial, 1 or 2 of the outcome measures (WOMAC function, SF-36 physical function, and ODQ) were used for the reassessment of the effect of the different interventions. The 3 measures included mostly pure A and P items or a combination of A/P, hence the investigation was confined to these and does not include I. The chosen outcome measures had previously been judged on whether they fit with the ICF structure15. For example, the 17 items of WOMAC function were reclassified as 12 pure Activity Limitation (A) items and 5 items judged to assess a mixture of Activity Limitation and Participation Restriction (A/P; Table 1). We compared the originally reported outcomes using total WOMAC function (17 items) with the newly defined subscales of 12 pure A items and 5 mixed A/P items. Subscale scores were obtained by summing over the appropriate items. Details of the classification of items for the outcome measures are given in Table 2 of Pollard, et al15.
The 3 trials examined the efficacy of a range of interventions including a pharmacological package, specific medical procedures and injections, and physiotherapy.
The KIVIS Study (knee intraarticular therapy trial)
This was a randomized, controlled, single-blind, dual-center trial of 150 patients, aged between 40 and 90 years, with knee OA. Patients were randomized to 2 parallel groups. Seventy-one received tidal irrigation (TI) using a 3.2 mm arthroscope under local anesthesia, and 79 received an intraarticular injection of corticosteroid (SI)19. The original study used the visual analog scale (VAS) of WOMAC function, in which responses are measured on a 0–10 scale, and expressed as percentages. The final score was obtained using the standard procedure, adding up all items. A single score of WOMAC function was calculated at 2, 4, 12, and 26 weeks and compared with the baseline score for each group. We further used 2 subscales to assess changes in A only and changes in A/P separately using similar procedures; t-test was used for all comparisons.
The Wessex Epidural Steroids Trial (WEST)
In this trial, 228 patients aged 18–70 years with a clinical diagnosis of unilateral sciatica (1–18 months duration) were randomized to either placebo (injections of 2 ml of normal saline into the interspinous ligament) or a lumbar epidural injection of 80 mg triamcinolone acetonide and 10 ml of 0.25% bupivacaine corticosteroid at Weeks 0, 3, and 6. The patients were assessed at 0, 3, 6, 12, 26, and 52 weeks. The primary outcome measure was the ODQ. Secondary outcomes included the SF-36 physical function subscale and other walking and climbing time measures20. We reexamined changes in A only and in A/P based on the SF-36 physical function subscale and changes in A and P only based on the ODQ subscales15 for the 2 groups. For the ODQ, scores for A only and P only were calculated for each domain using the standard formula used for the full scale, and changing the number of items as appropriate for each subscale; t-test was used for all comparisons, as no difference in results was reported where other regression methods were attempted.
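The subscale scoring described above can be sketched as follows. This is a minimal illustration, assuming the commonly cited ODQ formula (sum of item scores divided by 5 times the number of items answered, expressed as a percentage). The item positions assigned to A and P below are hypothetical placeholders, not the classification of Pollard, et al15.

```python
from typing import List, Optional, Sequence

MAX_ITEM_SCORE = 5  # each ODQ item is scored 0 to 5


def odq_percent(item_scores: Sequence[Optional[int]]) -> float:
    """Percentage score: sum of answered items / (5 * number answered) * 100."""
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("no answered items")
    return 100.0 * sum(answered) / (MAX_ITEM_SCORE * len(answered))


# Hypothetical 0-based item positions, for illustration only.
A_ITEMS: List[int] = [1, 2, 3, 4, 5]
P_ITEMS: List[int] = [7, 8, 9]


def odq_subscale(item_scores: Sequence[Optional[int]], positions: Sequence[int]) -> float:
    """Apply the full-scale formula to a subset of items."""
    return odq_percent([item_scores[i] for i in positions])


responses = [3, 2, 1, 4, 0, 2, 1, 5, 3, 2]  # made-up responses for one patient
full_score = odq_percent(responses)
a_score = odq_subscale(responses, A_ITEMS)
p_score = odq_subscale(responses, P_ITEMS)
```

Because the denominator scales with the number of items, the full scale and both subscales stay on the same 0 to 100 range and can be compared directly.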
The TOPIK trial (Treatment Options for Pain in the Knee)
This was a pragmatic multicenter RCT undertaken in 15 general practices in North Staffordshire, UK21. Participants were 325 adults aged ≥ 55 years (mean 68 years) consulting with knee pain; 297 (91%) reached a 6-month followup. Interventions were enhanced pharmacy review (pharmacological management according to an algorithm); community physiotherapy (advice about activity and pacing and an individualized exercise program); and control (advice leaflet reinforced by telephone call). The main outcome measure was change in WOMAC function at 3, 6, and 12 months (Likert version). Similar procedures were adopted in this investigation to examine changes in A only and A/P only items.
Analysis
Analysis was by intention to treat (ITT) as adopted in the original studies. All outcome measures were treated as last observation carried forward so that data were available for every subject at each timepoint, wherever that was possible. Estimates of the treatment effects with 95% CI, based on t-test or regression methods, were used as appropriate. The same procedures were used for full scale and subscales at all times. Original findings based on the global outcome measures were reproduced, and in addition the newly defined subscales such as A only, P only, and A/P were then used as outcome measures with the same statistical techniques. For all comparisons the effect size was calculated as the difference between means/combined SD.
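As a rough sketch of the two computations described in this paragraph, the code below carries the last observation forward and computes an effect size as the difference between group means divided by a combined SD. The text does not state how the SD was combined; the pooled SD used here is an assumption, and the function names are illustrative only.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def locf(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward; leading gaps stay missing."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def pooled_sd(x: Sequence[float], y: Sequence[float]) -> float:
    """Pooled SD of two groups (assumed meaning of 'combined SD')."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return math.sqrt(pooled_var)


def effect_size(treated: Sequence[float], control: Sequence[float]) -> float:
    """Difference between group means divided by the pooled SD."""
    return (mean(treated) - mean(control)) / pooled_sd(treated, control)


# Made-up change scores for two small groups.
treated_change = [2.0, 4.0, 6.0]
control_change = [1.0, 3.0, 5.0]
es = effect_size(treated_change, control_change)
```

Applying the same function to the full-scale change scores and to each subscale keeps the resulting effect sizes comparable, as the paragraph above requires.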
Results for the newly defined subscales of A only, P only, and A/P are presented with the original results based on full scales in Table 2, with subsections (1), (2), (3), and (4) for the different scales examined. The results in Table 2 (2) are based on SF-36 physical function scores using standard procedures16 for the full 10-item scale. We adjusted the number of items as appropriate for the A and A/P subscales. We used the t-test to compare change in score in the 2 groups.
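The adjustment for the number of items can be illustrated with the usual SF-36 physical function transformation, in which each item is scored 1 to 3 and the raw sum is rescaled to 0 to 100. The sketch below assumes complete responses, and the positions chosen for the 7 A items and 3 A/P items are hypothetical.

```python
from typing import Sequence


def pf_score(items: Sequence[int]) -> float:
    """Rescale a sum of items scored 1-3 to 0-100: (raw - lowest) / range * 100."""
    n = len(items)
    if n == 0:
        raise ValueError("no items supplied")
    if any(v not in (1, 2, 3) for v in items):
        raise ValueError("each item must be scored 1, 2, or 3")
    raw = sum(items)
    lowest = n          # every item at its minimum of 1
    score_range = 2 * n  # highest possible (3n) minus lowest possible (n)
    return 100.0 * (raw - lowest) / score_range


# Made-up responses to the 10 physical function items.
responses = [3, 3, 2, 2, 1, 3, 2, 3, 2, 1]

# Hypothetical split: first 7 positions as A only, last 3 as A/P.
a_only = responses[:7]
a_p = responses[7:]

full_scale = pf_score(responses)
a_only_score = pf_score(a_only)
a_p_score = pf_score(a_p)
```

Only the lowest possible score and the range change with the number of items, so the subscales remain on the same 0 to 100 metric as the full 10-item scale.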
RESULTS
KIVIS [Table 2 (1)]
The original study reported reduction in pain and improvement in function in both groups at Weeks 2 and 4, but no significant difference in benefit between the 2 interventions. At Weeks 12 and 26, the benefit of SI decreased while that of TI was maintained, and this difference was significant at the 5% level. When we compared A only items with A/P items from the WOMAC function scale, the improvement from TI was more pronounced on A/P than on A only. The effect size for A only was 0.01, 0.04, 0.32, and 0.38 at Weeks 2, 4, 12, and 26, respectively, and the corresponding effect sizes for the A/P items were 0.08, 0.16, 0.38, and 0.51. This suggests that TI might have affected participation restriction more than activity limitation.
The WEST Study [Table 2 (2 and 3)]
We examined changes in the SF-36 physical function score for the placebo and the lumbar epidural corticosteroid (ESI) treatment groups at 3, 6, 12, 26, and 52 weeks. All 10 items of the domain were used as a global measure. A only (7 items) and A/P (3 items) were also used separately. The SF-36 physical function as a global score did not detect a significant response to treatment compared to placebo, as reported in the original study20. The change in SF-36 physical function scores, however, suggested an improvement in physical function for the 2 groups at all assessments. This improvement was greater for the ESI group than for placebo in the first 2 assessments but smaller after that (Weeks 12, 26, and 52). When the subscales A only and A/P were examined separately, the placebo group did better in A only items at all assessments (−0.03, −0.01, −0.15, −0.11, and −0.11), while the ESI group did better than placebo for A/P items in all assessments including the first, where the 2 groups showed some decline (0.06, 0.25, 0.05, 0.05, and 0.12). The magnitude of decline in the first assessment was smaller for the ESI group. These results suggest that while the global scale showed no change and some inconsistency, the subscales detected a benefit of the treatment on A/P. While the effect size for A only was negative in all 5 assessments, the corresponding effect sizes for A/P items were all positive, pointing to the benefit of the treatment over placebo. The original study showed that when ODQ was used as a global outcome measure, there was a transient benefit for ESI over placebo at 3 weeks but no difference after that. When the measure was reexamined as A only and P only items, the effect size for P items was found to be slightly larger than that for A in the first 2 assessments, suggesting a possible differential effect of the active intervention, in agreement with the finding in the SF-36 physical function scores.
The TOPIK Trial [Table 2 (4)]
The study addresses the hypothesis that different types of intervention have a different effect on A versus P, as there are 2 quite different interventions: enhanced pharmacy and physiotherapy. The original study21 showed a significant improvement in WOMAC pain scores for both the pharmacy and the physiotherapy groups compared with control at the 3-month assessment, but no significant difference for any of the interventions at the subsequent 6-month and 12-month assessments. For WOMAC function, the original study showed a significant improvement in the scores for the physiotherapy group at 3 months, and a slight but not significant improvement for the pharmacy group.
When we reexamined the outcome measure as subscales A and A/P, at 3 months there was a positive effect from pharmacy on both A and A/P, with a larger effect size for A than for A/P, while the effect of the physiotherapy was larger for A/P than for A. The effect of the pharmacy, however, decreased after 3 months, and the control group did better at 6 and 12 months. At both assessments the difference between the 2 groups was very small and not significant, and the difference on A/P was negligible. On the other hand, the effect of physiotherapy was higher on A/P at 6 months, and that was maintained at 12 months, suggesting a consistent improvement on P (although not significant at the 5% level) at both occasions.
DISCUSSION
We examined the differential effect of a range of interventions on different domains of health as identified by the ICF model1, using 3 outcome measures: SF-36 physical function, WOMAC function, and ODQ. Each item of these instruments was classified as assessing 1 or more domains, i.e., each item was classified as measuring a pure domain (impairment, activity limitation, or participation restriction), or mixed, such as activity limitation and participation restriction15. The study supports the hypothesis that different interventions may differentially affect different health domains. Although OA and back pain with sciatica were the focus of this study, the principle of examining different domains of health probably relates to many common musculoskeletal conditions.
The original KIVIS study showed positive effects for TI over corticosteroid injection, with TI significantly improving pain and function. When we examined the subscales of WOMAC function, the effect size for A/P items was larger than that for A only items, suggesting the possible effect of improvement in pain on participation. A differential effect was also noted in the TOPIK study, in which physiotherapy had a larger effect on participation restriction, while enhanced pharmacy seemed to improve activity more. Similarly in the WEST study, using A/P items from the SF-36 physical function subscale detected benefit from the epidural steroid injection (ESI) over placebo, which was not apparent when the full physical function scale was used. Although not significant at the 5% level, these results are of interest since the positive effect size for A/P, as opposed to the negative effect size for A only items, provides some evidence for a differential effect. ESI appears to have improved participation but not activity. Thus, combining subscales that measure different domains may result in inconsistencies that disappear once they are treated separately. In such situations, pooling items could result in cancellation of estimates of opposite sign (negative and positive), so that any effect would disappear or at least be diluted in the final total. Thus true treatment effects might be masked by using global outcome measures. The original WEST study showed a significant improvement in pain in the first assessment, and a slightly better score was also reported for the ESI group after that, supporting the possibility of better ability to participate. The possible relationships between the interventions investigated and the ICF domains are described in Figure 1.
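The cancellation argument can be made concrete with a short worked identity. It assumes a global score formed as the raw sum of an A subscale and an A/P subscale; the symbol $\Delta$ (the between-group difference in mean change) and the numerical values are illustrative and are not taken from the trials.

```latex
\[
\Delta_{\text{total}} \;=\; \Delta_{A} + \Delta_{A/P}
\]
\[
\Delta_{A} = -1.5, \quad \Delta_{A/P} = +1.5
\;\Longrightarrow\;
\Delta_{\text{total}} = 0
\]
```

Under this assumption, a treatment that worsens A and improves A/P by equal amounts shows no difference on the global score, although both subscales differ between groups.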
Interest in outcome measures has occupied a central position in many fields of clinical and methodological research, and various issues have been addressed. The contents of measures and their suitability to assess different health conditions have been questioned by Hunt and McKenna22, and their responsiveness to change was highlighted by Ware, et al23. Methodological problems that arise in attempting to use simple algebraic summation to aggregate items that may measure different domains of health have been explored by many researchers24,25,26. Methods such as factor analysis and principal components are widely used to identify underlying structures of many health outcome measures. An example is the General Health Questionnaire27,28. The complexity of these methods, however, and their occasional inappropriate use have resulted in much confusion and inconsistency in the inferences drawn29. Latent class models that solve many of the methodological problems of these classical methods were recently developed30,31.
In addition to these methodological issues, such statistical methods attempt to identify dimensionality empirically and cannot readily map measurement items to theoretical domains such as those identified in the ICF. An alternative is the linkage method suggested by Cieza, et al. For example, they have indicated methods of linking OA measures to the ICF categories2,32,33. However, they have not attempted to separate A and P, finding this too difficult using the linkage method. In our study, the classification obtained by discriminant content validation, which relies on expert judgment rather than mathematical models to identify items as uniquely measuring I, A, or P, or any combination of these (Pollard, et al15), was investigated numerically. Our results show that it is indeed possible to find significant differences in attributing items to A or P15, and the results suggest that separating A and P may be important in investigating the effectiveness of different interventions. The view that A and P can be combined, as in the ICF categories, has often been challenged by researchers. Rather than linking items to the ICF categories, some studies have tested the hypothesis that the 2 concepts are distinct. In a study assessing physical function in community adults, for example, 3 underlying structures were identified, and the authors concluded that combining A and P might not be appropriate34.
Our study addressed the hypothesis that different types of intervention used to treat OA or sciatica might have differential effects on activity limitation and restricted participation. The drug-based intervention appeared to affect activity limitation more than participation restriction, whereas exercise and injection appeared to have more effect on participation restriction than on activity limitation. Different methods of classification would have led to different conclusions. For example, based on the ICF categories, A and P would have been treated as 1 component. Based on Badley35, acts such as walking and tasks such as dressing would have been considered different, and different conclusions would have been reached. The results from our study, although not conclusive, may help answer interesting questions raised by Johnston and Pollard36 regarding the ability of existing measures to separate individual structures of health, and may provide evidence for the effectiveness of interventions that was not detected by global outcome measures. Such evidence may be of value in advising treatment and rehabilitation programs. Clinical implications, however, are yet to be defined.
Limitations of our study include the fact that we were restricted by the outcome measures used in the original studies. We examined only WOMAC function, SF-36 physical function, and ODQ, and the latter 2 measures were used in just 1 study each. It is important to examine a wider range of outcome measures, especially those that contain many items for each domain including I, and are likely to measure distinctive outcomes, such as the disease-specific Oxford knee and Oxford hip scores. It will also be important to examine studies with longer followup. We also examined data from an RCT comparing home exercise with no intervention in men and women aged ≥ 45 years with knee pain37, in which the first assessment took place at 6 months, and no differential effect was found. Another limitation may be that in all the studies examined, there was either no effect or only a very small effect for the intervention. Studies with larger effects would perhaps help to better detect how different interventions might differentially affect health domains.
Acknowledgments
We thank Prof. Elaine Hay and Dr. Elaine Thomas for commenting on an early draft and providing the raw data of the TOPIK trial. We also thank Dr. Kim Thomas and Dr. Isabel Reading for providing data for KIVIS, the WEST, and the knee pain trials. Thanks also to the “Mobile” team (Bristol) for comments and suggestions at early presentations of relevance to this study.
Footnotes
- Supported by the MRC Health Services Research Collaboration, Department of Social Medicine, University of Bristol.
- Accepted for publication April 15, 2010.