Abstract
Objective. To assess the reliability and clinically meaningful thresholds of the intermittent and constant osteoarthritis pain (ICOAP) score, the Knee injury and Osteoarthritis Outcome Score Physical function Short-form (KOOS-PS), the Hip disability and Osteoarthritis Outcome Score Physical function Short-form (HOOS-PS), and the Quality of life subscales of HOOS/KOOS (HOOS-QOL/KOOS-QOL) in patients with knee or hip arthritis.
Methods. One hundred and ninety-five patients (141 knee, 54 hip) seen at 2 orthopedic outpatient clinics with a diagnosis of knee or hip OA completed patient-reported questionnaires (ICOAP pain scale, KOOS-PS, HOOS-PS, KOOS-QOL, HOOS-QOL) at baseline and 2-week followup. Reliability was assessed using intraclass correlation coefficients (ICC). We calculated minimum clinically important difference (MCID) and moderate improvement in the subgroup that reported change in the status of their affected joint.
Results. The reliability as assessed by ICC was as follows: ICOAP pain scale, 0.63 (0.48, 0.74) in patients with knee arthritis, and 0.86 (0.73, 0.93) for hip arthritis; KOOS-PS, 0.66 (0.52, 0.77); HOOS-PS, 0.82 (0.66, 0.91); KOOS-QOL, 0.79 (0.69, 0.86); and HOOS-QOL, 0.67 (0.42, 0.83). MCID and moderate improvement estimates in patients with knee arthritis were ICOAP pain scale, 18.5 and 26.7; KOOS-PS, 2.2 and 15.0; and KOOS-QOL, 8.0 and 15.6. A smaller sample in patients with hip arthritis precluded MCID and moderate improvement estimates.
Conclusion. We found that the ICOAP pain and HOOS-PS scales were reasonably reliable in patients with hip OA. Reliability of the ICOAP pain and KOOS-PS scales was much lower in patients with knee arthritis. Thresholds for clinically meaningful change in pain or function on these scales were estimated for patients with knee arthritis.
Recent efforts by 2 leading organizations, the Osteoarthritis Research Society International (OARSI) and Outcome Measures in Rheumatology Clinical Trials (OMERACT)1,2, have led to the development of new pain and function assessments for osteoarthritis (OA). These include the intermittent and constant osteoarthritis pain (ICOAP) score3 and short forms of 2 validated function scales — the Hip disability and Osteoarthritis Outcome Score Physical function Short-form (HOOS-PS) and the Knee injury and Osteoarthritis Outcome Score Physical function Short-form (KOOS-PS)4,5,6. These assessments are somewhat similar to the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)7 and are being used increasingly in outcome studies on patients with OA.
Published studies have provided initial validation data for these instruments4,5,6. In addition, reliability and sensitivity to change data have recently been published. ICOAP was reliable, with an intraclass correlation coefficient (ICC) of 0.85 in patients with knee/hip arthritis3 and ICC ranging from 0.65 to 0.81 in patients who underwent knee/hip replacement surgery8. In 2 studies of patients who underwent knee/hip replacement surgery, ICOAP was responsive to change, with standardized response means (SRM) ranging from 0.54 to 1.82 (ref. 8) and from 1.02 to 2.29 (ref. 9); these values were similar to those of other measures such as WOMAC9 and were higher for hip than for knee replacement. The SRM for KOOS-PS and HOOS-PS ranged from 0.54 to 1.82 (ref. 8) in patients who underwent knee or hip replacement surgery. However, none of the prior studies estimated clinically important change thresholds for these instruments. In addition, validation in US cohorts has not been done.
The primary objective of our study was to examine the test-retest reliability of ICOAP pain, HOOS-PS, and KOOS-PS questionnaires and the effect of age, race/ethnicity, and sex on reliability in a multicenter US study. We also assessed the thresholds for clinically important differences for these questionnaires in patients with knee or hip arthritis.
MATERIALS AND METHODS
Study population
Our study included patients recruited in 2 large medical centers (Veterans Affairs Medical Center, Minneapolis, Minnesota, and M.D. Anderson Cancer Center, Houston, Texas). Cohorts consisted of consecutive patients with a diagnosis of knee or hip OA who had radiographic evidence of hip or knee OA and were referred to orthopedic surgeons for consideration of joint replacement surgery. Patients were excluded if they had no knee/hip OA, prior knee or hip replacement, or concomitant inflammatory arthritis, or were unable to complete the questionnaire. These patients were recruited as part of an international multicenter study of patients with knee or hip pain. Details of the original study are described elsewhere10. As part of the original study, patients completed pain, function, and quality of life (QOL) assessments at the initial visit only. For our study, each patient who completed the baseline survey at the 2 US centers also received the same survey by mail at 2 weeks to test reliability and clinically important improvement thresholds. The study was approved by the institutional review boards at the Minneapolis VA medical center and the M.D. Anderson Cancer Center.
Validation and statistical analyses
All patients received a repeat survey 2 weeks after the first survey with the same questions as the first plus 2 additional questions. The first additional question was whether they had undergone a joint replacement in the joint for which they were evaluated. The second additional question was “Since the last time you completed the survey 2 weeks ago, would you say your hip (or knee) arthritis is: A great deal better, somewhat better, about the same, somewhat worse or a great deal worse?” Patients were asked to choose one of the response options. Patients were included in the analyses if they had not undergone joint replacement surgery, had answered the second question, and had returned their survey. Sensitivity analyses were performed in a subset that responded within 20 days of the first survey (an extra 6-day window was allowed for delay because of mailing time).
Each patient completed the following self-reported validated questionnaires: (1) knee or hip function assessment, either the HOOS-PS for hip or the KOOS-PS for knee4,5,6, developed as short forms for the assessment of physical function and translated into multiple languages11,12; (2) pain assessment with the ICOAP score3,9, which has also been translated into other languages13,14,15; and (3) the knee- or hip-related QOL subscales, KOOS-QOL and HOOS-QOL, of the original KOOS and HOOS questionnaires16,17. The score range for ICOAP pain, KOOS-PS, and HOOS-PS is 0–100, 100 being the worst. The score range for KOOS-QOL and HOOS-QOL is 0–100, 100 being the best. The ICOAP pain questionnaire has 11 questions that are used to calculate the overall ICOAP pain score; 5 questions relate to constant pain and 6 to intermittent pain, and these are used to calculate the ICOAP constant pain and ICOAP intermittent pain scores, respectively. All 3 ICOAP pain scores range from 0 to 100. ICOAP, KOOS-PS, and HOOS-PS were administered as complete instruments. KOOS-QOL and HOOS-QOL are each 1 of the 5 subscales of the original KOOS and HOOS questionnaires17,18,19 and were administered as part of the original study10; the other 4 subscales were not administered, to reduce patient burden and because of their limited relevance to the original study.
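As an illustration of the 0–100 scaling described above, the following Python sketch computes the 3 ICOAP scores from item responses. It assumes that each item is scored 0–4 and that raw sums are rescaled linearly to 0–100; neither the item scoring nor the function name `icoap_scores` comes from this report, so both should be checked against the instrument's scoring guide before use.

```python
def icoap_scores(constant_items, intermittent_items, max_item=4):
    """Return ICOAP constant, intermittent, and total scores on a 0-100 scale.

    Assumption (not stated in this report): each item is scored 0..max_item
    and each score is the raw sum rescaled linearly to 0-100 (100 = worst).
    """
    if len(constant_items) != 5 or len(intermittent_items) != 6:
        raise ValueError("expected 5 constant and 6 intermittent items")

    def rescale(items):
        # Raw sum divided by the maximum possible sum, expressed as a percentage.
        return 100 * sum(items) / (max_item * len(items))

    return {
        "constant": rescale(constant_items),
        "intermittent": rescale(intermittent_items),
        "total": rescale(list(constant_items) + list(intermittent_items)),
    }


# Hypothetical responses for one patient.
example = icoap_scores([2, 2, 2, 2, 2], [1, 1, 1, 1, 1, 1])
print(example)
```

Under these assumptions the total score is not the average of the 2 subscale scores, because the subscales contain different numbers of items.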
Those 110 patients who reported their hip (or knee) arthritis being “about the same” were included for the reliability/reproducibility analyses. We used ICC to assess the correlation between baseline and followup assessments. Ninety-five percent CI and p values were presented. ICC was also calculated for patient subgroups by age, sex, and race/ethnicity as follows: (1) age group: < 65 versus ≥ 65 years old; (2) sex; (3) race/ethnicity: white versus nonwhite. We used 1-way ANOVA to compute the ICC and determine the between-subject variation and within-subject variation as a measure of test-retest reliability.
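A minimal Python sketch of a 1-way ANOVA ICC for paired baseline and followup scores is given here for illustration. It assumes the single-measurement, 1-way random-effects form, ICC = (MSB − MSW) / (MSB + (k − 1) × MSW), with k = 2 assessments per patient, where MSB and MSW are the between-subject and within-subject mean squares. The exact SAS procedure used in the analyses is not described in this report, and the function name and example data are hypothetical.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC for (baseline, followup) score pairs.

    Illustrative sketch only; assumes exactly 2 assessments per patient.
    """
    n = len(pairs)  # number of patients
    k = 2           # assessments per patient
    if n < 2:
        raise ValueError("need at least 2 patients")

    grand_mean = mean(x for pair in pairs for x in pair)
    subject_means = [mean(pair) for pair in pairs]

    # Between-subject mean square: spread of patient means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)

    # Within-subject mean square: spread of each patient's scores around
    # that patient's own mean.
    ms_within = sum(
        (x - m) ** 2 for pair, m in zip(pairs, subject_means) for x in pair
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores for patients who reported being "about the same".
stable_patients = [(40, 45), (60, 55), (25, 30), (70, 72), (50, 41)]
print(round(icc_oneway(stable_patients), 2))
```

In this form, an ICC near 1 means that most of the score variance lies between patients rather than between the 2 assessments of the same patient.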
Patients who responded that their knee (or hip) arthritis was somewhat better or a great deal better constituted the datasets used for estimating, respectively, the minimum clinically important difference (MCID) for improvement (somewhat better), as recommended20,21,22 and similar to previous studies23,24, and moderate improvement (a great deal better). Each estimate was calculated as the mean of the difference between baseline and followup scores among patients in the corresponding response category. We also calculated Minimal Detectable Change (MDC) using a statistical anchor to estimate meaningful change25. All analyses were done using SAS, version 9.3. Statistical significance was set at p < 0.05.
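The anchor-based calculation described above can be sketched in Python as the mean change score within one transition-question category. This is an illustration under stated assumptions, not the SAS code used for the analyses: the function name `mean_improvement`, the record layout, and the example data are hypothetical, and the `higher_is_worse` flag is introduced here only to handle the opposite scoring directions of the pain/function and QOL scales.

```python
from statistics import mean


def mean_improvement(records, category, higher_is_worse=True):
    """Mean improvement among patients giving `category` on the transition question.

    records: iterable of (baseline_score, followup_score, transition_response).
    higher_is_worse: True for scales where 100 is worst (ICOAP, KOOS-PS, HOOS-PS),
    False for scales where 100 is best (KOOS-QOL, HOOS-QOL).
    """
    changes = []
    for baseline, followup, response in records:
        if response != category:
            continue
        # Express change so that a positive value always means improvement.
        change = baseline - followup if higher_is_worse else followup - baseline
        changes.append(change)
    if not changes:
        raise ValueError(f"no patients reported {category!r}")
    return mean(changes)


# Hypothetical pain scores (100 = worst) with transition responses.
pain = [
    (60, 40, "somewhat better"),
    (50, 35, "somewhat better"),
    (70, 40, "a great deal better"),
    (55, 54, "about the same"),
]
mcid = mean_improvement(pain, "somewhat better")          # MCID estimate
moderate = mean_improvement(pain, "a great deal better")  # moderate improvement
print(mcid, moderate)
```

Because each estimate is a simple mean within one response category, it is sensitive to the number of patients in that category, which is why small subgroups yield imprecise thresholds.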
RESULTS
Study population
Clinical and demographic characteristics of the source and study populations are summarized in Table 1. Of the 107 patients in Minneapolis and 176 patients in Houston recruited for the original study that included the baseline survey10, 83 patients from Minneapolis and 112 patients from Houston (total 195 patients) returned the followup mailed surveys and constituted the analytic dataset (Table 1). Of those, 79 patients from Minneapolis and 71 patients from Houston returned their 2-week surveys within 20 days of the first survey and constituted the dataset for sensitivity analyses. Thus, our study included 195 patients: 141 had knee OA and 54 had hip OA. Four patients did not answer the patient global question, so were not eligible for reliability or sensitivity to change analyses.
The mean age of the patients was 61 years, 43% were female, 74% were white, 66% were married, and the mean body mass index was 33.9 kg/m². The mean (SD) time to second survey completion and receipt was 17.6 (6.8) days.
Compared to responders, nonresponders to the followup survey were younger (57 vs 61 yrs; p = 0.0065) but did not differ significantly in marital status, education level, ethnicity, or employment status.
Clinically important change for improvement
For patients who reported a change on the knee or hip arthritis transition question, we calculated estimates for MCID and moderate improvement. The distribution of patients in these categories is presented in Table 2. The baseline and followup scores in patients who improved somewhat or a great deal, or were about the same, are shown in Table 3. The MCID estimates for improvement in ICOAP pain, ICOAP constant pain, and ICOAP intermittent pain were 18.5, 18.7, and 18.4, respectively; respective moderate improvement estimates were 26.7, 29.6, and 24.3. For KOOS-PS, MCID and moderate improvement estimates were 2.2 and 15.0, respectively (Table 4). MCID and moderate improvement estimates for KOOS-QOL were 8.0 and 15.6, respectively (Table 4). Sensitivity analyses with 150 patients showed minimal changes (data available from authors on request).
Reproducibility and reliability
One hundred ten patients (81 with knee arthritis; 29 with hip arthritis) reported that their arthritis was about the same as at the time of the baseline survey. They constituted the analytic dataset for the assessment of reproducibility/reliability (Table 5). The ICC was 0.63 for ICOAP pain in patients with knee arthritis and 0.86 for hip arthritis. Respective ICC for ICOAP constant pain were 0.57 and 0.81 and for ICOAP intermittent pain were 0.64 and 0.83. ICC was 0.66 for KOOS-PS and 0.82 for HOOS-PS (Table 5). ICC for KOOS-QOL was 0.79 and for HOOS-QOL was 0.67.
Data on variation in reproducibility of ICOAP pain, HOOS/KOOS PS and QOL by age, sex, and race/ethnicity are available on request from authors. We noted variation in reproducibility in KOOS-PS by sex and race/ethnicity and by race/ethnicity in KOOS-QOL. In the hip cohort, variations in reproducibility for HOOS-PS and HOOS-QOL were noted by race/ethnicity (data available from authors). Sociodemographic and clinical characteristics of sensitivity cohort data are available from authors on request. Sensitivity analyses with 150 patients showed minimal changes compared to the main analyses of the cohort of 195 patients (data available from authors).
DISCUSSION
In this 2-center study of ethnically diverse US cohorts, we found that 3 assessments of pain and function, i.e., ICOAP, KOOS-PS, and HOOS-PS, were reproducible in patients with knee and hip arthritis. Reproducibility/reliability of ICOAP pain, HOOS-PS, and KOOS-PS was good in patients with hip OA (ICC 0.82–0.86) and moderate in patients with knee OA (ICC 0.63–0.66). Reliability varied somewhat with age, sex, and race/ethnicity, as expected. Our study also provided reliability statistics for KOOS-QOL and HOOS-QOL scales. We also present estimates for MCID and moderate improvement for these scales in patients with knee arthritis.
The main finding from our study was that ICOAP, KOOS-PS, and HOOS-PS had moderate to good test-retest reproducibility. In a recent single-center study in Europe that assessed test-retest reliability at 2 weeks in patients with OA who later underwent joint arthroplasty, ICC for the ICOAP, HOOS-PS, and KOOS-PS scales ranged from 0.80 to 0.84 in patients with hip OA and from 0.65 to 0.85 in patients with knee OA, similar to WOMAC8. Our ICC were within or close to this previously reported range; thus our multicenter study confirms this earlier finding in a more ethnically diverse population. The study cohorts were assembled similarly in the 2 studies. However, we had a more racially/ethnically diverse population than the previous study, which might partially explain an ICC toward the lower end of the range in our knee cohort. Another potential reason for the lower ICC in the knee cohort may be week-to-week variation in knee pain that is not captured by the global transition question. The ICC may also differ because of differences in prevalence of disease (which has been shown to affect ICC26) or because of random variation, given a small sample size for the hip cohort. The studies also differed in that we analyzed only patients who reported no change in the status of their knee/hip arthritis between the 2 assessments (56.4% of the cohort), whereas the previous study analyzed all patients for reliability assessment because the transition question was not asked; ours is the more conservative approach because it accounts for any change between assessments. In our study, the ICC for KOOS-QOL and HOOS-QOL were 0.79 and 0.67, respectively, lower than the ICC of 0.89 reported for the Persian version of KOOS-QOL27. This may be due to minor content differences from translation or a difference in study populations.
We noted minimal variation in reproducibility for pain and minimal to moderate variation in reproducibility for QOL assessments by various patient characteristics including age, sex, and race/ethnicity. Most 95% CI overlapped, indicating that these differences were not statistically significant in this small sample; this may reflect either a true lack of difference or a lack of power to detect a small difference. These findings highlight the potential effect of patient characteristics on patient-reported outcomes. This was not unexpected and has been reported previously with the Medical Outcomes Study Short-Form 36, with reliability ranging from 0.65 to 0.94 across patient groups28. However, to our knowledge, variation in reliability has not been studied in the previous validation studies of most other instruments. These findings should not be overinterpreted, and they need to be reproduced. They also suggest that future studies reporting on the validity of instruments should consider stratifying the statistics by sex, race, and age, which could provide useful guidance to the users of the instruments. This also raises the question of whether various validation characteristics of instruments, such as MCID and validity statistics, which are usually reported for overall populations and not for individual subgroups, are more accurate and applicable to some patient subgroups than others. This is likely the case, because the overall average incorporates a range among respondents. This is a broad research agenda, not limited to these instruments, that needs more attention.
Our study provides estimates of clinically important improvement by estimating MCID and moderate improvement for these instruments in patients with knee OA, thus adding to the current knowledge; the hip OA sample was not large enough to perform these analyses. ICOAP8,9, HOOS-PS, and KOOS-PS8 have been shown in previously published studies to be sensitive to change in patients who underwent knee or hip arthroplasty, surgeries that are associated with significant improvements in pain and function; their sensitivity to change was similar to that of WOMAC. Thus, these measures have desirable psychometric properties.
Estimation of clinically meaningful changes is critical for validated instruments, because it provides guidance for calculating sample sizes for studies aimed at examining patient-relevant outcomes and comparing different interventions in patients with knee/hip OA or those undergoing arthroplasty, such as comparing arthroplasty implants or treatment pathways. Moderate improvement represents important change for the patients. For the knee cohort, moderate improvement was estimated at 27 units for ICOAP pain scale, 30 units for ICOAP constant pain, and 24 units for ICOAP intermittent pain. To our knowledge, this is the first study to provide MCID and moderate improvement estimates for these validated scales. We also provided estimates for MCID for KOOS-PS and KOOS-QOL. These thresholds represent meaningful changes that can be appreciated by patients as above and beyond the daily variation. Future studies should estimate MCID and moderate improvement for patients with hip OA; because only a few patients provided this information, we were unable to estimate those.
Our study findings must be interpreted considering the limitations. Patients were recruited from orthopedic offices where they were assessed for joint replacement surgery, and these estimates may be different for populations with milder arthritis or other causes of knee pain, and in younger patients. Our study was not powered to examine differences in reliability by patient characteristics (age, sex, and race) and therefore we may have missed small differences. Despite a reasonable study sample size, we had a small sample for estimating clinically important differences, especially for moderate improvement estimations (standard errors were large), and for assessing reliability in the hip cohort. Although similar sample sizes have been used to derive reliability and validation statistics29,30,31, our confidence in these estimates is not high. Also, a small sample did not allow us to examine MCID and moderate improvement thresholds by categories of baseline scores. More studies are needed to confirm our estimates in larger groups of patients. Study strengths include that this was a multicenter study, we recruited a significant number of minority patients and women, and we assessed instruments that are relevant to patients with OA.
We found that in the hip cohort, ICOAP pain, HOOS-PS, and HOOS-QOL had moderate to high reliability. Reliability was moderate for ICOAP pain and KOOS-PS and high for KOOS-QOL in the knee cohort. Our study provided estimates for clinically meaningful changes for improvement for these assessments, which can be used to calculate sample sizes for future randomized and cohort studies and allow comparison of different treatment modalities/implants. Future studies should assess whether our estimates for clinically meaningful changes in patients with knee or hip arthritis are stable across other populations.
Acknowledgment
The authors thank the patients for participation in the original study.
Footnotes
- The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the United States government.
- Supported by US National Institutes of Health Clinical Translational Science Award 1 KL2 RR024151-01 (Mayo Clinic Center for Clinical and Translational Research), and the resources and facilities at the Birmingham VA Medical Center, Alabama, USA. The original study was supported by unrestricted grants to scientific societies (Osteoarthritis Research Society International and Outcome Measures in Rheumatology) from pharmaceutical companies: Pfizer, Expansciences, Novartis, Negma Lerads, Rottapharm, Fidia, and Pierre Fabre Santé laboratories, and a research grant from the European League Against Rheumatism. Dr. Suarez-Almazor is the recipient of a K24 award from the National Institute for Arthritis, Musculoskeletal, and Skin Disorders (AR053593). Dr. Singh has received research and travel grants from Takeda and Savient, and consultant fees from Savient, Regeneron, Takeda, and Allergan.
- Accepted for publication October 8, 2013.