Abstract
Objective. To examine the influence of different analytical methods, baseline covariates, followup periods, and anchor questions when establishing a minimal important difference (MID) for individuals with knee osteoarthritis (OA). Second, to propose MID for improving and worsening on the Knee injury and Osteoarthritis Outcome Score (KOOS).
Methods. Retrospective analysis of prospectively collected data from 272 patients with knee OA undergoing a multidisciplinary nonsurgical management strategy. The magnitude and rate of change as well as the influence of baseline covariates were examined for 5 KOOS subscales over 52 weeks. The MID for improving and worsening were investigated using 4 anchor-based methods.
Results. Waitlisted for joint replacement and exhibiting unilateral/bilateral symptoms influenced change in KOOS over time. Generally, low correlations between anchors and KOOS change scores limited calculations of MID; thus, they were only proposed for the pain, activities of daily living, and quality of life subscales. The method used to calculate the MID influenced the cutpoint; however, the type of anchor question only influenced the MID when analyzed with a particular mean change method. Depending on patient and clinical characteristics, the subscale, and the analytical approach used, the MID for KOOS improvement ranged from an absolute change of −1.5 to 20.6 points and worsening ranged from −19.17 to 8.5 points.
Conclusion. MID vary with patient and clinical characteristics, KOOS subscale, and analytical approach. Provided the anchor question is relevant to the patient-reported outcome and baseline status is considered, the anchor does not appear to influence the MID for improvement or worsening when using some anchor-based methods.
Patient-reported outcomes (PRO) are a valuable source of information used to monitor people with knee osteoarthritis (OA). One challenge in applying PRO in OA treatment settings is interpreting changes in scores over time1,2,3. This challenge has sparked the pursuit of the minimal important difference (MID) as well as insight into measurement error when PRO are applied in different patient cohorts, disease states, and treatment settings.
No best practice approach exists for determining the MID. Since first described by Jaeschke, et al4, recommendations regarding the nomenclature and definition of what constitutes clinically important change have evolved considerably. For example, while terminology was originally introduced as the “minimal clinically important difference”4, the “clinically” was subsequently removed to minimize the focus on the clinical arena and to focus on the patients’ experiences instead5. Further, definitions have been summarized as weighing change in health status against the risk of undertaking the intervention, change that would alter the patient’s care, and change of a sufficient magnitude to be perceived as important to the patient1,6. Despite these different definitions, the common concept is that the MID is the lowest boundary of change determined to be important in some way. The patient remains at the center and so many believe that an MID is valid only if it is anchored to patients’ perception of health6,7.
The Knee injury and Osteoarthritis Outcome Score (KOOS) is a PRO for patients with knee injury and OA. Its responsiveness to change has been demonstrated in a diverse range of patients and pathologies8,9,10. The KOOS consists of 5 subscales, each scored from 0–100: pain, other symptoms, activities of daily living (ADL), sport and recreation, and knee-related quality of life (QOL). Higher scores indicate less pain and disability.
The minimal detectable change (MDC) and standard error (SE) of the measure have been examined for each of the KOOS subscales in knee OA cohorts11,12. However, the MID for improvement and worsening are yet to be empirically established in cohorts undergoing nonsurgical interventions. These realities make KOOS an ideal model for examining the process of establishing a MID for a PRO that is primarily used for individuals undergoing nonsurgical knee OA management. In doing so, the model permits discussion of issues and complexities involved with defining clinically meaningful change, including determining whether single or multiple methods for establishing a MID are more appropriate, and whether baseline covariates, time, or the type of anchor question influences the result. The aim of our study was to undertake this examination, and in doing so to propose potential MID for the KOOS subscales.
MATERIALS AND METHODS
Ours was a retrospective analysis of prospectively collected data from participants enrolled in the Osteoarthritis Chronic Care Program (OACCP) at a teaching hospital in New South Wales (NSW), Australia. The OACCP was developed by the Agency for Clinical Innovation as a model of nonsurgical management for individuals with hip and knee OA. The OACCP was a 12-month program focusing on self-management, timely access to information, and management from a multidisciplinary team. All patients received individualized care plans that were reviewed at about 12 weeks, 26 weeks, and 52 weeks. While the OACCP served as a clinical program, the NSW Population and Health Services Ethics Committee approved the use of patient data for research purposes (2012/08/413), and patients were informed of this at the time of enrollment.
Inclusion criteria
Individuals with medically diagnosed hip or knee OA were eligible to participate in the OACCP. The face validity of a doctor diagnosis for the presence of OA has previously been demonstrated13. Individuals can be referred to the program by general practitioners, orthopedic surgeons, or rheumatologists. In addition to OA, a minimum of 30/100 pain severity in the signal joint on most days of the month was required for inclusion. While there were no exclusion criteria, patients with severe comorbidities may not complete the walking test. In addition, patients with dementia were excluded from PRO and were thus not included here. Because we were interested in changes in KOOS over time in individuals with knee OA, only data from patients managed for unilateral or bilateral knee OA who had completed their 26-week review were included in the analysis. Of the 280 participants meeting these criteria, data from 272 individuals were analyzed because 6 participants had not completed any KOOS subscale. Of the 191 participants who had completed the 52-week assessment (55 individuals were lost to followup), 7 had not completed the KOOS. Thus, data from 184 individuals were retained (Table 1).
Collected data
Age at entry, unilateral or bilateral pain, and whether the patient was waitlisted for joint replacement were recorded at baseline. At baseline and each followup, the KOOS, index joint pain severity, and level walking test score were recorded. Pain severity over the previous week was measured on a 0–100 mm, worst to best, visual analog scale. Level walking was measured using a 6-min walk test (6MWT) in which the patient was asked to walk as quickly as possible back and forth over a 25-m straight flat walking area for 6 min and distance was recorded in meters.
It has been suggested that global transition scales, or anchors, examining both a patient’s index joint and general health were required because they measure different aspects of a patient’s outcome14. At all followups, patients were asked: “Compared with when I started this program, my walking on level ground has…” and “Compared with when I started this program, my knee has…” Patients responded on a 7-point Likert scale: “much improved,” “moderately improved,” “slightly improved,” “not changed,” “slightly worse,” “moderately worse,” and “much worse.”
Statistical analysis
Each KOOS subscale was considered separately. Analyses were conducted in RStudio (Version 0.98.1091, RStudio Inc.). First, 4 linear mixed models were fitted to each subscale using maximum likelihood estimation. This approach was chosen because of its flexible assumptions, ability to increase the complexity of the model with each iteration, and ability to cope with changes in sample size from baseline to 52 weeks. The first 2 models examined whether the magnitude of change in KOOS (dependent variable) differed at each followup period. The third model examined whether the rate of change in KOOS significantly differed between each followup period. These models provided insight into whether changes in the KOOS could be combined across all followup periods or whether each followup period needed independent examination. Adjustment for baseline status was viewed as critically important when making interpretations regarding the MID2,7,15,16. Thus, the final model examined the effect of baseline covariates on an individual’s baseline KOOS, magnitude, and rate of change. Baseline characteristics (sex, age, pain severity, 6MWT, waitlisted for joint replacement status, and unilateral/bilateral symptoms) were added to the model using a forced entry method. Variables that improved the fit of the model using log likelihood, Akaike information criterion, and Bayesian information criterion were retained.
Once the appropriate model for each subscale was established, KOOS change scores were calculated by subtracting baseline KOOS from their respective subscale at each followup. Spearman rank correlations were then examined between the anchors and the absolute KOOS change scores. Similar to the first 3 linear mixed models, this analysis addressed whether time to followup influenced the MID. It also served to determine whether the type of anchor question was an important consideration when establishing an MID. Minimum correlation coefficients of between 0.3 and 0.5 have been proposed as suitable between the anchor and the measurement instrument7,17. Therefore, an a priori threshold of 0.4 was set as the minimum correlation. Only the KOOS subscales that met this minimum criterion underwent further analysis for a MID.
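The change-score and correlation screen described above can be sketched as follows. This is an illustrative sketch only, not the authors' code (the analyses were run in RStudio). The function names, the numeric coding of the anchor (for example 3 for "much improved" down to −3 for "much worse"), and the use of "greater than or equal to" for the 0.4 threshold are assumptions.

```python
from statistics import mean

MIN_CORRELATION = 0.4  # a priori minimum correlation stated in the text


def change_scores(baseline, followup):
    """Followup score minus baseline score for each patient."""
    return [f - b for b, f in zip(baseline, followup)]


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def meets_threshold(anchor_codes, changes, minimum=MIN_CORRELATION):
    """True if the absolute anchor/change correlation reaches the minimum."""
    return abs(spearman_rho(anchor_codes, changes)) >= minimum
```

A subscale and followup period would proceed to MID analysis only when meets_threshold returns True for the anchor in question. The sketch does not guard against a constant input, for which the correlation is undefined.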
The MID for both improvement and worsening in each subscale was calculated using receiver-operating characteristic (ROC) curves and mean change methods2. The ROC method is increasingly used to ascertain the MID because it uses all data, which increases the precision and accuracy of estimates18. The improvement MID, or minimum clinically important improvement (MCII), classified participants who indicated they had slightly, moderately, or much improved as the “improved group” whereas those registering no change or worse were considered the “non-improved group”. Similarly, for the minimal clinically important change for worsening (MCIW), those who reported slightly, moderately, or much worse were classified as the “worse group” and those reporting no change or improved as the “not-worse group”. The area under the curve (AUC) and 95% CI were calculated as an estimate of each anchor’s ability to predict changes on the KOOS. The respective AUC were then compared using DeLong statistics to determine whether 1 anchor was a superior predictor19.
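The grouping of anchor responses and the AUC can be illustrated with a short sketch. It assumes the anchor responses are stored as the 7 response labels quoted earlier. The function names are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than any particular statistical package. The 95% CI and the DeLong comparison are not reproduced here.

```python
IMPROVED = {"much improved", "moderately improved", "slightly improved"}
WORSE = {"slightly worse", "moderately worse", "much worse"}


def improved_labels(responses):
    """True for the 'improved group', False for the 'non-improved group'."""
    return [r in IMPROVED for r in responses]


def worse_labels(responses):
    """True for the 'worse group', False for the 'not-worse group'."""
    return [r in WORSE for r in responses]


def auc(scores, positives):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which matches the area under the empirical
    ROC curve.
    """
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("both groups must contain at least one patient")
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))
```

For improvement, the KOOS change score is passed directly as the score. For worsening, the change score would be negated first so that larger deteriorations rank higher.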
The primary ROC approach for determining an appropriate MID was the Youden method20. In this method, the point of the ROC curve that maximized the distance to the identity line was selected as the optimal MID17,21. The secondary method was the 80% specificity rule, whereby the MID is the best sensitivity for response while still achieving at least 80% specificity22. These approaches were chosen because the former provides maximum accuracy at predicting a MID by any combination of sensitivity and specificity22, whereas the latter provides the optimal MID while ensuring that at least 80% or median number of true negatives are correctly classified22,23. CI were established by drawing 500 stratified bootstrap samples2,19.
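The two ROC cutpoint rules and the stratified bootstrap can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Candidate cutpoints are taken as midpoints between consecutive observed change scores, a patient is classified as improved when the change is at or above the cutpoint, ties are resolved toward the lowest cutpoint, and the CI uses a simple percentile rule. All function names are hypothetical.

```python
import random


def candidate_cutpoints(changes):
    """Midpoints between consecutive distinct change scores (an assumption)."""
    values = sorted(set(changes))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def sens_spec(changes, improved, cut):
    """Sensitivity and specificity when 'change >= cut' predicts improvement."""
    tp = sum(1 for c, y in zip(changes, improved) if y and c >= cut)
    tn = sum(1 for c, y in zip(changes, improved) if not y and c < cut)
    n_pos = sum(1 for y in improved if y)
    n_neg = len(improved) - n_pos
    return tp / n_pos, tn / n_neg


def youden_cutpoint(changes, improved):
    """Cutpoint maximizing sensitivity + specificity - 1."""
    return max(
        candidate_cutpoints(changes),
        key=lambda c: sum(sens_spec(changes, improved, c)) - 1,
    )


def specificity_rule_cutpoint(changes, improved, min_spec=0.80):
    """Best sensitivity among cutpoints with at least min_spec specificity."""
    eligible = [
        c for c in candidate_cutpoints(changes)
        if sens_spec(changes, improved, c)[1] >= min_spec
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: sens_spec(changes, improved, c)[0])


def bootstrap_ci(changes, improved, pick, n_boot=500, seed=0):
    """Percentile CI from stratified bootstrap samples (resampled per group)."""
    rng = random.Random(seed)
    pos = [c for c, y in zip(changes, improved) if y]
    neg = [c for c, y in zip(changes, improved) if not y]
    estimates = []
    for _ in range(n_boot):
        p = rng.choices(pos, k=len(pos))  # group sizes are preserved
        n = rng.choices(neg, k=len(neg))
        sample = p + n
        if len(set(sample)) < 2:
            continue  # no cutpoint can be formed from a constant sample
        est = pick(sample, [True] * len(p) + [False] * len(n))
        if est is not None:
            estimates.append(est)
    if not estimates:
        raise ValueError("no bootstrap sample produced an estimate")
    estimates.sort()
    last = len(estimates) - 1
    return estimates[int(0.025 * last)], estimates[int(0.975 * last)]
```

On a toy data set the two rules can disagree: the Youden cutpoint may accept a specificity below 80% in exchange for higher sensitivity, whereas the 80% specificity rule cannot.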
Two mean change methods were investigated. The first method was as it was originally proposed by Jaeschke, et al4, whereby the MID for improvement and worsening were calculated as the mean change in scores over time within the subgroup of participants who reported they were slightly improved or slightly worse. The second was as proposed by Redelmeier and Lorig24, where the mean change score for the subgroup identifying as “no change” was subtracted from that of the groups identifying as slightly improved or worse. CI for both methods were calculated as 1.96 multiplied by the SE2.
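The two mean change calculations can be sketched as follows. The text states only that the CI were calculated as 1.96 multiplied by the SE. The sketch therefore assumes the SE of a mean is the sample SD divided by the square root of n, and that the SE of a difference between two group means uses the unpooled variances. These formulas and the function names are assumptions, not details confirmed by the source.

```python
from statistics import mean, stdev

Z = 1.96  # multiplier for a 95% CI, as stated in the text


def _group(changes, responses, category):
    """Change scores of patients who chose the given anchor category."""
    return [c for c, r in zip(changes, responses) if r == category]


def jaeschke_mid(changes, responses, category):
    """Mean change in the 'slightly improved' or 'slightly worse' subgroup.

    Returns (MID, lower CI bound, upper CI bound).
    """
    group = _group(changes, responses, category)
    mid = mean(group)
    se = stdev(group) / len(group) ** 0.5  # assumed SE of the mean
    return mid, mid - Z * se, mid + Z * se


def redelmeier_lorig_mid(changes, responses, category, reference="not changed"):
    """Mean change in the subgroup minus mean change in the 'no change' group.

    Returns (MID, lower CI bound, upper CI bound).
    """
    group = _group(changes, responses, category)
    ref = _group(changes, responses, reference)
    mid = mean(group) - mean(ref)
    # Assumed SE of a difference between two independent means.
    se = (stdev(group) ** 2 / len(group) + stdev(ref) ** 2 / len(ref)) ** 0.5
    return mid, mid - Z * se, mid + Z * se
```

Each subgroup needs at least 2 patients for the SD to be defined. Small subgroups give wide intervals, and an interval can include 0.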
RESULTS
Description of the study population
Based on the rules for missing items in the KOOS25, baseline subscale scores could be calculated for 98.9% of pain and other symptoms, 100% of ADL, 69.85% of sport and recreation, and 89.7% of knee-related QOL. The number of missing subscale scores reduced over time; the sport and recreation subscale had the most missing items at every timepoint (Table 1).
The highest percentage of responses for the walking on level ground anchor at 26 weeks and 52 weeks was “much improved” (27% and 32%, respectively), and only 2 patients reported they were “much worse”. This pattern of responses was mirrored when patients were asked about their knee health (Figure 1). In general, patients reporting themselves as improved also reported better KOOS at followup; those who were symptomatically worse exhibited worse scores. However, there was significant overlap between categories on both anchors (Figure 2).
KOOS changes over time and influence of baseline covariates
With the exception of the sport and recreation subscale, the rate of change significantly fluctuated between timepoints (Supplementary Data is available from the authors on request). Thus, subsequent MID analyses treated each followup in isolation (baseline to 12 weeks, baseline to 26 weeks, and baseline to 52 weeks). Pain severity and waitlisted for joint replacement influenced baseline KOOS for all subscales, with those reporting worse pain and those on the waiting list for joint replacement independently reporting worse scores. Two subscales exhibited significant interactions between predictors and time. Individuals with bilateral knee OA reported 2.39 points (95% CI 0.76–4.01) greater improvement in the pain subscale over 52 weeks than their unilateral counterparts. Individuals who were not waitlisted for joint replacement exhibited 1.93 points (95% CI 0.23–3.64) greater improvement over 52 weeks on the QOL subscale. All baseline covariates except sex and unilateral/bilateral symptoms were significantly correlated; however, all correlation coefficients were small (0.1 < r < 0.3; data not shown)26.
Correlations with anchor scales
Despite highly significant correlations between both anchors and the majority of KOOS change scores at followups, there were few instances where correlations exceeded an absolute value of 0.4 (Table 2). Correlations exceeding this threshold, and subsequently entered into the MID analysis, were the change in the pain subscale from baseline to 26 weeks for those with bilateral knee OA and the 52-week change scores for those with unilateral knee OA, the change scores obtained from baseline to 52 weeks for the ADL subscale, and QOL subscales from baseline to 52 weeks only for patients waitlisted for joint replacement.
Influence of anchor question on MID
The AUC indicated that both anchors were fair to good predictors of change in KOOS subscales (Table 3) with no statistical difference between their predictive ability (Figure 3). Further, the MID established using the ROC methods were comparable between the anchor questions (Table 3), with the exception of the MCIW for pain at 52 weeks where the difference was 11 points. The comparability between anchors was also observed using the Jaeschke, et al4 mean change method, where the largest discrepancy between anchors was exhibited for the MCII of the pain subscale at 26 weeks (4.04 points). In contrast, using the Redelmeier and Lorig24 method, MID proposed for pain and QOL subscales at 26 weeks and 52 weeks exhibited differences ranging from 6.61 points to 12.64 points between anchor questions.
MID thresholds
The range of MCII thresholds was comparable between the analytical methods when considered across all subscales. Of the mean change methods, the Jaeschke, et al4 method produced the highest MCII of 20.6 points (8.91–31.21) for the pain subscale for bilateral patients at 26 weeks and the Redelmeier and Lorig24 method produced the lowest MCII at −5.04 points (−23.06 to 12.98) for the 52-week QOL subscale (waitlisted for joint replacement patients only). Of the ROC methods, the 80% specificity rule produced the highest and lowest MCII: 18 points for the pain subscale at 26 weeks for patients with bilateral knee symptoms and 1.5 points for the ADL subscale at 52 weeks.
For the MCIW, the lowest thresholds depended on the analytical method used. Mean change-based MCIW ranged from −19.17 points (−31.96 to −6.38) when using the general knee anchor on the QOL subscale at 52 weeks for those who were waitlisted for joint replacement, to 3.64 points (−0.99 to 8.28) when using the walking anchor for the pain subscale at 26 weeks for bilateral knee pain patients. ROC curve-based methods resulted in MID ranging from −7 points for pain at 52 weeks for those with unilateral symptoms to 8.5 points for pain at 26 weeks. Importantly, all but 3 MID established using the Redelmeier and Lorig24 method and 10 of the 16 MID established using the Jaeschke, et al4 method exhibited CI that included 0, indicating that some participants reported no change or change in the opposite direction on the KOOS compared with the anchor questions. Similarly, 10 of the 16 MID established using the 80% specificity ROC exhibited CI for their sensitivity that were < 50%, indicating that these thresholds were possibly no better than chance at correctly classifying individuals who reported improvement or worsening.
DISCUSSION
The importance of defining meaningful change in outcomes used in clinical decision making and clinical trials is widely accepted. The objectives of our study were to examine the complexities of establishing a MID in a knee OA population undergoing nonsurgical multidisciplinary management as well as to understand its influences. To do this, we used the KOOS as a model for PRO. The initial modeling determined that the magnitude and rate of KOOS change over time depended on patient and clinical characteristics and differed for each subscale. Incorporating these findings and the analytical method, the MID for KOOS improvement ranged from an absolute change of −1.5 to 20.6 points and worsening ranged from −19.17 to 8.5 points. This highlights a growing consensus in clinimetric research: a single MID cannot be applied across all populations and followup periods.
Our study suggests that the influence of specific anchor questions on MID is controversial and may depend on the PRO/subscale examined and analytical method. While the correlation coefficients and AUC of the subscales that underwent MID analysis in our study are considered acceptable27,28, this was not the case for all the subscales and timepoints assessed. Even in the subscales where the correlation to the anchor questions met our a priori threshold, there were several instances where the groups rating themselves as “slightly improved” or “slightly worse” exhibited greater KOOS change than those in the moderate categories. This confirms that the constructs measured by a single-item question and those measured using a multidimensional scale are not identical. These findings reinforce the importance of establishing the correlation between the anchor and PRO prior to any MID analysis and limited the number of MID that could be proposed in our current study.
Of the subscales assessed, the anchors used by the OACCP produced similar MID for pain and ADL when either ROC method or the Jaeschke, et al4 approach was used. Further, both the AUC and DeLong statistic indicated the anchors had similar predictive value for KOOS change on these subscales. Therefore, it appears these analytical methods support the findings of Tubach, et al3, whereby provided the anchor is relevant, the actual anchor has a low effect on the definition of success in clinical trials. However, Redelmeier and Lorig’s24 mean change method resulted in substantial differences (1.29 to 12.64 points) between MID proposed for the same subscale when different anchors were used. This questions the application of MID established using methods that are highly influenced by the anchor question when alternative methods unaffected by this issue are available.
Regardless of the anchor, the method used to calculate the MID will result in different values2,7. For both the pain subscale at 26 weeks and QOL subscale at 52 weeks, there was up to a 16-point difference between MCIW and an 11.5-point difference between MCII proposed using different analytical methods. Despite this, of the 64 proposed MID, 58 did not exceed the ± 13 points required to exceed the MDC obtained from comparable cohorts to our current study11,12. Therefore, the applicability of the proposed MID for KOOS is questionable. It may be that any change beyond measurement error represents clinically meaningful change. Further validation is required to confirm this.
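The comparison of a proposed MID against the MDC can be expressed as a short check. This is a sketch under assumptions: the ± 13 point MDC is the figure quoted above from comparable cohorts, the function names are hypothetical, and the MCII of 10.0 points used in the usage note is an illustrative value rather than a threshold proposed by the study.

```python
MDC = 13.0  # points; the +/- 13 figure quoted in the text


def exceeds_mdc(mid, mdc=MDC):
    """True if a proposed MID lies beyond measurement error in either direction."""
    return abs(mid) > mdc


def is_meaningful_improvement(change, mcii, mdc=MDC):
    """True if an observed improvement reaches the MCII and exceeds the MDC."""
    return change >= mcii and change > mdc
```

For example, with a hypothetical MCII of 10.0 points, an observed improvement of 15 points passes both checks, whereas an improvement of 11 points reaches the MCII but remains within measurement error.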
The differences exhibited between the MID analytical methods also suggest the use of each method needs to be considered when applied in clinical and research settings. An advantage of applying MID calculated from ROC analytical methods is that they remain the same when applied to groups and individual patients because ROC curves are a diagnostic analysis technique21. In contrast, MID calculated from the mean change methods are only applicable at group levels29. This implies that ROC methods may have greater use in clinical settings whereas either method could be used for research purposes investigating group effects. However, in our analysis, ROC curve MID exhibited lower sensitivity than specificity. This means they performed better when identifying patients who need more or different interventions compared with those responding well to the current strategy. To gauge the most precise distribution of patients’ outcomes, researchers and clinicians would need to be able to calculate and apply both the MCII and MCIW. Mean change methods avoid this double analysis issue because they only handle data from individuals who are slightly improved or worse, rather than all data. This carries its own limitations because sample sizes may be small, leading to low power and unreliable MID. This was the case in our analysis, as evidenced by wide 95% CI. Therefore, while the triangulation of multiple MID approaches is recommended2,3,7,30, our findings suggest that consideration of the context and intended interpretation of the thresholds are required to maximize power and generalizability.
In addition to the low number of reliably determined MID, our study has several limitations. First, we used absolute change scores rather than relative change. The latter is frequently used to adjust for baseline covariates. Our choice of approach is based on the potential of a ceiling effect. When the PRO is designed for higher scores to indicate better health status, using percent change scores increases the association with baseline scores, inflates the MID, and risks eliminating true differences in meaning along with the potential error1,28,31. There is no consensus on the optimal adjustments for baseline covariates when establishing a MID and using absolute change may not be the optimal approach. However, by isolating analysis based on covariates that influence KOOS change over time and only proposing MID for subscales that were moderately correlated with the anchor, we believe that we appropriately adjusted for factors that may influence baseline differences.
A limitation of all MID research is the presence of negative values for MCII and positive values for MCIW, as occurred in the current and previous studies32. Further, while the magnitude of improvement or worsening is quantified, rarely is there an attempt to clarify whether this magnitude of change is important to the patient. Rather, as was the case in our study, “importance” is assumed to be implicit whenever a response category other than “not changed” is selected. It is possible for individuals with chronic conditions to recalibrate their perception of their condition and the importance of change over time, or “response shift”33. It is also possible that a recall bias has occurred, as commonly occurs in PRO34. We did not investigate either of these issues, and we did not determine which category on the anchor question should be consistently used for the MID. These issues need further investigation and could be as simple as performing a “then-test” and asking patients if their reported change is important to them.
Our study is the first, to our knowledge, to focus on the issues and recommendations for evaluating the MID for the KOOS in a cohort of individuals with knee OA undergoing nonsurgical management. Patient and clinical characteristics, specifically time to followup and clinical disease severity as measured by laterality of symptoms and being waitlisted for joint replacement, affected the MID for the KOOS. This further confirms there is no such thing as 1 MID for a specific PRO. Additionally, the association between the anchor and PRO appeared to fluctuate over time and limited the MID’s clinical usefulness. The result was a limited series of MID for pain, ADL, and knee-related QOL at 26 weeks and 52 weeks that is lower than previously reported MDC. Thus, for a change in the KOOS to be interpreted as clinically meaningful, it needs to exceed both the MID and MDC. Because of the discrepancies in MID thresholds resulting from different analytical approaches, triangulation of multiple methods continues to be more appropriate than using a single method to establish an MID. However, clinicians and researchers are urged to think about the context in which the MID will be applied when determining which methods to triangulate. Future research needs to validate this finding, clarify “important change,” and account for recall bias and response shift. By establishing and using MID, clinicians and researchers will be better served to interpret the meaning of PRO.
Acknowledgment
Data used in the presented analysis were obtained with the permission of the Agency for Clinical Innovation (ACI). The authors acknowledge the work of the ACI’s Musculoskeletal Network in developing and supporting implementation of the Osteoarthritis Chronic Care Program in New South Wales. We thank the ACI Musculoskeletal Network, especially Robyn Speerin, for assistance with accessing data. We are very grateful to Matthew Williams for assistance with Royal North Shore Hospital data collection.
Footnotes
Jillian Eyles receives funding from a Ramsay Health Care Allied Health Scholarship and a Royal North Shore Hospital Staff Specialist Award. Professor Hunter is supported by a National Health and Medical Research Council Practitioner Fellowship.
- Accepted for publication November 10, 2015.