Abstract
Objective We applied a precision medicine–based machine learning approach to discover underlying patient characteristics associated with differential improvement in knee osteoarthritis symptoms following standard physical therapy (PT), internet-based exercise training (IBET), and a usual care/wait list control condition.
Methods Participants (n = 303) were from the Physical Therapy vs Internet-Based Training for Patients with Knee Osteoarthritis trial. The primary outcome was the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) total score at 12-month follow-up. Random forest-informed tree-based learning was applied to identify patient characteristics that were critical to improving outcomes, and patients with those features were grouped.
Results Age, BMI, and Brief Fear of Movement (BFOM) score, all at baseline, were identified as characteristics that effectively divided participants, creating 6 subgroups. Assigning treatments according to these models, compared to assigning a single best treatment to all patients, resulted in greater improvements of the average WOMAC at 12 months (P = 0.01). Key patterns were that IBET was the optimal treatment for patients of younger age and low BFOM, whereas PT was the optimal treatment for patients of older age, high BFOM, and BMI (kg/m2) between 26.3 and 37.2.
Conclusion These results suggest that easily assessed patient characteristics including age, fear of movement, and BMI could be used to guide patients toward either home-based exercise or PT, though additional studies are needed to confirm these findings. (ClinicalTrials.gov: NCT02312713)
Knee osteoarthritis (OA) is one of the most common causes of pain and disability.1 Exercise-based therapies, including physical therapy (PT), are considered core treatments for patients with knee OA.2,3 However, patients vary considerably in their level of improvement following exercise-based interventions, and very little is known about drivers of this variability. This limits our ability to make patient-centered recommendations about specific exercise interventions.
Precision medicine is a field that focuses on assigning the optimal treatment specifically for each patient. In the context of precision medicine, various machine learning methodologies can produce collections of decision rules tailored to the inherent features of individual patients. The treatment regimen that is customized for each patient is related to the heterogeneity of patient characteristics, including demographic or clinical features. Common machine learning approaches in the precision medicine field include Q-learning, outcome weighted learning, and list-based treatment rules.4 Although there have been multiple studies using machine learning methods in the context of OA, these have primarily involved observational studies focused on predicting progression and other outcomes including gait decline and joint movement.5-8 To date, there has been little application of precision medicine–based machine learning in the context of clinical trials of OA management. One previous study found that in the context of a clinical trial comparing exercise, dietary weight loss, and their combination, the combination intervention was optimal for most participants, but a subgroup of participants characterized by high baseline weight or low waist circumference, without a history of myocardial infarction, would have had more benefit from initial assignment to diet only.9 In the present study, we add to this literature by exploring characteristics underlying differential improvement among participants in the Physical Therapy vs Internet-Based Exercise Training for Knee OA (PATH-IN) trial, which compared standard PT with internet-based exercise training (IBET), both relative to a usual care/wait list (WT) control group. 
Previously, our group applied qualitative interaction trees, a sequential partitioning method, and generalized unbiased interaction detection and estimation, a regression tree approach, to evaluate heterogeneity of treatment effects at the short-term follow-up timepoint (4 months) in PATH-IN.10 We now extend this work by focusing on longer-term (12-month) outcomes, which are important for understanding maintenance of treatment effects.
Another way in which we extend prior work in this area is through development of a novel machine learning approach. Although existing machine learning approaches enable individualized treatment assignment, they are often uninterpretable regarding underlying mechanisms. To address these limitations, we developed a new machine learning algorithm that produces mechanistic decision rules that distinguish between subgroups of patients. This new algorithm, random forest (RF)–informed tree-based learning, enables the final decision rule (regarding optimal treatment assignment) to determine the patient characteristics that most strongly influence the outcome and identify the thresholds of those characteristics to split the patients for assignment. In an iterative fashion, the algorithm identifies a subgroup of patients that could most benefit from a specific treatment and searches for more detailed rules consisting of successively finer subgroups of patients in pursuit of the largest average benefit for the target population. We applied this methodology to data from the PATH-IN trial and obtained decision rules regarding the treatment from which each patient may expect the greatest improvement in OA symptoms and function at 12-month follow-up.
METHODS
PATH-IN trial. The PATH-IN trial (ClinicalTrials.gov: NCT02312713) included 350 participants with symptomatic knee OA; details of the methods and main trial outcomes have been published previously.11,12 Briefly, participant inclusion criteria were: (1) radiographic evidence of knee OA, documentation of physician diagnosis of knee OA in the medical record, or self-report of physician diagnosis of knee OA along with items based on the American College of Rheumatology clinical criteria for knee OA,13 and (2) self-report of pain, aching, or stiffness in 1 or both knees on most days of the week. Exclusion criteria are shown in the Box. Participants were randomized to standard PT, IBET, or WT, in a 2:2:1 ratio, respectively. Participants in the PT group received up to 8 individual in-person treatment sessions within 4 months. Participants in the IBET arm received access to the online program for the full 12-month intervention period. Participants in the WT group did not receive PT or IBET during the study but were offered 2 PT visits and access to IBET following the 12-month assessments. For the full study sample at the 12-month follow-up (ie, the timepoint of interest for this study), IBET was noninferior to PT but neither PT nor IBET was superior to WT for the primary outcome, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC).11 This study was approved by the institutional review boards of the University of North Carolina at Chapel Hill (UNC #14-1331) and Duke University Medical Center (#00055318). All participants provided written informed consent prior to participation. Recruitment for the trial occurred from November 2014 to February 2016, and follow-up assessments were completed in February 2017.
Final decision for optimal treatment regimens.
Box. Exclusion criteria.
No regular internet access
Currently meeting Department of Health and Human Services Guidelines for physical activity
Currently completing series of physical therapy visits for knee OA
Diagnosis of gout in the knee, rheumatoid arthritis, fibromyalgia, or other systemic rheumatic disease
Severe dementia or other memory loss condition
Active diagnosis of psychosis or current uncontrolled substance abuse disorder
On waiting list for arthroplasty
Hospitalization for a stroke, heart attack, heart failure, or had surgery for blocked arteries in the past 3 months
Total joint replacement knee surgery, other knee surgery, meniscus tear, or ACL tear in the past 6 months
Severely impaired hearing or speech
Unable to speak English
Serious or terminal illness as indicated by referral to hospice or palliative care
Other health problem that would prohibit participation in the study
Nursing home residence
Current participation in another OA intervention
Fall history deemed by a study physical therapist co-investigator to impose risk for potential injury with participation in a home-based exercise program study
ACL: anterior cruciate ligament; OA: osteoarthritis.
PATH-IN data. PATH-IN participants completed outcome assessments at baseline, 4-month follow-up, and 12-month follow-up; these analyses focus on 12-month follow-up. The primary outcome was the WOMAC, which is a well-established self-reported measure of pain (5 items), stiffness (2 items), and function (17 items).14 All items are measured on a 5-point Likert scale, with a total scale range of 0 to 96; higher scores indicate worse symptoms and function. For this analysis, we included participants in all 3 study arms. There were 47 covariates measured at baseline, including demographic, clinical, OA-related, and physical activity–related variables (see Supplementary Tables S1 and S2, available with the online version of this article).
Of the 350 participants, 47 had missing values for WOMAC total score at the 12-month follow-up. We removed these participants, leaving 303 participants for our analysis. Because the proportion of missing values was < 15% for every baseline covariate, we imputed these values using MissForest, implemented in the missForest R package.15 The advantage of MissForest is that it can simultaneously handle categorical variables and continuous variables of unequal scales, which matches the structure of the PATH-IN data, and it is suited for high-dimensional datasets with potentially nonlinear interactions.
Machine learning approach.
• Overview and description of the value function. These exploratory analyses aimed to discover patient features related to differential improvement in the 3 study arms at 12-month follow-up. We first used established machine learning methods including RF16 and list-based methods.4 The list-based method produces an interpretable decision list of treatments by leveraging nonparametric techniques. However, none of these returned a rule that would improve outcomes for subgroups of patients. Therefore, we developed and applied a new tree-based approach which, in contrast to the above methods, determines each subgroup based on the average outcome, which is called the value function (VF).
The VF (V) of a treatment rule may be thought of as the expectation of an outcome (eg, total WOMAC score) if future patients followed that treatment rule. However, there were participants in the dataset who did not follow the treatment rule of interest (eg, patients who were randomly assigned to PT but, based on the decision rule, would be assigned to IBET). To address this, we used a technique called inverse probability weighting.17 This technique takes into account only the outcomes of patients who followed the treatment rule of interest and then amplifies those outcomes to represent the study sample by dividing them by the propensity of receiving the treatment. For instance, to obtain the VF for the treatment rule that assigns PT to all patients, we considered only the outcomes of patients who received PT and then divided each outcome by those participants’ propensity scores of receiving PT, which is a constant in the context of an RCT.
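The inverse probability weighted estimate described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R implementation: the function name `ipw_value`, the toy outcomes, and the propensities of 0.4, 0.4, and 0.2 (implied by the 2:2:1 randomization) are assumptions made for the example.

```python
# Propensity of each arm under 2:2:1 randomization (assumed from the trial design).
PROPENSITY = {"PT": 0.4, "IBET": 0.4, "WT": 0.2}


def ipw_value(outcomes, received, recommended, propensity=PROPENSITY):
    """Inverse probability weighted estimate of the value of a treatment rule.

    Only patients whose received arm equals the arm the rule recommends
    contribute; each such outcome is divided by the propensity of that arm,
    and the total is averaged over all patients.
    """
    total = 0.0
    for y, arm, rec in zip(outcomes, received, recommended):
        if arm == rec:
            total += y / propensity[arm]
    return total / len(outcomes)


# Hypothetical toy data: 5 patients randomized 2:2:1, with transformed outcomes.
outcomes = [70.0, 80.0, 60.0, 65.0, 50.0]
received = ["PT", "PT", "IBET", "IBET", "WT"]

# Rule of interest: assign PT to every patient.
pt_for_all = ["PT"] * len(outcomes)
value_pt = ipw_value(outcomes, received, pt_for_all)
```

Because the arm sizes in this toy sample match the randomization ratio exactly, the estimate coincides with the plain mean outcome among the patients who received PT.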
Higher VFs indicate greater quality of the decision rule and, hence, greater effectiveness of the interventions if patients were assigned to treatments based on that decision rule. Higher WOMAC scores indicate poorer outcome (eg, worse pain, stiffness, and function). Therefore, in order to match a higher VF (eg, greater quality of the decision rule) with a better outcome, for interpretability, we transformed WOMAC scores by multiplying them by −1 and adding 96 (the maximum score), resulting in an average of 69.7. We chose a jackknife estimator for estimating the VF, as recommended in Jiang et al.9 This approach is equivalent to leave-one-out cross-validation, and it is approximately unbiased (ie, consistent). Using this estimator, we can compare how well each machine learning method performs and determine statistical significance for the differences between the decision rules. As in Jiang et al, the VF of a zero-order model (ZOM; ie, the regimen where the estimated best single treatment is given to everyone) was used as a reference for comparing how well estimated treatment rules performed.9 That is, the VFs for each of the 3 ZOMs (IBET, PT, and WT) were estimated, and the IBET ZOM, which produced the largest VF estimate of the 3 VF estimates, served as the reference for comparison with each candidate treatment regimen. Z tests were applied to evaluate whether the method returned a statistically significantly better treatment regimen than simply assigning this estimated single best treatment to all patients.
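The scale flip, the choice of the reference ZOM, and the z test can be illustrated as follows. This is a hedged sketch: the helper names are hypothetical, the numbers are toy values rather than trial results, the standard error is taken as given (in the study it would come from the jackknife procedure), and the two-sided form of the test is an assumption because the text does not state the sidedness.

```python
import math

MAX_WOMAC = 96  # upper end of the WOMAC total score range


def transform_womac(score):
    """Reverse-score WOMAC so that higher values mean better outcomes."""
    return MAX_WOMAC - score


def best_zom(zom_values):
    """Return the treatment whose 'give it to everyone' rule has the largest VF."""
    return max(zom_values, key=zom_values.get)


def z_test(value_rule, value_reference, se_difference):
    """Two-sided z test for the difference between two VF estimates.

    The sidedness and the externally supplied standard error are assumptions.
    """
    z = (value_rule - value_reference) / se_difference
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal
    return z, p


# Toy VF estimates for the three zero-order models (hypothetical numbers).
toy_zoms = {"IBET": 3.0, "PT": 2.0, "WT": 1.0}
reference = best_zom(toy_zoms)

# Toy comparison of a candidate rule against the reference ZOM.
z, p = z_test(value_rule=5.0, value_reference=toy_zoms[reference], se_difference=1.0)
```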
• RF-informed tree-based learning. Out of the 47 variables identified as potential covariates (Supplementary Tables S1 and S2, available with the online version of this article), for computational feasibility we chose 13 for the analysis. This selection was based on the criteria of variable importance from RF prior to running the algorithm; these 13 covariates were selected at least once in the cross-validation approach to be the most important covariates. Details of this strategy are provided in Section 2 of the Supplementary Material (available with the online version of this article). The algorithm then begins by dividing the dataset into 2 subgroups, followed by iteratively splitting these subgroups into finer subgroups. In each iteration, the VF is calculated, and the algorithm determines whether the partition at that iteration is beneficial by meaningfully increasing the VF. The iterations continue until the splitting does not statistically significantly improve the VF. We performed a leave-one-out cross-validation by applying RF-informed tree-based learning to (n–1) participants and estimating the treatment rule for the participant left out to confirm robustness, then calculated the VF with estimated treatment rules. More detailed explanations for the variable selection approach and the algorithm are included in the Supplementary Material, and the R code for the analysis is available at https://github.com/siyeonstat/rfitbl.git.
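A single iteration of the split search described above might look like the following Python sketch. It is not the authors' R code (which is available at the repository cited above) and it simplifies in several labeled ways: the RF-based covariate screening and the significance-based stopping rule are omitted, the function names are hypothetical, the propensities are assumed from the 2:2:1 randomization, and the data are a small, perfectly balanced toy sample constructed for the example.

```python
from itertools import permutations

TREATMENTS = ("PT", "IBET", "WT")
PROPENSITY = {"PT": 0.4, "IBET": 0.4, "WT": 0.2}  # assumed from 2:2:1 randomization


def ipw_value(rows, rule):
    """Inverse probability weighted value of `rule` over (covariates, arm, outcome) rows."""
    total = 0.0
    for x, arm, y in rows:
        if rule(x) == arm:  # only patients who followed the rule contribute
            total += y / PROPENSITY[arm]
    return total / len(rows)


def best_split(rows, covariates):
    """Exhaustively search one split: covariate, threshold, and the arm for each side."""
    best = None
    for name in covariates:
        # Drop the largest observed value so that both sides stay non-empty.
        cuts = sorted({x[name] for x, _, _ in rows})[:-1]
        for cut in cuts:
            for left, right in permutations(TREATMENTS, 2):
                def rule(x, n=name, c=cut, l=left, r=right):
                    return l if x[n] <= c else r
                value = ipw_value(rows, rule)
                if best is None or value > best[0]:
                    best = (value, name, cut, left, right)
    return best


# Hypothetical toy outcomes (higher is better) by age group and arm.
YOUNG = {"PT": 60.0, "IBET": 90.0, "WT": 65.0}
OLD = {"PT": 88.0, "IBET": 60.0, "WT": 62.0}

rows = []
for age, bfom in [(40, 8), (50, 12), (60, 8), (70, 12)]:
    outcome_by_arm = YOUNG if age <= 50 else OLD
    for arm, copies in (("PT", 2), ("IBET", 2), ("WT", 1)):
        rows += [({"age": age, "bfom": bfom}, arm, outcome_by_arm[arm])] * copies

value, name, cut, left, right = best_split(rows, ["age", "bfom"])
```

In the full procedure, the same search would be repeated inside each resulting subgroup, with a further split kept only if it improves the VF to a statistically significant degree.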
RESULTS
Participants in these analyses had a mean age of 65.1 (SD 10.8) years and 73.2% were female. With respect to race, 73.2% were White, and 26.7% were from other racial groups (primarily Black or African American). Table 1 shows additional patient characteristics included in the statistical models.
Table 1. Patient characteristics included in analyses.
As shown in Table 2, RF-informed tree-based learning (the new method) and RF were the 2 methods that yielded higher VFs than the best ZOM’s VF, but only the difference for RF-informed tree-based learning was statistically significant (P = 0.01); the P value for RF was 0.90. Because the WOMAC outcome was reverse-scored (multiplied by −1, with 96 added), the greater VF for RF-informed tree-based learning indicates that subgroups of patients would achieve greater improvement from the assigned treatments estimated by the new method than if all received IBET (the best overall single treatment based on 12-month WOMAC). Specifically, the WOMAC score would be expected to be 4.4 points better (75.5 vs 71.1) when the rule established by RF-informed tree-based learning is used vs a scenario in which all participants received IBET. This difference is 14.4% of the mean baseline total WOMAC score for this sample (Table 2), which exceeds a previously identified minimal clinically important difference of 12% in the context of these types of interventions.18
Table 2. Value function estimates for outcome at 12-month visit.
The Figure displays the final rule determined for the total WOMAC score at the 12-month follow-up. Although the rule has 5 split points, the fourth and fifth split points (ie, those in the bottom row of the figure) have been combined for improved interpretability since thresholds for both nodes are defined using BMI (calculated as weight in kilograms divided by height in meters squared). Notable features of this decision rule include: (1) IBET was the optimal treatment for greater than half of patients overall (n = 174); (2) for a subgroup of younger individuals (age ≤ 49.3 years), IBET was the optimal treatment; (3) the subgroup for whom PT was the optimal treatment was characterized by age > 49.3 years, high Brief Fear of Movement score (> 9), and BMI between 26.3 and 37.2; and (4) for n = 17 patients, WT was the optimal treatment.
Figure. Final rule for the dataset with the outcome at 12-month follow-up visit. BMI calculated as weight in kilograms divided by height in meters squared. BFOM: Brief Fear of Movement; IBET: internet-based exercise training; PT: physical therapy; WT: wait list.
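The branches of the final rule that are spelled out in the text can be written as a small lookup function. This is only a partial sketch: the function name `recommend` is hypothetical, the handling of values that fall exactly on a threshold is an assumption, and the remaining BMI-defined branches (which appear only in the Figure) are deliberately left unspecified and return `None`.

```python
def recommend(age, bfom, bmi):
    """Return the arm stated in the text for this profile, or None if not stated.

    Thresholds come from the reported rule; boundary handling is assumed.
    """
    if age <= 49.3:
        return "IBET"  # younger patients
    if bfom <= 9:
        return "IBET"  # older patients with low fear of movement
    if 26.3 <= bmi <= 37.2:
        return "PT"  # older, high fear of movement, BMI in the stated range
    return None  # branches not described in enough detail in the text


# Example profiles (hypothetical patients).
examples = [
    recommend(age=45.0, bfom=12, bmi=30.0),
    recommend(age=60.0, bfom=5, bmi=30.0),
    recommend(age=60.0, bfom=12, bmi=30.0),
    recommend(age=60.0, bfom=12, bmi=40.0),
]
```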
DISCUSSION
In this study, we applied a novel machine learning algorithm, RF-informed tree-based learning, to discover optimal treatment regimens for subgroups of patients in a trial of 2 exercise-based interventions vs a WT control for knee OA. The method addresses limitations of established machine learning methods (RF and list-based methods with kernel ridge regression and RF), which did not produce regimens that were significantly better than assigning all patients to the single best treatment in this study. The new algorithm successfully identified distinct subgroups for whom PT, IBET, or WT was the best treatment at the 12-month visit with regard to WOMAC total score. Specifically, assignment of the optimal treatment regimen resulted in a significant improvement over the ZOM, and this difference exceeded the threshold for a minimal clinically important difference in rehabilitation-type interventions for knee OA. This suggests that the proposed treatment regimens would deliver more beneficial results across patients than assigning a single best treatment to all individuals. Hence, tailoring referrals to specific exercise-based interventions, based on the patient characteristics identified, could result in greater impacts on OA symptoms. These findings are particularly interesting in the context of the overall findings of the trial, which showed that mean improvements in WOMAC were similar across the 3 study arms, including the WT, at 12 months. This further suggests that exercise-based interventions may be most effective when they are selected based on patient characteristics.
Notably, the subgroups identified by the algorithm were characterized by differences in age, BMI, and fear of movement, which are all feasible to evaluate in clinical settings. IBET was the optimal treatment for 57.4% of patients in these analyses. This is of interest, as it suggests that this lower resource intervention (relative to PT) may be more favorable for more than half of patients with OA, when considering 12-month WOMAC total scores. Participants aged ≤ 49.3 years and those older than 49.3 years with low fear of movement were subgroups for whom IBET was the optimal treatment regimen. Clinically, this suggests that patients with these characteristics may be better able to sustain behaviors and impacts of a self-guided exercise program. There was 1 relatively large subgroup (n = 112) for whom PT was the optimal treatment; this group was characterized by age > 49.3 years, high fear of movement, and BMI ranging between 26.3 and 37.2. In the next largest group (n = 77), IBET was assigned to patients aged > 49.3 years with high fear of movement and BMI < 26.4. However, for other subgroups, results are more challenging to interpret clinically because they involve combinations of variables and their identified thresholds. Finally, for a relatively small number of participants (n = 17), the WT condition was the optimal treatment regimen. This indicates that for the majority of individuals in the trial, 1 of the 2 active treatments (IBET or PT) was superior to no treatment.
Although this algorithm addresses some shortcomings of other machine learning methods, it also has some limitations. Since the algorithm exhaustively searches for 1 split point out of all the distinct points from every important variable in the list until a third variable is chosen, it can be computationally burdensome. Moreover, it is not guaranteed that the VF estimate from the final rule is the maximum of all possible VF estimates. The reason for this is that once a subgroup in a particular iteration has been decided, the algorithm in the next iteration searches for the subsequent finer subgroup only in the subgroup identified in the previous iteration. Although this process does not necessarily lead to the maximum VF estimate, it is designed to obtain a decision rule whose improvement in the VF estimate is as statistically significant as possible while also providing mechanistic, parsimonious, and interpretable rules. For future studies, we suggest developing a tool for discovering the maximum VF estimate with its corresponding decision rule for factors that identify distinct subgroups. Also, although we employed leave-one-out cross-validation to reduce the risk of overfitting, the lack of validation of the estimated treatment rules in an external dataset poses a challenge to generalizing the study results. There are some additional limitations to this study. In this pragmatic trial, we did not obtain new radiographs or independent physician assessments to confirm knee OA diagnosis. However, all participants had either a prior radiographic or physician diagnosis, so it is very unlikely there were participants without either radiographic or symptomatic OA. This study was conducted in 1 geographic region of the US and only included participants with regular internet access, which may limit generalizability of findings.
Finally, we focused on the total WOMAC score, as it was the primary outcome of the trial; it would be useful for future research to explore additional outcomes including separate pain and function measures.
In summary, these secondary analyses from the PATH-IN trial successfully identified meaningful subgroups of patients for whom PT, IBET, or WT was the respective optimal treatment. Because these results are exploratory, further studies are needed to evaluate whether these patterns are also observed in other cohorts and contexts. However, we believe these results offer some practical guidance for patients with knee OA, as well as for clinicians who refer these patients to exercise-based interventions. First, results suggest that younger patients (≤ 49.3 years) and those who are older but have low fear of movement may be able to sustain benefits (over a 12-month period) from a supported home-based exercise intervention. Second, patients > 49.3 years who have greater fear of movement may be good candidates for a referral to PT and may particularly benefit from this higher level of support and guidance, with respect to sustaining improvements after 1 year.
ACKNOWLEDGMENT
The study team thanks the study participants, without whom this work would not be possible.
Footnotes
This study was funded through a Patient-Centered Outcomes Research Institute Award (CER-1306-02043). The statements and opinions presented in this manuscript are solely the responsibility of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute, its Board of Governors, or Methodology Committee. This study was also funded through pilot funds from the University of North Carolina Thurston Arthritis Research Center and the North Carolina Translational and Clinical Sciences Institute (550KR221901). KDA, LA, LFC, YMG, AEN, and TAS receive support from National Institute of Arthritis and Musculoskeletal and Skin Diseases Multidisciplinary Core Center for Clinical Research Center (P30 AR072580). KDA receives support from the Department of Veterans Affairs, Health Services Research and Development Service (CIN 13-410, RCS 19-332).
The authors declare no conflicts of interest relevant to this article.
- Accepted for publication July 13, 2023.
- Copyright © 2023 by the Journal of Rheumatology







