Abstract
Objective. Expression of osteoarthritis (OA) varies significantly between individuals, and over time, suggesting the existence of different phenotypes, possibly with specific etiology and targets for treatment. Our objective was to identify phenotypes of progression of radiographic knee OA using separate quantitative features.
Methods. Separate radiographic features of OA were measured by Knee Images Digital Analysis (KIDA) in individuals with early knee OA (the CHECK cohort: Cohort Hip & Cohort Knee), at baseline and at 2-year and 5-year followup. Hierarchical clustering was performed to identify phenotypes of radiographic knee OA progression. The phenotypes identified were compared for changes in joint space width (JSW), varus angle, osteophyte area, eminence height, bone density, for Kellgren-Lawrence (K-L) grade, and for clinical characteristics. Logistic regression analysis evaluated whether baseline radiographic features and demographic/clinical characteristics were associated with each of the specific phenotypes.
Results. The 5 clusters identified were interpreted as “Severe” or “No,” “Early” or “Late” progression of the radiographic features, or specific involvement of “Bone density.” Medial JSW, varus angle, osteophyte area, eminence height, and bone density at baseline were associated with the Severe and Bone density phenotypes. Lesser eminence height and bone density were associated with Early and Late progression. Larger varus angle and smaller osteophyte area were associated with No progression.
Conclusion. Five phenotypes of radiographic progression of early knee OA were identified using separate quantitative features, which were associated with baseline radiographic features. Such phenotypes might require specific treatment and represent relevant subgroups for clinical trials.
Osteoarthritis (OA) is a degenerative joint disease characterized by pain and functional disability. In particular, knee OA has a high and increasing prevalence and is considered a major health and economic problem1. Structural changes affect the whole joint and include cartilage, bone, and soft tissues2. Definition of the disease and of diagnostic criteria remains difficult despite research on OA over many years3,4. This is mainly due to the generally slow progression of the degenerative process early in the disease5, and the (apparent) inconsistent relation between clinical symptoms and radiographic characteristics of OA representing structural damage (which are directly or indirectly assessed)6,7,8,9. Consequently, magnetic resonance imaging (MRI) techniques were developed, which show promise in directly visualizing morphologic and premorphologic changes of cartilage and other joint tissues using both conventional and complex MRI techniques10. However, radiography is still the primary method to prove the disease-modifying efficacy (tissue structure modification) of treatment2,11, because image acquisition is noninvasive, inexpensive, fast, and generally available1,12.
In clinical practice, expression of disease varies significantly between patients and over time, and therefore it is appreciated that different phenotypes (subpopulations) of OA exist2,11,13. For instance, in patients with prominent inflammation, a more destructive type of OA is found14. It is hypothesized also that radiographic phenotypes of OA exist. For example, some patients may suffer mainly from bone changes, while others predominantly have damage of cartilage. These radiographic phenotypes may also have their specific clinical characteristics. For example, patients with predominantly bone changes may sense more pain15, and patients with osteophyte growth may have more joint inflammation, as these phenomena have been linked16. The rate of progression and the sequence of occurrence of different radiographic characteristics may vary for different phenotypes of radiographic knee OA. Such subtle differences will be overlooked when progression of radiographic joint damage is evaluated by the commonly used Kellgren-Lawrence (K-L) grading, which is a rough score (0 to IV) summarizing multiple characteristics17. These limitations hamper selection of subgroups of individuals for whom specific treatment strategies might be helpful. In cases of bone involvement, bisphosphonates might be effective, but this benefit will be leveled out and will not be detected in the average OA population. Intensive treatment of inflammation might do more harm than good in the overall OA population18, but might be very helpful for subgroups of patients with evident inflammation. Identification of radiographic phenotypes is expected to improve through quantitative evaluation of separate features on radiographs.
The objectives of our study were to identify radiographic phenotypes of early knee OA and to describe their radiological and clinical characteristics.
MATERIALS AND METHODS
Study population; Cohort Hip & Cohort Knee (CHECK)
Development of knee OA was evaluated from baseline to 5-year followup in CHECK (Cohort Hip & Cohort Knee). This cohort included 1002 participants, age 45–65 years, with pain and/or stiffness of the hip and/or knee, who either had not yet visited their general practitioner for these complaints or had made their first visit no more than 6 months previously19. At baseline, 82% of the participants had knee complaints (18% had hip complaints only), and the radiographic knee damage of the entire cohort was limited, with K-L grade in the knee of 0 in 81%, I in 16%, II in 3%, and III in 0.4%.
The study procedures were in accord with the standards of the medical ethics committees of all 10 participating hospitals and with the Helsinki Declaration of 1975 (revised 2000), and all participants gave their written informed consent.
Knee images digital analysis
Standardized weight-bearing semiflexed views [metatarsophalangeal (MTP), according to Buckland-Wright, et al20,21] of both knees were acquired at baseline and 2-year and 5-year followup (T0, T2y, and T5y). Radiographs were analyzed for 14 separate OA variables by use of Knee Images Digital Analysis (KIDA)22: minimum joint space width (JSW, in mm), mean medial and mean lateral JSW, femur-tibia varus angle (in degrees), eminence height (to represent spiking of the tibial eminence; in mm), osteophyte area (in mm²) in lateral and medial femur and lateral and medial tibia, and bone density in these 4 compartments. The varus angle between the femur and tibia was determined in the frontal plane using the intersection points that determine the bone and cartilage interface; a positive value represents (more) varus alignment. Bone density was expressed in mmAl equivalents by comparison with an aluminum reference wedge that was added in each radiograph, which was found to be a reliable method to measure bone density23. Intra- and interobserver variation of KIDA was described22. In the CHECK study19 the KIDA measurements were performed by 1 experienced observer (M. Lafeber) in random order blinded to information on timepoint, severity, and characteristics of individual patients. The numbers of analyzed knees are indicated and vary slightly for the different radiographic variables, because poor radiographic quality can hamper KIDA measurement. Intraobserver reliability, tested by random reanalysis of 108 radiographs several months later, was good (intraclass correlation coefficient 0.73–0.99) for the different features24.
Statistical analysis
Using principal component analysis, the measurements of 14 separate KIDA variables were reduced into 5 components to represent the following radiographic features: (1) medial JSW (mean of 4 predefined locations); (2) lateral JSW (mean of 4 predefined locations); (3) osteophyte area (sum of lateral and medial femur and lateral and medial tibia); (4) eminence height (sum of lateral and medial); and (5) bone density (mean of lateral and medial femur and tibia)25,26. By multiplying the individual KIDA measurements by the factor loadings from the principal component analysis, 5 component scores were calculated. These 5 component scores (standardized using z-scores) were used in a hierarchical cluster analysis (Ward’s method) to identify possible phenotypes of progression of radiographic knee OA. Per individual, the component scores of the left and right knee at T0, T2y, T5y and the change scores (T5y – T2y and T2y – T0) were all used as separate variables in this analysis. The number of selected clusters of individuals was based on inspection of dendrograms (MBK, PMJW).
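The clustering step can be illustrated with a minimal Python sketch. It is not the SPSS/SAS procedure used in the study: the principal component step is omitted, the toy data and all function names are hypothetical, and the agglomeration is a naive implementation of Ward's criterion (merge the pair of clusters whose union gives the smallest increase in within-cluster sum of squares).

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column to mean 0 and SD 1 (population SD)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def centroid(points):
    """Column-wise mean of a list of points."""
    return [mean(c) for c in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(points, k):
    """Agglomerate until k clusters remain; returns lists of row indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[p] for p in clusters[i]],
                                 [points[p] for p in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy data: each row is one participant,
# columns are (component score at T0, change score T2y - T0).
toy = [
    [0.1, 0.0], [0.2, 0.1], [0.0, 0.1],   # little change over time
    [2.0, 1.9], [2.2, 2.1],               # marked change over time
]
standardized = zscore_columns(toy)
labels = ward_clusters(standardized, k=2)
print(labels)
```

In the study the number of clusters was chosen by inspecting dendrograms; here `k` is simply fixed in advance for illustration.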
To interpret the clusters (phenotypes), the following straightforward and/or well known prespecified features were evaluated over time and compared between clusters: minimum JSW, medial JSW, lateral JSW, varus angle, osteophyte area (log-transformed sum of 4 compartments + 1; because normal distribution is preferred for statistical analysis), eminence height (sum of both), and bone density (mean of 4 compartments)25. Further, the presence of knee and/or hip pain as assessed by the physician during physical examination per joint, and the Western Ontario and McMaster Universities Osteoarthritis index pain and function scores (WOMAC; 0–100 scale, 100 = worst condition), assessed as an overall measure for the individual, were compared between clusters.
Subsequently, logistic regression analyses were performed to evaluate whether the radiographic features measured at T0, in addition to demographic and clinical characteristics [age, sex, body mass index (BMI), erythrocyte sedimentation rate, and pain intensity measured by visual analog scale (VAS; 0–10 scale, 10 = worst possible pain)] at T0, could be used to predict to which specific phenotype an individual belonged. These analyses were performed in participants that were included in CHECK with at least knee complaints at T0, because these individuals visit a physician with early complaints and are suspected for development of radiographic knee OA. Univariate and multivariate analyses were performed. In the multivariate analyses, all variables were initially included, and were removed manually using a backward selection strategy to generate a model including only variables that are significantly related (based on p value < 0.05 and size of the OR) to the outcome. Models including only demographic and clinical variables were compared to models where radiographic features were added and to models where conventional K-L grading was added. To represent the total burden of radiographic damage, for each participant the sum of the left and right knee was used in the models. Since radiographic features might be very characteristic of an individual specifically early in the disease24, the difference between a knee and the contralateral knee for the radiographic features was also studied as an independent variable. These difference scores might detect small changes by using the contralateral knee as a reference in this early OA population with only subtle damage in 1 joint.
Prognostic ability of the final models was summarized and compared using the area under the curve (AUC) of the receiver-operating characteristic (ROC) curve. The AUC-ROC provides a measure for the ability to discriminate between a specific phenotype and the other phenotypes27. Additionally, per phenotype the regression coefficients of the final models were corrected for overfitting using the van Houwelingen and Le Cessie method28, and were converted into a simple score. Three cutoff points were determined: optimal sensitivity, optimal tradeoff between sensitivity and specificity, and optimal specificity. For these cutoff values, positive predictive values (PPV) were calculated as an estimate of predictive ability.
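As an illustration of these two summary measures, the following Python sketch computes an AUC-ROC and a PPV from a list of predictive scores. The data are hypothetical and the function names are our own; the AUC is computed in its rank-based form (the probability that a randomly chosen member of the phenotype scores higher than a randomly chosen nonmember, with ties counted as one half), which is one standard way to obtain it and not necessarily the routine used by the statistical packages in the study.

```python
def auc_roc(scores, labels):
    """AUC as the share of (member, nonmember) pairs in which the member
    has the higher score; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ppv(scores, labels, cutoff):
    """Share of individuals scoring above the cutoff who truly belong
    to the phenotype."""
    flagged = [y for s, y in zip(scores, labels) if s > cutoff]
    return sum(flagged) / len(flagged) if flagged else float("nan")

# Hypothetical predictive scores; label 1 = belongs to the phenotype.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = [0, 0, 1, 0, 1, 1]

auc = auc_roc(scores, labels)
ppv_at_cutoff = ppv(scores, labels, cutoff=2.5)
print(auc, ppv_at_cutoff)
```

Raising the cutoff trades sensitivity for specificity, which is why PPV was reported at 3 different cutoff points.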
Analyses were performed using SPSS version 15.0 and SAS version 9.1.3; p value < 0.05 was considered statistically significant.
RESULTS
Identification of radiographic phenotypes
Based on development over time of component scores of the knees, 5 clusters could be identified. Participants were classified only when complete data of KIDA measurements were available for all 3 timepoints, leading to evaluation of 417 of 1002 participants. The 5 clusters were interpreted as follows: (1) “Severe”: severe progression; (2) “Bone density”: prominent involvement of the bone density feature; (3) “Early”: progression mainly in an early phase (T0 to T2y); (4) “Late”: progression mainly in a later phase (T2y to T5y); and (5) “No”: no progression.
Figure 1 depicts the development of the separate radiographic features over time per cluster, representing the level of progression, the prominent involvement of bone density, and the different phases of progression, as follows.
In general, the radiographic features showed OA progression during 5 years’ followup: minimum and medial JSW decreased, and lateral JSW, varus angle, osteophyte area, eminence height, and bone density increased. Participants in the Severe cluster (n = 17; 4% of 417 available participants; Figure 1) progressed more evidently than participants in the other clusters on all radiographic features. Notably, at T0 features of these participants were already more affected than those of participants in the other clusters. The Bone density cluster (n = 113; 27% of participants) represented prominent involvement of bone density at all 3 timepoints compared with the other phenotypes. In this cluster the other features were only mildly affected. Participants in the Early cluster (n = 110; 26% of participants) progressed mainly between T0 and T2y, most evidently for lateral JSW, varus angle, and bone density. Participants in the Late cluster (n = 69; 17% of participants) progressed mainly between T2y and T5y on lateral JSW, varus angle, and eminence height. In the No progression cluster (n = 108; 26% of participants) the radiographic features did not progress during followup; small changes in radiographic features might be due to random error.
Characterization of radiographic phenotypes
Baseline characteristics are depicted per phenotype in Table 1.
Radiographic and clinical development
For further interpretation of the phenotypes, Table 2 depicts K-L grades at T0, T2y, and T5y of both knees of participants within the 5 clusters. The frequency of K-L grades was statistically significantly different between the clusters (chi-square test, p < 0.0001 at all timepoints). Notably, in the Severe cluster a substantial percentage of knees already had evident radiographic OA (K-L grade ≥ II) at T0, and this percentage increased over time: 27% (n = 9 of 34 knees; 17 participants) at T0, 35% (n = 12 knees) at T2y, and 39% (n = 13 knees) at T5y. In the Bone density cluster the portion of knees with K-L grade ≥ II was substantial, specifically at T5y. In the Early and Late clusters the percentage of knees with K-L grade = 0 at T0 was highest (compared to other clusters). In the No cluster only 2%, 4%, and 6% of knees (n = 4, 9, and 13 knees) had radiographic damage (K-L grade ≥ II) at T0, T2y, and T5y, respectively.
Table 2 also depicts whether pain was present in the knee and/or the hip at T0, T2y, and T5y (bottom panel). The location of pain was significantly different between the phenotypes (chi-square test: p = 0.002 at T0, p = 0.001 at T2y, and p < 0.0001 at T5y). Participants with Severe radiographic progression specifically presented with knee pain. Participants in the Late cluster reported pain in “hip only” more commonly than participants in the other clusters, which might suggest (early) hip involvement, followed by knee involvement. Interestingly, a substantial portion of the participants reported “neither knee nor hip” pain at T2y and T5y, specifically in the No progression cluster. This may indicate that this phenotype is characterized by acute transient joint pain that does not lead to progressive radiographic joint damage.
Figure 2 depicts the development of the average WOMAC pain and function score over time per progression phenotype. The WOMAC scores were moderate at all timepoints and did not increase notably during followup. Although the average WOMAC pain score over time was statistically significantly lower in the No progression phenotype than in the Severe (p = 0.003), Bone density (p = 0.02), and Late (p = 0.02) progression phenotypes, the development over time was not significantly different between the phenotypes (tested using longitudinal regression analysis including an interaction term for time × phenotype). Also, the average WOMAC function score was significantly lower in the No progression phenotype than in the Severe (p = 0.004), Bone density (p = 0.03), Early (p = 0.04), and Late (p = 0.01) phenotypes. Further, progression over time was significantly different between the No and Late phenotypes.
Membership in a phenotype
Which baseline variables are associated with belonging to a specific phenotype (compared to belonging to any of the other phenotypes) was evaluated in 336 participants with knee pain at T0, because these are the individuals suspected of developing radiographic knee OA. Table 3 presents a summary of these logistic regression analyses by depicting per phenotype the AUC of the multivariate model with radiographic features and demographic and clinical variables, and the direction of the effect for the significant independent variables (+: OR > 1 and −: OR < 1). Details of the regression analyses are given in the Appendix.
Because the Severe phenotype consisted of only 16 participants with knee pain, multivariate analyses were not performed for this outcome. In the univariate evaluation, almost all radiographic features were significantly associated with the outcome (severe phenotype vs other phenotypes), as were K-L grade and BMI.
In general, the multivariate analyses showed that the discriminative ability (AUC-ROC) of the models improved when radiographic features were added to the demographic and clinical variables. The K-L grade was not significantly associated with any of the phenotypes. The predictors for Early, Late, and No progression phenotypes generally had an effect opposite to that of the predictors for the Severe and Bone density phenotypes.
Female sex reduced and higher BMI increased the risk of belonging to the Bone density phenotype together with multiple radiographic features (Table 3), resulting in a model with AUC-ROC = 0.91 (95% CI 0.88–0.94), and with AUC-ROC = 0.87 (95% CI 0.83–0.91) after correction for overfitting and rounding of coefficients. The PPV, the chance of belonging to the Bone density phenotype, was 83% in individuals with a score above the cutoff for optimal sensitivity (Table 4). The final predictive score was calculated as
Female sex and several radiographic features were associated with the Late progression phenotype; AUC-ROC was 0.76 (95% CI 0.69–0.83) and remained unchanged (predictive score −0.2*lateral JSW −0.1*eminence height −0.05*bone density).
Women and participants with lower BMI were more likely to belong to the No progression phenotype, and several radiographic features were also associated with this phenotype. Unexpectedly, individuals with a larger varus angle were more likely to belong to the No radiographic progression phenotype. The discriminative ability of the model was fair, with AUC-ROC = 0.72 (95% CI 0.66–0.78) decreasing to 0.68 (95% CI 0.62–0.74) for the predictive score (−0.1*varus angle −0.5*(log[osteophyte area + 1]) + 0.2*absolute difference in eminence height + 0.05*age + 1*gender −0.1*BMI).
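To make the use of such a score concrete, the sketch below evaluates the No progression score for one hypothetical individual. The coefficients are copied as printed above. Everything else is an assumption made for illustration: the natural logarithm is used, gender is coded 1 for women and 0 for men, and the input values are invented. The text does not state the log base or the coding, nor whether the radiographic inputs are the left-plus-right sums used in the regression models.

```python
import math

def no_progression_score(varus_angle, osteophyte_area, eminence_diff,
                         age, female, bmi):
    """Predictive score for the No progression phenotype, using the rounded
    coefficients quoted in the text.

    Assumptions (not confirmed by the source): natural log; female coded
    1 for women and 0 for men.
    """
    return (-0.1 * varus_angle
            - 0.5 * math.log(osteophyte_area + 1)
            + 0.2 * abs(eminence_diff)
            + 0.05 * age
            + 1.0 * female
            - 0.1 * bmi)

# Hypothetical individual: 60-year-old woman, BMI 25, no osteophytes.
score = no_progression_score(varus_angle=2.0, osteophyte_area=0.0,
                             eminence_diff=-1.0, age=60, female=1, bmi=25.0)
print(round(score, 2))
```

The resulting value has meaning only relative to the cutoff points reported in Table 4.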
Table 4 shows PPV for the different cutoffs for the predictive scores per phenotype.
DISCUSSION
Our study describes a first attempt to identify specific phenotypes of progression of radiographic knee OA, specifically in participants with complaints of early OA. Phenotypes were found to represent the level of disease progression (Severe and No progression), the phase of progression (Early and Late), and the prominent involvement of Bone density. Although the definition of the phenotypes should be validated in other datasets, these phenotypes might represent a (partly) different etiology. Such phenotypes may benefit from different treatment strategies, e.g., an intense regimen that combines pain medication with cartilage-safe nonsteroidal antiinflammatory drugs in cases of the Severe phenotype, and treatment aimed at bone quality (e.g., bisphosphonates) in cases of the Bone density phenotype. The percentage of participants with K-L grade ≥ II was significantly higher in the Severe cluster than in the other clusters. Clinical characteristics were not evidently different between the clusters, and the WOMAC scores were only slightly lower in the No cluster than in the other clusters. This is in agreement with the limited relation between clinical and radiographic OA in earlier studies6,8,29.
Previous work investigating possible subtypes of radiographic joint damage was performed in more established OA13,30,31. We found that it is of high value to evaluate phenotypes in an early phase of the disease, because this might enable early intervention before structural damage is established. That we were able to identify specific phenotypes with different progression of radiographic features of OA using detailed KIDA measurement justifies continuing development of more precise evaluation of plain radiographs32 in the early phase of OA. For example, the finding by Oka, et al that varus alignment is a predictor for progression of OA32 emphasizes that this radiographic feature should be measured separately. Adding specific separate radiographic features to demographic and clinical characteristics also substantially improved ability to discriminate between the progression phenotypes, contrary to K-L grading of overall damage. Applying measurements of specific separate radiographic features in clinical trials is therefore recommended, and this will also enable our results to be confirmed and extended.
Female sex34 and BMI35,36 are known risk factors for onset and progression of OA, and were also identified as predictors for most (but not all) phenotypes of radiographic progression in this study. Interestingly, being female was protective against belonging to the Bone density phenotype, and was significantly (OR 3.87) associated with belonging to the No progression phenotype. This might be related to the fact that women have lower bone density than men37. Osteophyte area was identified as the most important predictor for Severe progression and Bone density involvement, and was protective for the No progression phenotype, which might support the notion that osteophyte formation occurs early in the disease17. Unexpectedly, however, osteophyte area was not associated with belonging to the Early phenotype, and this requires further evaluation. The radiographic features that were identified to be associated with the Early and Late progression phenotypes (e.g., eminence height, bone density, varus angle, and JSW) actually had a protective effect, which also calls for further evaluation.
Generally, the PPV based on the predictive scores using demographic and clinical characteristics combined with specific radiographic features were not high enough for prediction at the individual level. However, defining subgroups for inclusion in clinical trials might be significantly improved (e.g., smaller groups needed; less time-consuming and more cost-efficient studies) based on these scores and hence enable the development of a more personalized treatment approach. For instance, 54% of our population could be classified as belonging to the Bone density phenotype with a certainty of 53% (PPV) when the predictive score was > 9.0. In our study, overall, 27% of participants (113 of 417) were classified as belonging to the Bone density phenotype, so by using the predictive score a substantially different (sub)population can potentially be identified that might react differently to treatment. Clearly, however, these phenotypes and prediction models should be validated before they are used in this way.
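The size of this enrichment can be checked with simple arithmetic, sketched here in Python. Only the counts (113 of 417) and the PPV of 53% come from the text; the variable names are ours. The comparison is approximate, because the prediction models were fitted in the 336 participants with knee pain at T0 whereas the overall share refers to all 417 classified participants.

```python
# Counts and PPV quoted in the text; variable names are illustrative.
n_bone_density, n_classified = 113, 417
base_rate = n_bone_density / n_classified   # share without any selection
ppv_above_cutoff = 0.53                     # PPV quoted for score > 9.0

# How much more common the phenotype is among individuals above the cutoff.
enrichment = ppv_above_cutoff / base_rate
print(f"base rate {base_rate:.1%}, enrichment x{enrichment:.2f}")
```

Under these figures, selecting on the score roughly doubles the proportion of individuals with the Bone density phenotype in the selected group.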
Cluster analysis is a technique to group individuals who are “similar” regarding the variables that are included in the analysis. To derive a set of phenotypes, “subjective” choices also have to be made. The value of clustering individuals is determined by the relevance and characteristics of the clusters, in our case underlying etiology, disease severity, need for treatment, and longterm outcome. Performing a cluster analysis with a different set of variables, for instance including clinical characteristics, might result in different clusters, e.g., phenotypes in which radiographic and clinical characteristics are strongly related to each other. Also, verifying such evaluations in an even larger population would limit the overfitting that arises from evaluating a large number of variables in a relatively small population.
In our study, cluster analysis aimed at identifying radiographic progression phenotypes by exploring radiographic features at and between different timepoints. We also deliberately chose to cluster participants and not knees. When performing cluster analysis with radiographic features at T0, T2y, and T5y separately, a Severe cluster with involvement of all feature scores, a cluster with Bone density involvement, and a cluster with No progression of all feature scores were identified, which adds to the validity of the defined progression clusters. Of note, no clusters were identified with specific progression of, for example, 1 knee (and not the other knee). Radiographic features within an individual might explain this; they are quite similar, and small differences are overlooked because of much larger differences between individuals or knees24. Also, this finding might be a reflection of the systemic character of OA38 affecting the whole joint and also more joints within an individual39. This might also be the reason that the scores of differences of the radiographic features were not related to membership in a specific phenotype.
A limitation of our study is that the number of participants was markedly reduced by the requirement for complete data for both knees at all 3 timepoints. However, this was not considered to introduce systematic bias because the only reason for missing data was radiographic quality. Age, sex, pain, and K-L grading were comparable between the participants who were and those who were not included in our analyses. Importantly, we did not select radiographs that had perfect tibial alignment. Although this might have influenced outcome regarding, for example, JSW40, this approach most closely represents clinical trial practice.
Further, although it seems intuitive that the different radiographic features at baseline are associated with the phenotypes, this was not the case. It was found that radiographic features at baseline were associated with the development over 5-year followup, because radiographs at 3 timepoints were assessed to define the phenotypes. It was the detailed evaluation of the separate radiographic features that enabled identification of phenotypes, which could not have been done in this early phase of the disease by K-L grading (because only a small portion of participants had radiographic OA based on K-L grading).
Because our results represent a first attempt to define different phenotypes of OA based on radiographic features in early OA, results should be replicated and further validated. Future investigation might also include clinical OA characteristics and other measurements regarding structural joint damage, e.g., MRI10, to further define subgroups of OA.
Based on separate radiographic features, phenotypes with different levels and phases of progression and prominent involvement of “bone density” were detected in our cohort of participants with early complaints related to OA. These phenotypes might represent potential subgroups for the evaluation of preventive therapies in clinical trials and the discovery of better-targeted treatment strategies.
Acknowledgment
We acknowledge Marja Lafeber for KIDA analyses. The CHECK steering committee comprises 16 members with expertise in different fields of OA and is chaired by Prof. J.W.J. Bijlsma and coordinated by J. Wesseling. We acknowledge the participation of the following institutions: Academic Hospital Maastricht; Erasmus Medical Center Rotterdam; Jan van Breemen Institute – VU Medical Center Amsterdam; Kennemer Gasthuis Haarlem; Martini Hospital Groningen – Allied Health Care Center for Rheumatology and Rehabilitation Groningen; Medical Spectrum Twente Enschede – Twenteborg Hospital Almelo; St. Maartenskliniek Nijmegen; Leiden University Medical Center; and University Medical Center Utrecht and Wilhelmina Hospital Assen.
APPENDIX. Details of regression analyses with phenotype as dependent variable.
Footnotes
- Supported by the Dutch Arthritis Association.
- Accepted for publication January 21, 2013.