Abstract
Objective. To develop and validate a radiographic scoring system for the assessment of radiographic damage in the hip joint in patients with juvenile idiopathic arthritis (JIA).
Methods. The Childhood Arthritis Radiographic Score of the Hip (CARSH) assesses and scores these radiographic abnormalities: joint space narrowing (JSN), erosion, growth abnormalities, subchondral cysts, malalignment, sclerosis of the acetabulum, and avascular necrosis of the femoral head. Score validation was accomplished by evaluating reliability and correlational, construct, and predictive validity in 148 JIA patients with hip disease who had a total of 381 hip radiographs available for study.
Results. JSN was the most frequently observed radiographic abnormality, followed by erosion and sclerosis of the acetabulum. The least common abnormalities were avascular necrosis, growth abnormalities, and malalignment. Interobserver and intraobserver reliability for baseline scores, longitudinal scores, and score changes was good, with intraclass correlation coefficients ranging from 0.76 to 0.98. Early score changes, but not absolute baseline scores, were moderately correlated (rs > 0.4) with clinical indicators of disease damage at the last followup observation, supporting the construct and predictive validity of the CARSH. The amount of structural damage in the hip radiograph at the last followup observation was predicted better by the score change from baseline to 1 year (rs = 0.66; p < 0.0001) than by absolute baseline scores (rs = 0.40; p = 0.002).
Conclusion. Our results show that the CARSH is reliable and valid for the assessment of radiographic hip damage and its progression in patients with JIA.
Juvenile idiopathic arthritis (JIA) is a chronic disease characterized by prolonged synovial inflammation that can lead to destruction of joints1. Because preventing or slowing joint damage is a major objective in the treatment of chronic arthritis, evaluation of radiographic joint damage has become an important tool for assessing disease severity and progression in children with JIA. Although JIA is commonly thought to be less destructive than adult rheumatoid arthritis, several studies have shown that many children with chronic arthritis develop significant radiographic joint lesions2–7. Further, a higher-than-expected proportion of these patients have been found to have joint space narrowing (JSN) and erosions early in their illness3,7,8.
Standardized, quantitative assessment of radiographic joint damage in children with JIA has traditionally been hampered by the lack of established scoring methods for the pediatric age group. In recent years, considerable effort has been devoted to devising new radiographic scoring systems or validating existing methods for use in JIA3,5,6,9–14. However, most of these methods assess damage in the hand and wrist joints. It is still unclear whether damage scored in these joints adequately reflects damage in large, often weight-bearing joints, which are excluded from scoring15. Indeed, abnormalities in hand radiographs of children with JIA have not been found to correlate well with abnormalities in other joint groups5.
Hip disease develops in 30% to 50% of children with JIA and is frequent in the most severe, destructive forms of the disease16. Because of the importance of the hip joint in weight-bearing and physical function, the onset of hip disease portends future disability. Together with the wrists, the hips have been found to be the joints most vulnerable to early JSN in JIA4. In a study of the natural history of hip involvement in children with chronic arthritis, destructive changes in the hips were evident in the majority of cases within 5 years17. In some patients, hip arthritis may follow a rapid, aggressive course that ultimately requires joint replacement surgery. Hip involvement has been found to be a marker of poor prognosis in systemic JIA18. Although newer, more sensitive imaging modalities such as magnetic resonance imaging and ultrasound19–21 are increasingly used, the plain radiograph remains an important tool for determining the course and outcome of hip disease in JIA. Appropriate scoring methods are essential to quantify radiographic damage and its progression over time. However, no radiographic scoring system developed specifically for the hip joint is available.
In our study, we describe the development of a new radiographic scoring system for the hip, the Childhood Arthritis Radiographic Score of the Hip (CARSH), and provide preliminary evidence of its validity in children with JIA.
MATERIALS AND METHODS
Development of the hip radiographic score
The CARSH was devised by a panel of 5 pediatric rheumatologists with 5 to more than 20 years of experience in the field. It is based on their clinical experience, on analysis of the literature on radiographic features of hip disease in JIA16,22–24, and on radiographic scoring systems in adult and childhood chronic arthritis25–27. After extensive discussion, the panel reached consensus on the individual forms of radiographic damage to be included in the score and on the grading system. The following radiographic abnormalities were included in the CARSH: JSN, erosion, growth abnormalities, subchondral cysts, malalignment, sclerosis of the acetabulum, and avascular necrosis of the femoral head. A score of 0 is given if no abnormalities are found. Depending on severity, JSN is scored from 1 to 3, erosion from 1 to 4, and growth abnormalities, subchondral cysts, and malalignment as 1 or 2. Sclerosis of the acetabulum and avascular necrosis of the femoral head are not graded: sclerosis is scored as 1 when present, and avascular necrosis, because of its greater severity, as 2. Each hip can therefore score from 0 to 16, and the total score for both hips together ranges from 0 to 32. The CARSH is shown in Table 1.
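To make the scoring arithmetic concrete, the Python sketch below totals the CARSH for both hips from per-item grades. The item keys and function names are hypothetical, and the permitted ranges are taken from the description above rather than from Table 1, so the detailed grade definitions are not encoded; the sketch only checks that each grade is within its permitted range and sums the items.

```python
# Hypothetical item keys; maximum per-hip score for each CARSH item as
# described in the text (the grade definitions themselves are in Table 1).
MAX_GRADE = {
    "jsn": 3,
    "erosion": 4,
    "growth_abnormality": 2,
    "subchondral_cysts": 2,
    "malalignment": 2,
    "acetabular_sclerosis": 1,  # ungraded: scored 1 if present
    "avascular_necrosis": 2,    # ungraded: scored 2 if present
}
# Items scored present/absent with a fixed value rather than graded.
UNGRADED = {"acetabular_sclerosis", "avascular_necrosis"}


def hip_score(findings):
    """Return the CARSH for one hip; items not listed count as 0."""
    total = 0
    for item, grade in findings.items():
        if item not in MAX_GRADE:
            raise ValueError(f"unknown CARSH item: {item!r}")
        if item in UNGRADED:
            allowed = {0, MAX_GRADE[item]}
        else:
            allowed = set(range(MAX_GRADE[item] + 1))
        if grade not in allowed:
            raise ValueError(f"{item}: grade {grade} not in {sorted(allowed)}")
        total += grade
    return total


def carsh_total(left_hip, right_hip):
    """Total CARSH for both hips (range 0 to 32)."""
    return hip_score(left_hip) + hip_score(right_hip)


MAX_PER_HIP = sum(MAX_GRADE.values())  # 3 + 4 + 2 + 2 + 2 + 1 + 2 = 16

# Example: JSN grade 2 and erosion grade 1 in the left hip;
# JSN grade 1 plus acetabular sclerosis in the right hip.
print(carsh_total({"jsn": 2, "erosion": 1},
                  {"jsn": 1, "acetabular_sclerosis": 1}))  # 5
```

Rejecting out-of-range grades, such as avascular necrosis entered as 1, mirrors the rule that the ungraded items take only their fixed value when present.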
Patient selection
The radiology records of patients seen at the study units from January 1986 to December 2005 were reviewed to identify patients who had a diagnosis of JIA according to the International League of Associations for Rheumatology revised criteria28, had hip involvement, and had at least 1 standard radiograph of both hips in posteroanterior view.
Reading strategy
Two observers, who were blinded to the patients' clinical data, independently scored all study radiographs with the CARSH. Radiographs from each patient were read in sequential order, and previous radiographs and scores were available to observers when they scored followup radiographs. Both observers are pediatric radiologists with more than 15 years of experience in reading skeletal radiographs, but neither was familiar with radiographic scoring. Before the study began, the observers attended a training session on the CARSH with the principal investigator, a pediatric rheumatologist with more than 20 years of clinical experience and familiarity with radiographic scoring. In cases of unilateral involvement, abnormal JSN was distinguished from normal joint space by visual comparison with the contralateral hip. In cases of bilateral involvement, the assessment relied on the reader's experience in examining hip radiographs of children with JIA. Readers were instructed to exclude any influence of joint space widening related to effusion, and were advised of the importance of distinguishing abnormal radiographic changes from the normal changes that accompany joint growth in children. When in doubt, readers were asked to confirm pathologic changes by comparing longitudinal radiographs whenever available.
Interobserver reliability of the scoring method was assessed on all radiographs read by the 2 observers. Intraobserver reliability was based on the radiographs of a subset of 37 randomly selected patients, which were read a second time, in a blinded manner, by 1 of the 2 observers 3 months after the first review. Both interobserver and intraobserver reliability were assessed on radiographs read in sequential order. Because intraobserver reliability in this reader was high, we considered it sufficient to assess intraobserver reliability in only 1 of the 2 readers.
Clinical assessment
Patient information included sex, age at disease onset, JIA subtype, and disease duration at the time of the first and subsequent hip radiographs. At the time of each hip radiograph, the following information was recorded: functional ability, assessed with the Italian version of the Childhood Health Assessment Questionnaire (CHAQ; 0 = best; 3 = worst)29; the Poznanski score of radiographic damage30; and the adapted Sharp-van der Heijde (SH) radiographic score31. The Poznanski score is a measure of the carpometacarpal ratio and reflects the amount of cartilage loss in the wrist; the more negative the score, the more severe the radiographic damage. For our study, a score below –2 was considered abnormal. The adapted SH score assesses 15 areas for JSN and 21 areas for erosion in each hand and wrist. Compared with the original score, the adapted version includes 5 additional areas for erosion in each wrist. Scores for JSN and erosion in each area range from 0 to 4 and from 0 to 5, respectively. The adapted SH total score ranges from 0 to 330. All wrist-hand radiographs used in our study had been assigned Poznanski and adapted SH scores in the context of previous analyses6,31,32.
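The stated maximum of the adapted SH score follows directly from the number of areas and the per-area ranges, as the short Python sketch below checks; it also encodes the cut-off used here for an abnormal Poznanski score. Constant and function names are illustrative, and the Poznanski value is assumed to be supplied already computed (the sketch does not derive it from carpal and metacarpal measurements).

```python
# Adapted Sharp-van der Heijde (SH) score: areas per hand/wrist and per-area maxima.
JSN_AREAS_PER_SIDE = 15
EROSION_AREAS_PER_SIDE = 21
JSN_MAX_PER_AREA = 4       # JSN scored 0 to 4 per area
EROSION_MAX_PER_AREA = 5   # erosion scored 0 to 5 per area
SIDES = 2                  # both hands and wrists

POZNANSKI_CUTOFF = -2.0    # scores below this were considered abnormal


def adapted_sh_max():
    """Maximum possible adapted SH total score."""
    per_side = (JSN_AREAS_PER_SIDE * JSN_MAX_PER_AREA
                + EROSION_AREAS_PER_SIDE * EROSION_MAX_PER_AREA)
    return SIDES * per_side  # 2 * (60 + 105) = 330


def poznanski_abnormal(score):
    """True if a precomputed Poznanski score falls below the cut-off."""
    return score < POZNANSKI_CUTOFF


print(adapted_sh_max())          # 330
print(poznanski_abnormal(-2.5))  # True
```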
The following clinical assessments made at the last followup visit were recorded: physician’s global assessment of overall disease activity on a 10-cm visual analog scale (VAS; 0 = no activity; 10 = maximum activity); parent’s global assessment of the child’s overall well-being on a 10-cm VAS (0 = very good; 10 = very poor); parent’s rating of the child’s pain intensity on a 10-cm VAS (0 = no pain; 10 = maximum pain); counts of joints with swelling, pain on motion/tenderness, restricted motion, and active disease33; functional ability, assessed with the Italian version of the CHAQ; Steinbrocker functional class34; physician’s and parent’s assessments of the patient’s disability on a 6-point categorical scale (1 = no disability; 6 = severe disability)35; the Juvenile Arthritis Damage Index, Articular score (JADI-A)36; the Poznanski score; erythrocyte sedimentation rate (ESR; Westergren method); and C-reactive protein (CRP; nephelometry). The JADI-A assesses 36 joints or joint groups for the presence of damage; damage in each joint is scored on a 3-point scale (0 = no damage; 1 = partial damage; 2 = severe damage, ankylosis, or prosthesis), for a maximum total score of 72.
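Similarly, a minimal Python sketch of the JADI-A total, assuming one integer grade (0, 1, or 2) per assessed joint or joint group, shows how the maximum of 72 arises (36 joints × 2 points). The function and constant names are hypothetical.

```python
N_JOINTS = 36       # joints or joint groups assessed by the JADI-A
MAX_PER_JOINT = 2   # 0 = none, 1 = partial, 2 = severe/ankylosis/prosthesis


def jadi_a_total(grades):
    """Sum per-joint JADI-A grades after checking count and range."""
    if len(grades) != N_JOINTS:
        raise ValueError(f"expected {N_JOINTS} grades, got {len(grades)}")
    if any(g not in (0, 1, 2) for g in grades):
        raise ValueError("each grade must be 0, 1, or 2")
    return sum(grades)


print(jadi_a_total([MAX_PER_JOINT] * N_JOINTS))  # 72
```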
Statistical analysis
Validation procedures were primarily based on the analysis of reliability, correlational validity, construct validity, and predictive validity. Interobserver and intraobserver agreement for the CARSH were analyzed by computing the intraclass correlation coefficient (ICC)37 for baseline scores, longitudinal scores, and score changes between baseline and time 1. ICC values were interpreted as follows: < 0.4 = poor agreement; ≥ 0.4 to < 0.75 = moderate agreement; ≥ 0.75 = good agreement38. The average score of the 2 readers was used in the validation analyses.
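The ICC form is not stated here, so the Python sketch below assumes the two-way random-effects, absolute-agreement, single-rater form (ICC(2,1) in Shrout and Fleiss notation), computed from an n × k matrix of n radiographs scored by k readers, and then applies the agreement classification above. Function names are hypothetical; a consistency-type ICC would give different values for the same data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of n rows (radiographs), each holding k scores (readers).
    Undefined (division by zero) if every score in the matrix is identical.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 rated radiographs")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular matrix with at least 2 readers")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares from the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)


def agreement_category(icc):
    """Classification used in the text: poor, moderate, or good."""
    if icc < 0.4:
        return "poor"
    if icc < 0.75:
        return "moderate"
    return "good"


# Two readers; reader B always scores 1 point higher than reader A.
scores = [[1, 2], [2, 3], [3, 4]]
value = icc_2_1(scores)
print(round(value, 3), agreement_category(value))  # 0.667 moderate
```

The example shows why the absolute-agreement form matters for a scoring system: the two readers rank the radiographs identically, yet a systematic 1-point offset lowers the ICC from 1 to about 0.67.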
Correlational validity of the CARSH was examined by computing the correlations between CARSH values and the values of other measures of clinical and radiographic damage, including the CHAQ, the adapted SH score, and the Poznanski score, measured at all study timepoints. Construct validity is established when a measure, in this case the CARSH, is related to other measures in a manner consistent with a priori predictions. Predictive validity is established when the measure under study is able to predict disease outcome. Because the CARSH is a measure of structural joint damage, we predicted that the correlations of its baseline values and of its changes over time with the values recorded at the last followup visit would be in the moderate to high range for the count of joints with restricted motion, the Steinbrocker functional class, the JADI-A, the physician’s and parent’s categorical assessments of patient disability, and the Poznanski score, all of which measure closely related aspects of damage. The adapted SH score was not included in this analysis because it was available for only a few patients at the last followup visit. Correlations were predicted to be poor for the disease activity measures recorded at the final visit: the physician’s global assessment of disease activity, the parent’s ratings of the patient’s well-being and pain intensity, the swollen and tender joint counts, the ESR, and the CRP. No prediction was made for the CHAQ, because this measure has been found to reflect both disease activity and damage at all stages of JIA35,39. Construct validity of the CARSH was also examined by computing the Spearman correlations of baseline absolute CARSH values, and of changes in CARSH values between baseline and time 1, with CARSH values on the hip radiograph obtained at the last visit in patients who had at least 3 years of followup.
All correlations were assessed using Spearman’s rank correlation coefficient. For this analysis, correlations > 0.7 were considered high, those from 0.4 to 0.7 moderate, and those < 0.4 low40. Agreement between predicted and observed correlations was taken as evidence of construct validity. An association of baseline radiographic damage, or of radiographic progression from baseline to time 1, with the amount of longterm clinical and radiographic damage was considered evidence of predictive validity. Statistical analysis was performed with Statistica (StatSoft, Tulsa, OK, USA).
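The correlation analysis and the thresholds above can be sketched in Python as follows. The sketch computes Spearman's rho as the Pearson correlation of average ranks (which handles the tied scores that are common with ordinal measures such as the CARSH), grades its magnitude with the cut-offs above, and checks agreement with an a priori predicted category. Grading negative coefficients by their absolute value (as with the Poznanski score, where lower values mean more damage) is an assumption, as are all function names; p-values are not computed.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean 1-based rank of the tie group
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def strength(rs):
    """Categories from the text: > 0.7 high, 0.4 to 0.7 moderate, < 0.4 low."""
    r = abs(rs)  # assumption: sign ignored when grading strength
    if r > 0.7:
        return "high"
    if r >= 0.4:
        return "moderate"
    return "low"


def matches_prediction(rs, predicted):
    """True if the observed strength is one of the predicted categories."""
    return strength(rs) in predicted


# Damage measures were predicted to correlate in the moderate to high range.
print(strength(0.66), matches_prediction(0.66, {"moderate", "high"}))  # moderate True
```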
RESULTS
A total of 148 eligible patients were identified. The main demographic and clinical features of these patients are presented in Table 2. The median disease duration at baseline was 1.3 years [interquartile range (IQR) 0.4–4.1 yrs]. The total number of hip radiographs available for study (baseline plus followup radiographs) was 381. All patients had a radiograph available at baseline; 84 had a second radiograph a median of 1.2 years (IQR 0.9–2.9 yrs) after baseline (time 1); 52 had a third radiograph a median of 2.9 years (IQR 1.7–4.4 yrs) after baseline (time 2); 36 had a fourth radiograph a median of 4.6 years (IQR 3.2–7.5 yrs) after baseline (time 3); and 27 had 1 or more further radiographs (61 in total) after time 3, up to 11 years after baseline. The study patients represented approximately 20% of the total clinic population seen during the study period.
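The total of 381 hip radiographs reconciles with the per-timepoint counts reported above:

```latex
\underbrace{148}_{\text{baseline}} + \underbrace{84}_{\text{time 1}} + \underbrace{52}_{\text{time 2}} + \underbrace{36}_{\text{time 3}} + \underbrace{61}_{\text{after time 3}} = 381
```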
Table 3 shows the frequency of the radiographic abnormalities included in the CARSH at study timepoints. JSN was the most frequently observed form of radiographic damage at all timepoints, followed by erosion and sclerosis of the acetabulum. The least common abnormalities were avascular necrosis, growth abnormalities, and malalignment. Figures 1 and 2 illustrate the application of the CARSH in 2 patients with longstanding systemic JIA and hip disease. Only 2 patients were found to have improvement in CARSH scores over time.
Interobserver and intraobserver reliability
The interobserver agreement for the CARSH, as assessed by the ICC, was 0.98 for baseline scores, 0.76 for changes in scores from baseline to time 1, and 0.96 for scores on all radiographs combined. The intraobserver agreement was 0.96 for baseline scores, 0.82 for changes in scores from baseline to time 1, and 0.97 for scores on all radiographs combined. Overall, these ICC values indicate good reliability of the radiographic score.
Correlational validity
Spearman correlations between CARSH values and values of other indicators of clinical or radiographic damage, measured for all timepoints combined, were poor for the CHAQ (0.01), the adapted SH score (0.30), and the Poznanski score (–0.28).
Construct and predictive validity
This analysis was conducted by calculating the Spearman correlations of absolute CARSH values at baseline and changes in CARSH values between baseline and time 1 with the clinical measures of JIA activity and severity at the last followup visit, a median of 7.64 years (IQR 4.1–11.2 yrs) after baseline. This analysis involved 76 patients who had a hip radiograph available at time 1, a followup > 3 years after baseline, and sufficient followup clinical data available. The CARSH values for these patients were comparable with those for the 72 patients who could not be included because of a followup period of < 3 years or a lack of clinical information (data not shown).
Correlations with baseline absolute CARSH values were all poor (data not shown). Conversely, most correlations obtained for the change in CARSH values were in line with expectations (Table 4). As predicted, radiographic score changes from baseline to time 1 were moderately correlated with the clinical indicators of disease damage observed at the last followup visit, such as the restricted joint count, the Steinbrocker functional class, the JADI-A, and the physician’s and parent’s categorical assessments of patient disability. Also as predicted, score changes were poorly correlated with disease activity measures such as the physician’s and parent’s global assessments, the parent’s pain rating, the swollen and tender joint counts, the count of joints with active disease, and the acute-phase reactants. Contrary to prediction, the correlation with the score of radiographic damage in the wrist (Poznanski score) was in the poor range, as was the correlation with the functional ability tool (CHAQ), for which no prediction had been made.
Correlations with CARSH values on hip radiographs obtained at the last followup observation, assessed in the 58 patients who were followed for more than 3 years, were greater for the change in CARSH values from baseline to time 1 (rs = 0.66; p < 0.0001) than for baseline absolute CARSH values (rs = 0.40; p = 0.002).
All the correlations obtained for the complete CARSH score were greater than those obtained using a reduced version that incorporated only JSN and erosion (data not shown).
DISCUSSION
Several studies have examined the radiographic features of hip disease in children with JIA5,7,14,16,22–24. However, most of them provided only a description of the different radiographic abnormalities or simply assessed structural lesions as present or absent. Because the spectrum of radiographic changes that may develop in the hips of children with chronic arthritis is wide, a dedicated radiographic scoring system is warranted for these joints. The radiographic scores used in adult patients with rheumatoid arthritis25,26 are not suitable for this purpose because they do not assess some forms of damage that are unique to children with JIA, namely disturbances in bone growth27.
The CARSH is a new measure of radiographic damage in the hip joint of patients with JIA. It covers the most common radiographic abnormalities seen in the hip joint in children with chronic arthritis and is simple, easy to use, and quick, taking only a few minutes to score. Interobserver and intraobserver agreement, as assessed by the ICC, was good for both absolute values and score changes, demonstrating that the CARSH is reliable. As expected, in a cohort of patients with longstanding JIA, radiographic score changes from baseline to time 1 were moderately correlated with the clinical indicators of disease damage at the last followup visit and with the amount of longterm structural hip damage, and were poorly correlated with clinical measures of disease activity at the last followup visit. The correlation with clinical measures of longterm damage indicates that the CARSH predicts disease outcome. By documenting these key measurement properties, we have shown that the CARSH is a valid instrument for the assessment of hip radiographic damage in this patient population and is, therefore, potentially applicable in both clinical and research contexts.
The amount of radiographic damage in the hip joints, as measured with the CARSH, did not correlate cross-sectionally with radiographic damage in the hand/wrist joints, as measured with either the adapted SH or the Poznanski score. Further, the change in CARSH score from baseline to time 1 did not predict longterm radiographic damage in the wrist, as measured with the Poznanski score. These findings suggest that damage scores in hand and wrist joints do not reliably reflect damage in large, weight-bearing joints, and that neither the hand and wrist joints nor the hip joints may be suitable as “index” joints in JIA.
Another important observation was that the disease outcome, as assessed by the conventional clinical indicators of damage, was predicted by the change in the hip radiographic score from baseline to time 1, but not by the level of radiographic damage at baseline. Similarly, the amount of longterm structural hip damage was better predicted by the change in the CARSH from baseline to time 1 than by the initial radiographic damage. Similar results were reported in studies on hand and wrist radiographic scores in children with JIA6,31. These findings suggest that the potential for progression that the disease displays over time may be more important in predicting longterm outcome than the clinical characteristics at baseline41.
Unexpectedly, CARSH values were not correlated with CHAQ scores on cross-sectional data; further, the CHAQ was the only clinical measure of longterm damage that CARSH change did not predict. A lack of correlation with CHAQ disability was found with the adapted SH score as well31. These findings are in keeping with observations indicating that the CHAQ is an imperfect measure of disease damage because it is affected by acute, reversible factors (i.e., functional limitations due to current disease activity)35,39.
Evaluation of the frequency of CARSH abnormalities over time showed that JSN, which reflects cartilage loss, was the most common form of damage throughout the disease course in children with JIA. The same pattern was noted with the hand/wrist radiographic scores31,32. This is likely because the bone ends of younger children consist principally of cartilage, and because pediatric subjects have a greater ability than adults to regenerate articular cartilage23,27.
Our study had certain limitations. The radiographic scoring system was devised by a panel of pediatric rheumatologists, but radiographs were read by 2 pediatric radiologists. It remains to be established whether reading by pediatric rheumatologists would have yielded comparable results, and to what extent pediatric rheumatologists and pediatric radiologists would agree on scoring. The reading of serial radiographs may have facilitated concordance between readers; agreement on scoring of cross-sectional radiographs might have been more difficult to achieve. Readers examined the radiographs in chronological sequence and were allowed to see previous scores. There is, however, no definite consensus on whether readers should be aware of the time order of radiographs42, and in children, blinding readers to chronological order is impossible because growth and maturation of the skeleton are readily apparent. A control sample of hip radiographs from healthy children, which would have strengthened the study, was not available; because growing joints change anatomically over time, the lack of normal standards may have affected the reliability of our results. Finally, clinical data were collected retrospectively and are therefore subject to missing and possibly erroneous values.
We have developed a new radiographic scoring system for the hip joint in children with JIA that proved reliable and valid and performed well in terms of identifying radiographic damage and its progression. Application of this measure in future studies of radiographic outcome may enhance reliability and comparability of findings and help to increase current understanding of the natural history of the disease.
Footnotes
- Accepted for publication September 22, 2009.