Abstract
Objective. To assess the intrareader and interreader agreement and sensitivity to change of the Outcome Measures in Rheumatology (OMERACT) Rheumatoid Arthritis Magnetic Resonance Imaging Joint Space Narrowing (RAMRIS-JSN) score in the rheumatoid arthritis (RA) wrist in a longitudinal multireader exercise.
Methods. Coronal T1-weighted MR image sets of 1 wrist from 20 patients with early RA were assessed twice for JSN at 17 sites at baseline and after 36 or 60 months by 4 readers blinded to patient data but not time order. The joints were scored 0–4 according to the OMERACT RAMRIS-JSN score. Intraclass correlation coefficients (ICC), smallest detectable change (SDC), percentage exact/close agreement (PEA/PCA), and standardized response mean (SRM) were calculated.
Results. Median baseline and change scores were 10.3 and 1.9, respectively. Intrareader ICC for baseline and change scores was good (≥ 0.50) to very good (≥ 0.80) for all and 3 of 4 readers, respectively. Interreader ICC was very good for change (0.93), but poor for baseline score if all 4 readers were included (0.36) and very good if 1 reader was excluded (0.87). Intrareader and interreader SDC was low (2.34–3.18), except for the intrareader SDC for 1 reader (6.75). The mean PEA/PCA was high for baseline and change scores both within and between the readers (51.5–99.2), except for interreader baseline PEA (14.4). SRM was moderate for all readers (0.55–0.77).
Conclusion. The OMERACT RAMRIS-JSN score showed high overall intrareader and interreader reliability, and moderate sensitivity to change, supporting inclusion of the measure as part of the OMERACT RAMRIS system.
Because extended placebo-comparator studies are no longer ethical, distinguishing between therapies in clinical trials has become more difficult1. Methods with high sensitivity are needed to detect change in patients with rheumatoid arthritis (RA).
The Outcome Measures in Rheumatology (OMERACT) RA Magnetic Resonance Imaging (MRI) Score (RAMRIS) assesses inflammation (synovitis and bone marrow edema) and bone damage (bone erosion), but when RAMRIS was introduced, cartilage damage was excluded because of insufficient image quality2,3,4. Nevertheless, MRI sequence technology has improved, making visualization of cartilage and JSN feasible.
Studies have shown that conventional radiography assessment of JSN in addition to bone erosion can provide valuable information on disease progression and impairment of physical function and work ability5,6. Further, the suppression of bone erosion by a drug may not correspond to the suppression of JSN7. Hence, adding MRI assessment of JSN may broaden the spectrum of clinically relevant structural damage pathologies evaluated in RA.
In 2011, the OMERACT MRI in arthritis working group presented a semiquantitative JSN scoring system for the hand as a potential addendum to RAMRIS (RAMRIS-JSN). Primary results have shown high intrareader and interreader reliability and moderate to good correlation with conventional radiography. Further, good correlation was seen between MRI and computed tomography8,9. In the present multireader longitudinal study, we aimed to assess the intrareader and interreader agreement and the sensitivity to change of the OMERACT RAMRIS-JSN score.
MATERIALS AND METHODS
Patients
Twenty RA patients from the Oslo Early RA cohort (disease duration < 1 year), all fulfilling the 1987 American College of Rheumatology classification criteria, were included in the exercise. All patients gave their informed consent.
Image acquisition
MRI of the dominant wrist was performed on a 1.5 Tesla MR scanner (General Electric, Signa) using a dedicated high-resolution wrist phased array coil. Coronal T1-weighted precontrast fast spin echo MR images (2.5 mm slice thickness, field of view 100 × 100 mm, matrix 320 × 256) were acquired. To obtain a wide range of JSN, MR images at baseline (all patients) and at either 36 months' (2 patients) or 60 months' (18 patients) followup were selected by an assessor (SL) not participating in the reading exercise.
Scoring of images
The 20 paired image sets were read twice, on 2 consecutive days, on identical, dual-screen workstations (SECTRA PACS) by 4 experienced MRI readers familiar with the RAMRIS-JSN score (EH, IE, PC, MØ). A calibration session was performed using similar images the evening before the exercise. The readers were blinded to patient data but not to time order, as suggested by van Tuyl, et al10. The image sets were rerandomized and reanonymized before the second reading. Images were scored at 17 sites in the wrist, according to the OMERACT RAMRIS-JSN score as described by Østergaard, et al8. Accordingly, JSN was defined as reduced joint space width compared to normal, as assessed in a slice perpendicular to the joint surface. Each site was given a narrowing score from 0 to 4 as follows: 0, no narrowing; 1, focal or mild (< 33%); 2, moderate (34–66%); 3, moderate to severe (67–99%); 4, ankylosis.
Statistics
Status (baseline) scores and change (between baseline and followup) scores were calculated for each patient, in 2 independent readings by each reader. We used the mean of the 2 readings of the individual readers for calculations of interreader agreement and sensitivity to change.
Intrareader and interreader agreement was calculated using single measure and average measure intraclass correlation coefficients (ICC), respectively. An ICC ≥ 0.50 was considered good and an ICC ≥ 0.80 was considered very good. The smallest detectable change (SDC) was calculated for change scores11 and was also expressed as the percentage of the highest score observed by the reader (mean maximal score between the readers for interreader SDC). Agreement between the readers was also expressed as the percentage of exact agreement (PEA), where the 4 readers had the same score, and as the percentage of close agreement (PCA), where the difference between the minimum and maximum score of the 4 readers was ≤ 1. Agreement between the first and second score for a single reader was also expressed as PEA and PCA.
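The PEA and PCA definitions above can be illustrated with a minimal Python sketch. The function name `agreement_percentages` and the example scores are hypothetical and are not taken from the study data; the sketch only assumes that each row holds the scores given to one assessment, either by the 4 readers (interreader) or by the 2 readings of a single reader (intrareader).

```python
from typing import Sequence, Tuple


def agreement_percentages(scores: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (PEA, PCA) in percent.

    Each row of `scores` holds the scores given to one assessment
    (e.g., one joint space in one patient) by every reader or reading.
    Exact agreement: all scores in the row are identical.
    Close agreement: maximum minus minimum score in the row is <= 1.
    """
    if not scores:
        raise ValueError("at least one assessment is required")
    exact = sum(1 for row in scores if max(row) == min(row))
    close = sum(1 for row in scores if max(row) - min(row) <= 1)
    n = len(scores)
    return 100.0 * exact / n, 100.0 * close / n


# Hypothetical scores from 4 readers for 4 assessments (not study data).
example = [
    [0, 0, 0, 0],  # exact and close agreement
    [1, 1, 2, 1],  # close agreement only
    [0, 2, 1, 1],  # neither (range is 2)
    [3, 3, 3, 3],  # exact and close agreement
]
pea, pca = agreement_percentages(example)
print(f"PEA = {pea:.1f}%, PCA = {pca:.1f}%")  # PEA = 50.0%, PCA = 75.0%
```

Because exact agreement requires every score in a row to match, PEA becomes harder to reach as the number of readers increases, whereas PCA tolerates a spread of 1 scoring increment.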
Sensitivity to change was estimated using the standardized response mean (SRM: trivial, < 0.20; small, 0.20–0.49; moderate, 0.50–0.79; good, ≥ 0.80), calculated by dividing the mean change score by the SD of the change.
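As an illustration of the SRM calculation and its interpretation bands, a short Python sketch follows. The function names and the example change scores are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the text does not specify which SD estimator was used.

```python
from statistics import mean, stdev
from typing import Sequence


def standardized_response_mean(change_scores: Sequence[float]) -> float:
    """SRM = mean change score divided by the SD of the change scores.

    Assumption: the sample SD (n - 1 denominator) is used.
    """
    if len(change_scores) < 2:
        raise ValueError("at least two change scores are required")
    sd = stdev(change_scores)
    if sd == 0:
        raise ValueError("SD of the change scores is zero; SRM is undefined")
    return mean(change_scores) / sd


def interpret_srm(srm: float) -> str:
    """Map an SRM value to the categories used in the text."""
    if srm < 0.20:
        return "trivial"
    if srm < 0.50:
        return "small"
    if srm < 0.80:
        return "moderate"
    return "good"


# Hypothetical per-patient change scores for one reader (not study data).
changes = [0.0, 1.5, 3.0, 2.0, 4.5, 1.0]
srm = standardized_response_mean(changes)
print(f"SRM = {srm:.2f} ({interpret_srm(srm)})")
```

Applied to the reported per-reader values, `interpret_srm` places 0.55, 0.65, 0.76, and 0.77 all in the moderate band.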
RESULTS
Baseline characteristics of the patients are presented in Table 1.
Table 1. Baseline characteristics of the 20 patients with rheumatoid arthritis included in the exercise.
At baseline, JSN (total score ≥ 1) was found in all patients by all readers. The total JSN score at baseline of the individual readers was median 10.3 (mean 11.3), range 8.0–16.8, whereas the change in score from baseline to followup was median 1.9 (mean 2.1), range 1.5–3.0.
Intrareader and interreader ICC for total scores and SDC for total and individual joint space scores are presented in Table 2. Intrareader ICC was good to very good for all readers for baseline scores, and for 3 of 4 readers for change scores. Interreader ICC for baseline total scores was poor (0.36) when all 4 readers were included, but very good (0.87) when 1 reader was excluded. The interreader ICC was very good (0.93) for change scores. The intrareader SDC and the percentage of maximal score observed for total scores were low (i.e., good) for 3 of the 4 readers. Interreader SDC and percentage of mean maximal score observed were also low for total score. Intrareader and interreader SDC for separate joints were below 1 at all sites, i.e., below the increment of the scoring system.
Table 2. Intrareader and interreader intraclass correlation coefficients (ICC) and smallest detectable change (SDC).
The PEA and PCA for the individual joint spaces are presented in Table 3. Overall, the mean PEA on change score was 86.7% within readers and 51.5% between readers, while the corresponding mean PCA were 99.2% and 92.4%.
Table 3. Percentage of exact agreement (PEA) and close agreement (PCA) for JSN assessment in each individual joint space.
Figure 1 displays the presence of baseline JSN (Figure 1A), change in JSN (Figure 1B), and mean JSN scores (Figure 1C and 1D), per individual joint space. The trapezium-scaphoid joint, trapezoid-capitate joint, and first carpometacarpal (CMC) joint showed change most frequently, while the CMC 5 and trapezium-trapezoid (TRM-TRD) joint showed the least change.
Figure 1. Prevalence and mean scores of JSN (status and change), per individual joint space. The prevalence of patients with a score > 0 at baseline (A) and for change scores (B) is shown at each individual joint site. Mean scores at baseline and followup (C) and for change scores (D) are means of the 8 reads (2 reads by each of the 4 readers). JSN: joint space narrowing; CMC: carpometacarpal; TRM: trapezium; TRD: trapezoid; CAP: capitate; HAM: hamate; SCA: scaphoid; LUN: lunate; TRI: triquetrum; RAD: radius.
The SRM for total change scores was moderate in all readers (reader 1: 0.77; reader 2: 0.76; reader 3: 0.55; reader 4: 0.65).
DISCUSSION
In this longitudinal multireader early RA exercise the OMERACT RAMRIS-JSN score allowed assessment of JSN with overall high intrareader and interreader agreement for status and change scores. Further, a moderate SRM was found, indicating that the score allows detection of change in JSN.
The intrareader ICC for baseline and change scores was good to very good in all readers and in 3 readers, respectively, suggesting that the scoring system is reproducible. The low baseline interreader ICC, in contrast to the very good interreader ICC for change score, indicates that the readers were able to find similar change despite differences between the readers for status scores. It should be emphasized that high agreement for change is particularly relevant to clinical trials, because the change score will be used to assess potential differences in the effect of different interventions. It is also important to note that the interreader agreement for baseline total scores was very good (ICC 0.87) if 1 reader was excluded. Future methodological studies similar to the present report may benefit from longer calibration sessions, including more image samples, to achieve better consistency between all readers. Previous studies of the OMERACT RAMRIS-JSN score have shown high intrareader and interreader ICC for status scores9,10. However, this was the first time, to our knowledge, that intrareader and interreader agreement were assessed longitudinally.
The intrareader SDC for 3 readers and interreader SDC for all readers were low, suggesting that the OMERACT RAMRIS-JSN score is able to detect change in JSN. This is supported by the fact that the SDC represents only a low percentage of the maximal score observed. In addition, the SDC for separate joints were below 1 at all sites. This suggests that the OMERACT RAMRIS-JSN is able to reliably detect a change in the individual joint spaces at the level of the lowest increment of the scoring system.
Overall PEA and particularly PCA were high in the majority of joints for baseline scores, and high in all joints for change scores, even though the scoring exercise included 4 readers, which makes high PEA difficult to reach. Higher agreement for change score may suggest that consistency is easier to achieve when assessing paired images than when assessing different cases, owing to anatomical variations. Some joint space sites, particularly the TRM-TRD joint, showed minimal change, which may support excluding this joint from the scoring system, despite high intrareader and interreader agreement and high prevalence of baseline score above 0.
In the present study, the SRM for the total score was moderate in all readers, suggesting that the OMERACT RAMRIS-JSN score is sensitive to change. However, further evaluation of sensitivity to change in other populations, including unselected image sets and randomized controlled trials, is required.
The time between baseline and followup was 36–60 months in the present study. Future studies should investigate the change in JSN in modern clinical trials with short duration, because the change in JSN may be markedly less. However, patients selected for RA clinical trials often have high disease activity and more rapid development of joint damage than would be expected in a broad early RA cohort with moderate disease activity such as the present.
Other semiquantitative MRI scoring systems evaluating JSN of the RA hand have been reported. McQueen, et al12 presented a 5-point scoring system assessing 8 joint spaces of the wrist. The scoring system showed high intrareader and interreader reliability for status scores, and 1 of 2 readers was able to detect a statistically significant change in JSN score over time13. However, not all joint spaces of the wrist were assessed; consequently it could not be assessed whether the selected areas were the most reliable and sensitive to change. Peterfy, et al14,15 proposed a scoring system that evaluated articular cartilage directly rather than JSN. The method included the wrist joints, metacarpophalangeal (MCP), and proximal interphalangeal (PIP) joints using a 9-point scale, demonstrating high agreement with conventional radiography and a statistically significant change over time.
An advantage of the OMERACT RAMRIS-JSN score is that it can be incorporated as a part of the highly validated RAMRIS score, which assesses synovitis, bone marrow edema, and bone erosions; or it can be used separately. This may provide additional information on structural joint damage, which may be of importance to the physical function and work ability5,6. Further, the score of the JSN could easily be included in the RAMRIS evaluation, as no additional sequences would be required beyond those needed for erosion assessment. This allows assessment of JSN on preexisting MRI sets, in which coronal RAMRIS sequences have been obtained.
Limitations of the present study are the small sample size and the use of preselected cases, which are not a random selection and may therefore not be representative of the general population. Further, chemical shift artefacts may complicate assessment of JSN when images without fat suppression are used, as in the present study. Finally, not blinding the readers to time order of the images may have introduced bias in favor of detecting progression of JSN, potentially influencing SRM calculations. A future step is to evaluate the sensitivity to change of the JSN score in a randomized controlled clinical trial.
The OMERACT RAMRIS-JSN score overall showed high intrareader and interreader reliability, and moderate sensitivity to change, supporting its validity as a tool for assessing JSN in RA clinical trials. This also supports including the measure as a component of the OMERACT RAMRIS scoring system.