Abstract
Objective. To compare the metric properties of a computer-assisted erosion segmentation volume measurement with scoring using the Rheumatoid Arthritis Magnetic Resonance Imaging Score (RAMRIS) in a longitudinal cohort of patients with rheumatoid arthritis (RA).
Methods. Thirty-two sets of baseline and 2-year followup magnetic resonance imaging (MRI) of metacarpal phalangeal 2–5 joints of patients with RA were scored using RAMRIS and segmented using OSIRIS software. The smallest detectable difference (SDD), standardized response mean (SRM), and paired t-test were used to evaluate the sensitivity to change. Eleven of the 32 patients’ MRI were segmented by both readers to evaluate interreader agreement. The 28-joint Disease Activity Score (DAS28) and Sharp erosion scores further evaluated construct and longitudinal validity.
Results. Reliability of erosion progression by computer-assisted volume measurement was superior to RAMRIS [intrareader intraclass correlation coefficient (ICC) 0.97 (0.94–0.99) vs 0.52 (0.22–0.73)] and interreader ICC of volume measurement was 0.85 (0.53–0.96). Computer-assisted volume measurements identified 10 of 32 patients who progressed by more than the SDD of progression, whereas RAMRIS identified only 4 of 32 patients (p = 0.0013). By a paired t-test, however, all MRI measures progressed significantly over 2 years (irrespective of treatment arm) and there was little difference by SRM. Construct correlational validity of the MRI methods was 0.47–0.90 for status scores and 0.33–0.81 for progression. There was no relationship between the average DAS28 and erosion progression by any imaging method.
Conclusion. Computer-assisted measurement of erosion volume has good performance metrics. It had excellent intrareader and interreader reliability and was more sensitive to change than RAMRIS in this group of patients. www.ClinicalTrials.gov, NCT00451971.
- RHEUMATOID ARTHRITIS
- MAGNETIC RESONANCE IMAGING
- SENSITIVITY
- IMAGE INTERPRETATION
- COMPUTER ASSISTED
- SPECIFICITY
- REPRODUCIBILITY OF RESULTS
Bone erosions in rheumatoid arthritis (RA) are common and impart important diagnostic and prognostic information. Erosion in early stages of RA is a poor prognostic sign indicating potentially aggressive disease1. The importance of early detection and sensitive measures to monitor the disease progression cannot be overemphasized. This need places demands on our current imaging and quantification techniques. Although the plain radiograph is the gold standard for bone erosions, the disadvantages of radiographs have been well documented. By contrast, MRI is more sensitive in detecting erosions in early disease, more responsive to bone erosion changes, and can visualize lesions 6 to 12 months before they appear on radiographs2,3,4,5,6. It can also provide information regarding synovitis and bone edema and offers an attractive multi-plane alternative to radiographs7.
With MRI as the preferred imaging modality, an accurate measurement tool is required for baseline and subsequent followup measurements to monitor disease progression. The Outcome Measures in Rheumatology Clinical Trials group has created a semiquantitative scoring method, the Rheumatoid Arthritis Magnetic Resonance Imaging Score (RAMRIS), to provide a standardized and reproducible measurement system that can be used in a multicenter setting. Using MRI as the imaging modality, the RAMRIS system assesses disease activity and damage for both the wrist and metacarpal phalangeal (MCP) joints8. RAMRIS’s feasibility, sensitivity to detect changes, and reliability have been demonstrated9,10,11. However, scoring systems such as RAMRIS may be less likely to detect subtle disease progression compared with computer-assisted techniques that allow continuous measurements12. Further, computer-assisted radiograph measurements have been shown to be more sensitive in detecting bone erosion changes than scoring13. Previous studies on semiautomated computer-assisted volume measurement demonstrated good construct validity, excellent intrareader and interoccasion reliability, and good correlation with RAMRIS for bone erosion measurements14,15,16. Despite many studies on computer-assisted volume measurements, there is little evidence comparing RAMRIS with computer-assisted volume measurements to determine their accuracy in detecting progression of bone erosions17.
These issues prompted us to investigate whether computer-assisted measurement of bone erosion volumes in patients with RA using a semiautomated software program is more responsive to change than the current RAMRIS. In addition, we evaluated other metric properties of computer-assisted measurement of bone erosion volumes in patients with RA.
MATERIALS AND METHODS
Subjects
All MRI (n = 32) on digital media were obtained from a 2-year randomized, controlled, treat-to-target versus “usual care” trial in patients with RA [the Target Study in Rheumatoid Arthritis (TASRA)]18. This trial was designed to determine whether increasing antirheumatic therapy can achieve and maintain a target of reduced joint damage progression on MRI and radiographs compared to usual care.
TASRA was approved by the Institutional Ethics Committee of St. George Hospital and the University of New South Wales. All subjects fulfilled the American College of Rheumatology diagnostic criteria for RA19.
MRI readers
Reader 1 (BP) is a rheumatologist with formal training in MRI. Reader 2 (MP) is a medical student with no previous training in MRI evaluation. Reader 2 undertook two 2-hour training sessions followed by a 2-hour calibration session. The training involved a review of normal MCP anatomy using models, diagrams, and MRI scans of the MCP joints; review of abnormal MCP anatomy using MRI scans; and review of erosion volume measurements using the computer-assisted technique. During the calibration session, the computer-assisted volume measurements of 8 MRI subjects of readers 1 and 2 were compared and consensus reached.
MRI measurement methods
RAMRIS was used to score the bone erosions of dominant-hand MCP joints 2–5 on hard-copy MRI films8. Each metacarpal head and phalangeal base of the MCP joints 2–5 was assessed separately. Bone erosions were scored on a scale of 0–10 depending on the volume of erosion as a proportion of the “assessed bone volume,” where 0 = no erosion, 5 = 41%–50% bone loss, and 10 = 91%–100% bone loss. The assessed bone volume was from the cortex of the articular surface to a depth of 1 cm. The scores of MCP joints 2–5 were summed. MRI scoring was performed by reader 1 only (a codeveloper of RAMRIS).
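The banded scoring described above can be sketched in a few lines. This is an illustration only: the text gives three anchor points (0, 5, and 10), and the sketch assumes the remaining scores follow the same 10-percentage-point bands. The function names `ramris_erosion_score` and `summed_score` are hypothetical.

```python
import math


def ramris_erosion_score(eroded_percent: float) -> int:
    """Map the eroded share of the assessed bone volume (0-100 %) to a 0-10 score.

    Assumption: every score step covers one 10-point band, so that
    0 % -> 0, 41-50 % -> 5 and 91-100 % -> 10 as stated in the text.
    """
    if not 0 <= eroded_percent <= 100:
        raise ValueError("eroded_percent must be between 0 and 100")
    return math.ceil(eroded_percent / 10)


def summed_score(site_percents):
    """Sum the per-site scores (metacarpal heads and phalangeal bases)."""
    return sum(ramris_erosion_score(p) for p in site_percents)
```

Under this banding, a site whose erosion grows from 42% to 49% of the assessed bone volume keeps a score of 5, whereas a continuous volume measurement would register the change.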
MRI erosion volume measurement
MR images from compact disks were transferred to a personal computer for the erosion volume calculations. An erosion was determined if it was seen in both axial and coronal planes and breached the bone cortex in at least 1 plane. The erosions identified were within the region of the assessed bone volume of the MCP joints 2–5. The erosion volume was calculated using the OSIRIS software (developed by the Digital Imaging Unit, Radiology Department, University Hospitals of Geneva). Every erosion was outlined manually in each T1-weighted coronal slice and the erosion area calculated by the OSIRIS software from the multiple slices. The erosion volume was derived from the multiplication of this area with the slice thickness using this standard formula:
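Consistent with the description above (outlined area per slice multiplied by slice thickness), the calculation can be written as follows. The symbols are notation introduced here for illustration and are not taken from the source: $V$ is the erosion volume, $A_i$ the area outlined on coronal slice $i$, $n$ the number of slices on which the erosion was outlined, and $t$ the slice thickness.

```latex
V \;=\; t \sum_{i=1}^{n} A_i
```

With areas in mm² and a slice thickness of 1 mm, $V$ is in mm³ and is numerically equal to the sum of the outlined areas.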
Coronal MRI of the fourth metacarpal phalangeal joint. Segmented area shows bone erosion on MRI slice 33.
Volume of the bone erosion for the segmentation area on MRI slices of the fourth metacarpal phalangeal joint. Circled region shows the value of the area segmented on MRI slice 33.
MRI measurements
A Signa Horizon 1.5 Tesla unit (General Electric) was used. MCP joints 2–5 of the dominant hand were imaged. Images were obtained with a 3-D gradient echo sequence with the following settings: echo time 30 ms, repetition time 12 ms, field of view 100 mm, matrix 256 × 256, and slice thickness 1 mm.
Clinical assessments
All patients had monthly assessment of the 28-joint Disease Activity Score (DAS28). Radiographs of the hands were also obtained at baseline and at 2 years and erosions on the MCP joints 2–5 were scored with the Sharp scoring method20. The data for MCP joints 2–5 erosion scores of 4 radiographs were not included in this study as they were not available for analysis.
MRI studies
Baseline and 2-year followup MRI (1 mm slice thickness) of 32 subjects were scored independently on 2 separate occasions using the RAMRIS system by reader 1. The MRI of the same 32 subjects were measured independently on 2 separate occasions using the computer-assisted technique by reader 2. Reader 1 also measured erosion volume on a random subset of MRI from 11 subjects. All images were read as pairs in known chronological sequence but were not read in parallel. The time taken to perform the erosion volume measurement of each image was recorded, from opening the image on the computer screen to completing the volume analysis.
Statistical analysis
For reproducibility/method agreement studies, the intrareader agreement of the RAMRIS erosion scores and of the computer-assisted erosion volume measurements, for both status and progression, was assessed using a relative and an absolute metric. These metrics were also used to assess the interreader agreement of the computer-assisted erosion volume measurement of status and progression scores. The relative metric was the intraclass correlation coefficient (ICC) and the absolute metric was the smallest detectable difference (SDD), both derived from a 2-way mixed repeated ANOVA21. The SDD is a value expressed in the same scale of measurement as the score/erosion volume11. The intrareader SDD was calculated by multiplying the square root of the residual mean square error (root MSE) of the 2-way mixed ANOVA by √2 and by 2.042 (using the t-distribution with 31 degrees of freedom); the interreader SDD was calculated by multiplying the root MSE by √2 and by 2.228 (using the t-distribution with 10 degrees of freedom). Following the same published approach11,22, the intrareader SDD of the 11-patient subset was also calculated as root MSE × √2 × 2.228. We calculated the SDD percentage, which is the SDD as a percentage of the maximum score, to facilitate comparisons of the reliability of scores versus computer-assisted volumes6.
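The SDD arithmetic described above can be sketched as follows. This is a minimal illustration, not the STATA analysis used in the study. The function names are hypothetical, the t multiplier is passed in as given in the text, and the ICC shown is the single-measure consistency form, which is an assumption because the text does not specify the ICC variant.

```python
import math


def anova_mean_squares(readings):
    """Two-way ANOVA mean squares for an n-subjects x k-readings table.

    readings: list of rows, one row per subject, each holding k repeated
    readings (occasions or readers). Returns (ms_subjects, ms_error).
    """
    n = len(readings)
    k = len(readings[0])
    grand = sum(sum(row) for row in readings) / (n * k)
    row_means = [sum(row) / k for row in readings]
    col_means = [sum(row[j] for row in readings) / n for j in range(k)]
    # Residual after removing subject and occasion effects.
    ss_resid = sum(
        (readings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_resid / ((n - 1) * (k - 1))
    return ms_subjects, ms_error


def smallest_detectable_difference(readings, t_crit):
    """SDD = t multiplier x sqrt(2) x root MSE, as described in the text."""
    _, ms_error = anova_mean_squares(readings)
    return t_crit * math.sqrt(2) * math.sqrt(ms_error)


def sdd_percent(sdd, max_score):
    """SDD expressed as a percentage of the maximum score or volume."""
    return 100.0 * sdd / max_score


def icc_consistency(readings):
    """Single-measure consistency ICC (assumed variant, see lead-in)."""
    k = len(readings[0])
    ms_subjects, ms_error = anova_mean_squares(readings)
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
```

With 32 subjects read twice, the residual term has (32 − 1)(2 − 1) = 31 degrees of freedom, and with 11 subjects it has 10, which matches the degrees of freedom quoted for the two multipliers.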
Longitudinal validity/sensitivity to change
RAMRIS erosion scores and computer-assisted erosion volumes of MRI that changed more than their respective SDD (MRI erosion progression beyond measurement error) were determined and evaluated with a 1-sample test of proportion. We also provide the standardized response mean (SRM) of each measurement technique, obtained by dividing the mean change from time1 – time0 by the SD of this change, and these SRM were compared with the SRM of the Sharp erosion score. Erosion progression over 2 years of all imaging methods (MRI and radiographs) were compared using a paired t-test. Finally, although we did not expect any imaging method to show a statistically significant erosion progression by trial arm (because this is a 32-subject subset from a 210-subject trial), we also provided those results. STATA 11 was used for data analysis.
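The two responsiveness summaries above, the SRM and the number of subjects progressing beyond the SDD, can be sketched as follows. The function names are hypothetical, and two details are assumptions: the SD is taken as the sample SD (n − 1 denominator), and only increases larger than the SDD are counted as progression.

```python
import statistics


def standardized_response_mean(baseline, followup):
    """Mean of (time1 - time0) divided by the SD of that change."""
    changes = [t1 - t0 for t0, t1 in zip(baseline, followup)]
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.mean(changes) / statistics.stdev(changes)


def count_progressors(baseline, followup, sdd):
    """Count subjects whose increase exceeds the SDD (beyond measurement error)."""
    return sum(1 for t0, t1 in zip(baseline, followup) if (t1 - t0) > sdd)
```

Because the SRM is unitless, it can be compared across RAMRIS scores, erosion volumes, and Sharp scores, while the SDD threshold must be computed separately for each measurement scale.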
Construct validity
Pairwise Pearson correlation coefficients of all erosion imaging methods were compared at baseline, at 2-year followup, and for progression. We also provided the correlation of the average DAS28 score over the 2-year period with each erosion progression method.
TASRA is registered on www.ClinicalTrials.gov, NCT00451971.
RESULTS
There were 25 women and 7 men. The median age of the study subjects was 50.5 years (range 27–76 yrs) and the median disease duration was 4 years (range 0–18 yrs). The median score of the average 24-month DAS28 was 3.62 (range 1.63–5.69). The descriptive statistics (mean, range, and SD) of the RAMRIS system and volume measurement at baseline and followup are shown in Table 1. The average time to segment 1 patient’s MRI was 23.5 min (minimum 4 min, maximum 58 min). The average time to score 1 patient’s MRI using RAMRIS was 12 min.
Mean, range, and SD of scores and volume measurements (all second measurements).
Table 2A shows the intrareader ICC, SDD, and SDD as a percentage of the maximum score or erosion volume. The ICC values for the volume measurement were excellent: the status ICC values were comparable with those obtained for RAMRIS, and the progression ICC values were better than those obtained for RAMRIS. The intrareader progression SDD expressed as a percentage of the maximum score was lower for the computer-assisted volume measurement than for RAMRIS. Both the intrareader and interreader ICC values were excellent (Table 2B), although reader 2 volumes were on average greater than those of reader 1 (volume erosion progression mean difference 28.4; p = 0.27). The intrareader values of the 11 patients were better than the interreader values.
Intraclass correlation coefficient (ICC) and smallest detectable difference (SDD) statistics of both status and progression in metacarpal phalangeal (MCP) joints 2–5. A. Intrareader studies (n = 32): (i) scoring (reader 1), and (ii) volume measurements (reader 2). B. Intrareader/interreader studies (n = 11 subject subset): (i) intrareader (reader 2) volume measurements, and (ii) interreader (readers 1 and 2) volume measurements.
The number of patients detected to have progressive joint destruction by volume measurement on 1 mm acquisition was significantly higher than the number of patients detected by RAMRIS. Computer-assisted volume measurements identified 10 out of 32 patients who progressed more than the SDD of progression, whereas scoring identified only 4 to have progressed on bone erosion changes (1-sample test of proportion; p = 0.0013). If the second readings are used, the comparison is 9 and 2, respectively, for computer-assisted volume erosions and RAMRIS (1-sample test of proportion; p < 0.001). The computer-assisted erosion volume measurement had the highest SRM of 0.63. The SRM for the RAMRIS was 0.58. The SRM for Sharp-scored radiographic erosions was 0.39. When analyzed by treatment group, there was less progression on erosion volume in the active arm (mean progression) although this difference was smaller than the SDD of progression and not statistically significant (t-test p = 0.6). There was no statistical difference using RAMRIS, although in this subset analysis there was more progression in the active arm. By a paired t-test, however, all MRI measures progressed over 2 years (irrespective of treatment arm). Erosion volume measurement mean progression was 99.1 (p = 0.001) and for RAMRIS was 1.156 (p = 0.002). Sharp radiographic erosions progressed by 0.143 (p = 0.043).
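The 1-sample test of proportion reported above can be illustrated with a short sketch. The text does not state which proportion served as the null value or which approximation was used. The sketch assumes a normal-approximation z-test with the RAMRIS proportion as the null. The function name `one_sample_proportion_test` is hypothetical.

```python
import math


def one_sample_proportion_test(successes, n, p0):
    """Two-sided z-test of an observed proportion against a null value p0."""
    p_hat = successes / n
    # Standard error evaluated under the null proportion.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# First readings: 10 of 32 by volume, tested against the RAMRIS proportion 4/32.
z_first, p_first = one_sample_proportion_test(10, 32, 4 / 32)
# Second readings: 9 of 32 by volume, tested against the RAMRIS proportion 2/32.
z_second, p_second = one_sample_proportion_test(9, 32, 2 / 32)
```

Under these assumptions the first comparison gives a p-value of about 0.0013 and the second a p-value below 0.001, in line with the reported values. This agreement is consistent with the assumed form of the test but does not confirm it.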
The correlations among the various erosion imaging measures are provided in Table 3. Overall, first-read correlations were higher than second-read and status measure correlations higher than progression measures. There was no relationship between the average DAS28 and erosion progression on any method (Pearson correlations −0.10 to 0.005).
Pairwise Pearson correlation coefficients of erosion imaging methods at baseline (t0), at 2-year followup (t1), and for progression over 2 years. MRI methods were read on 2 independent occasions. All sets include 32 subjects unless otherwise indicated.
DISCUSSION
Our study suggests that computer-assisted quantitative measurement of erosion volume is more sensitive to change than RAMRIS scoring in the MCP joints. Erosion volume measurement detected a significantly higher number of changes that were beyond the SDD value than did the RAMRIS system, although the SRM difference was nominal. Further, the SDD expressed as a percentage of the maximum score was lowest for the computer-assisted erosion volume measurement. Computer-assisted measurements are continuous, whereas the RAMRIS system scores changes on a discrete scale. The inability of the RAMRIS system to register a change that falls within a single scale step is the most probable reason for its decreased ability to detect subtle disease progression. The SDD is an important statistic in our study because it sets the threshold level for change beyond measurement error6. Using the paired t-test, all erosion imaging methods showed statistically significant progression over 2 years on a by-group basis.
The poor correlation between erosion progression and average DAS28 was not surprising for the MRI measures given the focal nature of MRI examination, although it was surprising for radiographs. At this stage, the relationship between bone erosions and predictors of disease activity is still controversial23,24,25. Nonetheless, these results demonstrated the difficulties of using these predictors of disease activity to provide accurate predictions of the destruction in the bones. Construct validity of erosion volume measurement was supported by the excellent correlation with RAMRIS and reasonable correlation with radiographs.
Besides having high sensitivity to detect change, a good measurement must also be reproducible and precise to be of clinical value. We used ICC, a measure of reproducibility of measurement techniques. The excellent intrareader ICC values of the erosion volume measurements were generally comparable with the ICC values of RAMRIS and results from a previous study15. The status interreader ICC from this study was better than that from other studies15,26. Our previous study16 showed a high level of interoccasion agreement for erosion volume measurement from repeated acquisitions 48 hours apart, with ICC of 0.92 to 0.99.
The superiority of quantitative over qualitative measurement of bone erosion has been suggested in other longitudinal studies. It has been demonstrated in multispectral MRI analysis, where automated quantitative measurements were more sensitive in detecting change than visual scoring. However, that study was limited by the lack of reproducibility data for the measurements used, the lack of a standardized visual scoring system (e.g., RAMRIS), and the grouping of bone marrow edema and bone erosions instead of measuring bone erosion alone12. Studies of other imaging modalities such as digital radiography also support this notion. Although these were not longitudinal studies, computer-based volume measurements correlated well with scoring and, in one study, detected a more pronounced difference between treatment groups13,27.
Our previous study highlighted the limitations of computer-assisted segmentation using the OSIRIS program: the inability to calculate erosion volume when the erosion is seen on only 1 coronal slice, and the time required for erosion volume segmentation and estimating the preerosion border. In our early work, we found erosions that met RAMRIS scoring criteria, but their erosion volumes could not be calculated because they were seen on only 1 coronal slice26. Therefore, MRI acquisitions of 1 mm slice thickness were used in our study. Another limitation of computer-assisted volumes is the longer time required for the segmentation process. The segmentation process (23.5 min) required almost twice as much time as scoring (12 min). Clearly, if there is an advantage of computer-assisted volume measurement over scoring with respect to sensitivity to change, this time limitation may be less important. Further studies are needed.
Estimating the preerosion border was a challenge with computer-assisted segmentation. As demonstrated in a previous study using MRI slice thickness of 3 mm, readers were able to agree on abnormal bones, but not on the preerosion bone border26. The improvement in the interreader agreement in our study may be attributed to the thinner MRI slice thickness used. As the readers scrolled through the 1 mm slice thickness, the gradual recession of the normal concave bone border (in coronal plane) into the eroded borders was better appreciated, thereby achieving a better interreader agreement. More advanced computerized methods may assist with the estimation of preerosion borders in the future.
While the results of our study are limited to MCP joints 2–5, they may also be applicable to the proximal and distal interphalangeal joints, because the anatomy of these joints is largely similar28. However, the results may not apply to wrist joints, which are also commonly evaluated in RA studies and are anatomically more complicated26,29.
A possible confounding factor was the decision to read the MRI images in paired chronological sequence but not in parallel. Conventional radiographs yielded greatest signal-to-noise ratio with paired, parallel, and chronological readings. However, it was unclear whether the higher signal indicated real changes9,30. Since its application in MRI remains to be validated, our approach was adopted to attempt to increase our sensitivity to detect change, yet subjecting both measuring methods (scoring and segmentation) to the same noise. While different viewing modalities used for scoring and segmenting may be another potential confounder, studies comparing film and digital techniques showed good agreement, with no significant statistical differences between them31,32.
To our knowledge, this is the first study to compare quantitative measurement of MRI erosion volume with semiquantitative RAMRIS in a longitudinal setting. The encouraging results could lead to the use of quantitative measurements in assessing bone erosion changes in patients with RA. Quantitative measurements may also be extended to bone marrow edema measurement in future studies, given the close relationship bone marrow edema shares with bone erosions17. While computer-assisted segmentation may be time-consuming, promising automated and semiautomated segmentation methods are being developed33 and may reduce this burden. However, until the ideal, systematically tested, quantitative measurement is available, current manual segmentation of bone erosion should suffice as an acceptable measuring tool for detecting erosion changes34,35.
Footnotes
- The original Target Study in Rheumatoid Arthritis (TASRA) was supported by a research grant from Sanofi-Aventis Australia.
- Accepted for publication September 5, 2012.