Abstract
Objective. The RAMRIS [Outcome Measures in Rheumatology rheumatoid arthritis (RA) magnetic resonance imaging (MRI) Scoring system] is used in clinical RA trials. We have investigated methods to combine the RAMRIS features into valid and responsive scores for inflammation and joint damage.
Methods. We used data from 3 large randomized early RA trials to assess 5 methods to develop a combined score for inflammation based on RAMRIS bone marrow edema, synovitis, and tenosynovitis scores, and a combined joint damage score based on erosions and joint space narrowing. Methods included unweighted summation, normalized summation, and 3 different variants of weighted summation of the RAMRIS features. We used a derivation cohort to calculate summation weights to maximize the responsiveness of the combined score. Construct validity of the combined scores was examined by assessing correlations to imaging, clinical, and biochemical measures. Responsiveness was tested by calculating the standardized response mean (SRM) and the relative efficiency of each score in a validation cohort.
Results. Patient characteristics, as well as baseline and followup RAMRIS scores, were comparable between cohorts. All combined scores were significantly correlated to other imaging, clinical, and biochemical measures. Inflammation scores combined by normalized and weighted summation had significantly higher responsiveness in comparison to unweighted summation, with SRM (95% CI) for unweighted summation 0.62 (0.51–0.73), normalized summation 0.73 (0.63–0.83), and weighted summation 0.74 (0.64–0.84). For the damage score, there was a trend toward higher responsiveness for weighted summation.
Conclusion. Combined MRI scores calculated by normalized or weighted summation of individual MRI pathologies were valid and responsive.
Magnetic resonance imaging (MRI) allows detailed assessment of the synovial joint. In rheumatoid arthritis (RA), MRI is more sensitive than radiography for detecting bone erosions and cartilage loss1,2,3, and can visualize the inflammatory lesions that precede joint destruction4,5,6,7,8.
MRI features are frequently used as outcome measures in RA clinical trials8,9. Outcome Measures in Rheumatology (OMERACT) is an independent initiative to develop and validate outcome measures for clinical trials in rheumatic diseases10,11. The OMERACT RA MRI Scoring system (RAMRIS) outlines semiquantitative scoring of 5 RA pathologies: bone erosions, joint space narrowing (JSN), synovitis, tenosynovitis, and bone marrow edema (BME) in the wrist and metacarpophalangeal joints2,12,13. Each pathology is scored separately. However, the primary interest in clinical studies may be the total inflammatory activity or the progression of total structural joint damage.
The objective of this study was to develop and validate 2 combined MRI scores, one for inflammation and one for joint damage, derived from the 5 RAMRIS pathology scores, with emphasis on responsiveness and construct validity.
MATERIALS AND METHODS
Validation and derivation cohorts
We used data from the ARCTIC14 trial as a derivation cohort for the combined scores. Performance of the scores was assessed in a validation cohort of pooled data from the CIMESTRA15 and OPERA16 study groups. ARCTIC was a 24-month randomized clinical trial studying ultrasound (US) for treatment decision making. Participants (n = 230) were patients aged 18–75 years with early RA who were naive to disease-modifying antirheumatic drugs (DMARD). They fulfilled the 2010 American College of Rheumatology (ACR)/European League Against Rheumatism criteria, with indication for DMARD treatment. Both CIMESTRA and OPERA were randomized controlled trials (RCT). CIMESTRA studied treatment with methotrexate (MTX) and intraarticular betamethasone in early RA, and the effect of adding cyclosporine to the regimen. OPERA studied the effect of adding adalimumab to MTX and intraarticular triamcinolone as first-line therapy in early RA. Participants (CIMESTRA n = 160, OPERA n = 180) were aged > 17 years, fulfilled the 1987 ACR criteria, and had moderate to severe disease activity.
Written informed consent was obtained from all participants. The trials were approved by the local ethics committees (approval reference numbers: ARCTIC: 2010/744; CIMESTRA: M-1959-98; OPERA: VEK-20070008).
Imaging
MRI of one hand (acquisition as outlined in the RAMRIS core set12) was performed together with conventional radiographs of hands and feet at baseline and 12 months in all 3 trials. A single reader (CIMESTRA/OPERA: DG, ARCTIC: US) blinded to the treatment arm and clinical data scored the MR images according to RAMRIS, with known chronological order. Reliability of scorings was overall very good (intra- and interreader comparisons for ARCTIC: Supplementary Table 1, available with the online version of this article; intrareader for CIMESTRA/OPERA: previously published17). Radiographs were scored according to the van der Heijde-modified Sharp score. In ARCTIC, US was performed yearly for all patients according to a validated scoring system18.
Clinical variables
At each visit, the following variables were recorded: tender and swollen joint counts, pain, patient’s and physician’s global assessments, and C-reactive protein. In ARCTIC, erythrocyte sedimentation rate was also analyzed. Physical function was assessed by the Health Assessment Questionnaire in CIMESTRA and OPERA, and by the Patient-Reported Outcomes Measurement Information System (PROMIS) 20-item short form in ARCTIC.
Calculation of combined scores
We categorized RAMRIS scores as either inflammation (synovitis, tenosynovitis, BME) or damage (erosions, JSN), and calculated the combined score for each category. Calculation was done using 5 different approaches, aiming to find which method would provide the most responsive combined score.
Approach 1: Unweighted summation
Combined scores were calculated by numerical summation of the RAMRIS scores for each category. These scores were used as reference.
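As a minimal sketch of Approach 1, the grouping and summation can be written as follows. The dictionary keys and the patient values are hypothetical and serve only to illustrate the arithmetic.

```python
# Grouping of the 5 RAMRIS pathologies into the 2 categories described in the text.
CATEGORIES = {
    "inflammation": ("synovitis", "tenosynovitis", "bme"),
    "damage": ("erosion", "jsn"),
}


def unweighted_sum(scores, category):
    """Approach 1: add the raw RAMRIS scores that belong to one category."""
    return sum(scores[name] for name in CATEGORIES[category])


# Hypothetical patient; the values are invented for illustration only.
patient = {"synovitis": 8, "tenosynovitis": 5, "bme": 6, "erosion": 12, "jsn": 3}

inflammation = unweighted_sum(patient, "inflammation")  # 8 + 5 + 6
damage = unweighted_sum(patient, "damage")              # 12 + 3
```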
Additionally, we tested several methods for transforming the RAMRIS scores before summation.
Approach 2: Normalized summation
The RAMRIS scores differ in range, so scores with a wider range contribute disproportionately to the total if summed without transformation. To counteract this, scores were transformed to the same range before summation.
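One plausible reading of this transformation is division of each score by its maximum possible value, so that every pathology lies between 0 and 1 before summation. The sketch below assumes that reading. The maxima in `ASSUMED_MAX` are illustrative placeholders, not values taken from this study, because the true maxima depend on how many sites are scored.

```python
# ASSUMPTION: illustrative maximum scores per pathology; replace these with the
# maxima that apply to the scoring protocol actually used.
ASSUMED_MAX = {"synovitis": 21, "tenosynovitis": 30, "bme": 69}


def normalized_sum(scores, max_scores=ASSUMED_MAX):
    """Approach 2 sketch: rescale each score to the range 0-1, then add."""
    return sum(scores[name] / max_scores[name] for name in max_scores)


# Hypothetical patient; each pathology is at one-third or one-half of its range.
patient = {"synovitis": 7, "tenosynovitis": 15, "bme": 23}

raw_total = sum(patient.values())   # BME dominates the raw sum
combined = normalized_sum(patient)  # 7/21 + 15/30 + 23/69
```

In the raw sum the pathology with the widest range carries the most weight, whereas after rescaling each pathology contributes in proportion to how much of its own range is used.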
Approach 3: Weighted summation
Each RAMRIS score was transformed by a multiplication factor (weight). To maximize responsiveness, weights were calculated in a data-driven approach so that the resulting score had the highest standardized response mean (SRM) in the derivation cohort. To make the system more adaptable, each RAMRIS score was divided into 3 anatomical areas, which were weighted individually. The areas and corresponding weights are shown in Appendix 1.
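The optimization procedure is not specified in the text. As a hedged illustration of the idea, the sketch below searches a small grid of candidate weights and keeps the combination with the largest absolute SRM. It uses one weight per pathology rather than one per anatomical area, and the change values are hypothetical; it is not the procedure used to derive the weights in Appendix 1.

```python
import itertools
import statistics


def srm(changes):
    """Standardized response mean: mean change divided by the SD of the change."""
    return statistics.mean(changes) / statistics.stdev(changes)


def weighted_changes(rows, weights):
    """Weighted sum of the component changes for each patient."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def best_weights(rows, candidates=(0, 1, 2, 3)):
    """Grid search for the weight vector giving the largest absolute SRM."""
    best, best_value = None, -1.0
    for weights in itertools.product(candidates, repeat=len(rows[0])):
        changes = weighted_changes(rows, weights)
        if statistics.stdev(changes) == 0:
            continue  # constant score (e.g., all weights zero): SRM undefined
        value = abs(srm(changes))
        if value > best_value:
            best, best_value = weights, value
    return best, best_value


# Hypothetical 12-month changes per patient: (synovitis, tenosynovitis, BME).
derivation = [(-2, -3, 0), (-1, -4, 2), (-3, -2, -1),
              (0, -3, 1), (-2, -5, -2), (-1, -2, 3)]

weights, value = best_weights(derivation)
```

Because multiplying all weights by the same constant leaves the SRM unchanged, only the ratios between weights matter. Weights found this way are fitted to the derivation data and need to be tested in a separate cohort.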
Approach 4: Adjusted-weighted summation
To simplify the weighting system, data-derived weights from Approach 3 were rescaled to whole numbers according to rank. Adjustment of ± 1 step was allowed to optimize performance (Appendix 1).
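A minimal sketch of rescaling by rank is given below, assuming that "according to rank" means replacing each data-derived weight by its position in ascending order. The weights shown are hypothetical and are not those in Appendix 1, and the optional ± 1 adjustment step is not implemented.

```python
def rank_rescale(weights):
    """Replace each weight by its rank (1 = smallest) among the distinct values."""
    ordered = sorted(set(weights.values()))
    return {name: ordered.index(value) + 1 for name, value in weights.items()}


# Hypothetical data-derived weights, for illustration only.
derived = {"synovitis": 0.42, "tenosynovitis": 1.87, "bme": 0.95}

adjusted = rank_rescale(derived)
```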
Approach 5: Single site–weighted summation
As in Approach 3, but weights were calculated for each individual bone, joint, and tendon.
Statistical analysis
Baseline characteristics were described as proportions or median values as appropriate. Construct validity of the suggested combined MRI scores was tested by calculating the Spearman correlation coefficients to established disease measures. Responsiveness was tested by calculating the SRM for the suggested combined scores, the RAMRIS scores, and radiographic variables, where SRM = mean change from baseline to 12 months / SD of the change.
Relative efficiency was computed for each combined score with unweighted summation as reference:
CI for SRM and relative efficiency were estimated by bootstrapping with 5000 replications. Only patients with all variables available at both baseline and the 12-month visit were included. Data analyses were undertaken using Stata v.14 (StataCorp).
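The responsiveness statistics can be sketched as follows. The SRM uses its standard definition. The relative efficiency is written here as the squared ratio of a score's SRM to the reference SRM, which is a common convention and an assumption in this sketch. The percentile bootstrap, the function names, and the data are likewise illustrative and do not reproduce the Stata analysis.

```python
import random
import statistics


def srm(changes):
    """Standardized response mean: mean change divided by the SD of the change."""
    return statistics.mean(changes) / statistics.stdev(changes)


def relative_efficiency(changes, reference_changes):
    """ASSUMED definition: squared ratio of the SRM to the reference SRM."""
    return (srm(changes) / srm(reference_changes)) ** 2


def bootstrap_ci(stat, rows, n_boot=5000, alpha=0.05, seed=0):
    """Percentile CI obtained by resampling patients (rows) with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        try:
            estimates.append(stat(sample))
        except ZeroDivisionError:
            continue  # resample without variation or with zero reference SRM
    estimates.sort()
    lower = estimates[int(alpha / 2 * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Hypothetical paired 12-month changes per patient: (reference score, combined score).
rows = [(-4, -1.2), (-6, -1.5), (-1, -0.9), (-9, -1.8), (2, -0.2),
        (-5, -1.1), (-3, -1.4), (0, -0.6), (-7, -1.6), (-2, -0.8)]


def re_stat(sample):
    """Relative efficiency of the combined score within one (re)sample."""
    return relative_efficiency([c for _, c in sample], [r for r, _ in sample])


point = re_stat(rows)
ci = bootstrap_ci(re_stat, rows)
```

Resampling whole rows keeps each patient's two scores paired, so the interval reflects the correlation between the scores being compared.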
RESULTS
Patient characteristics
Data from 194 patients from the ARCTIC trial (derivation cohort), and 195 patients from CIMESTRA and OPERA (validation cohort) were used. A larger proportion of the patients in the derivation cohort were positive for anticyclic citrullinated peptide (82% vs 61.5%, p < 0.001), and disease activity variables were somewhat higher in the validation cohort (Supplementary Table 2, available with the online version of this article). Duration of symptoms at inclusion was longer in the derivation cohort (median 166 days vs 91 days, p < 0.001). Otherwise, patient characteristics were comparable between the cohorts.
MRI variables
Baseline scores for synovitis were slightly higher in the validation cohort. Median 1-year changes of inflammatory scores were similar in both cohorts. Baseline median erosion scores were similar in both cohorts, while the JSN score was higher in the validation cohort. The median 1-year changes for both erosions and JSN were comparable between the cohorts (Supplementary Table 3).
Construct validity
All combined scores were significantly correlated to other imaging, clinical, and biochemical measures. MRI inflammation scores were most strongly associated with US inflammation variables, while associations between MRI damage scores and radiographic measures were overall moderate (Table 1).
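For readers who wish to reproduce this type of analysis, the Spearman coefficient is the Pearson correlation of the ranked values. The sketch below is a plain implementation with average ranks for ties; the paired values are hypothetical and are not study data.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired values: combined MRI inflammation score and a US score.
mri_inflammation = [12, 30, 7, 22, 15, 40]
us_score = [5, 14, 6, 9, 8, 20]

rho = spearman(mri_inflammation, us_score)
```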
Responsiveness
For inflammation, normalized summation (Approach 2), weighted summation (Approach 3), and adjusted-weighted summation (Approach 4) had statistically significantly higher relative efficiency than unweighted summation (Approach 1) when tested in the validation cohort (Figure 1). Approaches 3 and 4 provided the numerically highest SRM values (Table 2); however, differences between Approaches 2, 3, and 4 were not statistically significant. For damage, no approach was significantly superior to unweighted summation, although Approach 4 provided the highest SRM values.
DISCUSSION
We have developed and tested combined MRI scores identifying the principal pathogenic constructs of RA: inflammation and damage. For clinical trial settings, these 2 measures might be more important than the scores of the individual MRI lesions.
In previous studies, combined scores have been obtained through slightly differing methods3,17,19. To ensure comparability between studies, and to avoid biased reporting, there is a need for consensus regarding which method to use20.
It could be argued that if responsiveness were the sole priority, it would be easiest to use only the most responsive single pathology, e.g., tenosynovitis in the present study. However, that would discard a large proportion of MRI information. By weighted summation, we could obtain responsive combined scores, while still covering the full spectrum of pathology. Approaches using complex weightings derived from data resulted in the numerically most responsive scores, but the gain was marginal compared to the simpler normalization approach.
The strengths of these analyses include the large datasets, with baseline and 1-year followup MRI data of 389 patients from 3 RCT in early RA. By separating our data into derivation and validation cohorts, we were able to assess the validity and generalizability of our proposed combined scores with higher confidence than if only 1 dataset had been used.
Limitations include the lack of opportunity to examine the discriminative properties of the combined scores, because none of the original trials showed significant group differences for clinical or MRI endpoints. A dataset with clinical differences between the treatment arms is needed to examine this.
The SRM values of our scores were relatively low compared to a similar study19. This might be explained by limited changes in RAMRIS scores during the followup, especially for joint damage.
We found that combined MRI scores for inflammation and joint damage can be responsive and valid. Our data indicate that the responsiveness of combined scores for inflammation could be improved by using normalized or weighted summation of the RAMRIS pathologies, rather than unweighted summation. However, our results do not support promoting one of these approaches over another. For the combined damage scores, there was a trend favoring weighted summation, but results were inconclusive. The discriminative properties of the scores need to be tested in placebo-controlled clinical trials.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Acknowledgment
We thank Joe Sexton for help and advice on statistical calculations and support on using statistical software, and Lena Bugge Nordberg and Nina Paulshus Sundlisæter for help with the ARCTIC database. We also thank all investigators, study personnel, and patients who have contributed to the clinical trials that this study is based on.
APPENDIX 1.
Accepted for publication December 6, 2018.