Abstract
Objective. The Rheumatoid Arthritis Magnetic Resonance Imaging Score (RAMRIS) is validated for hand MRI. Its reliability applied to metatarsophalangeal (MTP 1–5) joints is unknown and was studied in early arthritis and clinically suspect arthralgia.
Methods. Patients underwent 1.5 Tesla MRI of MTP, metacarpophalangeal (MCP 2–5), and wrist joints. Pairs of readers scored bone marrow edema (BME), synovitis, tenosynovitis, and erosions. Interreader reliability was assessed in 441 consecutive early arthritis patients at baseline: 215 were scored by 2 readers and the remaining 226 by 2 different readers. Two readers scored baseline MRI of 82 consecutive patients with clinically suspect arthralgia, and 40 randomly selected patients from this group were scored by 9 readers. Intrareader reliability was determined on a random set of 15 early arthritis patients, scored twice by 2 readers. For change scores, 30 early arthritis patients with baseline and 1-year followup MRI were scored by 2 readers. Intraclass correlation coefficients (ICC), Bland-Altman (BA) plots, and smallest detectable change (SDC) were determined. MRI data of MTP joints were compared to wrist and MCP joints.
Results. Interreader ICC and mean scores in early arthritis were BME ICC 0.91–0.92 (mean 1.5 ± SD 2.6), synovitis 0.90–0.92 (1.3 ± 1.7), tenosynovitis 0.80–0.85 (1.1 ± 1.8), and erosions 0.88–0.89 (0.7 ± 1.0). In patients with clinically suspect arthralgia, ICC were comparable. Intrareader ICC for inflammatory MRI features were 0.84–0.98, for erosions 0.71 (reader 1), and 0.92 (reader 2). Change score ICC were ≥ 0.90, except erosions (0.77). SDC were ≤ 1.0. BA plots showed no systematic bias. Reliability scores of MTP joints were similar to MCP and wrist joints.
Conclusion. Status and change MRI scores of BME, synovitis, tenosynovitis, and erosions of MTP joints can be assessed reliably by RAMRIS.
- RHEUMATOID ARTHRITIS
- MAGNETIC RESONANCE IMAGING
- RAMRIS
- RELIABILITY
- FOOT
Magnetic resonance imaging (MRI) is increasingly used in scientific research in patients with rheumatoid arthritis (RA) because it is a sensitive modality that can visualize inflammation and destruction1. Because the complexity and large amount of information that is provided by MRI pose a challenge, the Outcome Measures in Rheumatology (OMERACT) MRI in RA working group developed the RA MRI Score (RAMRIS) to standardize MRI scoring for research purposes and clinical trials in particular2.
The RAMRIS has to date been validated for use in the metacarpophalangeal (MCP) and wrist joints, but not for use in metatarsophalangeal (MTP) joints3,4,5. This is unfortunate because joint inflammation in MTP joints is just as prevalent as in the MCP joints6,7. In addition, radiographic studies have shown that erosive change occurs more commonly in the feet than in the hands, and also in earlier phases of disease8,9. Thus there is a paradox: the feet are so commonly affected in early RA, yet they are absent as an outcome measure in trials. Indeed, the RA MRI working group has called for validation of the RAMRIS in the MTP joints10.
An important aspect of validation is the reliability of scoring11. Reliability studies have been performed for the hand, but cannot be directly extrapolated to the foot, because different joint areas in the past have been found to have different intraclass correlation coefficients (ICC)12. Previously, Baan, et al measured the reliability of the RAMRIS of the feet in a small subset of patients with longstanding RA (n = 29)13. However, tenosynovitis, which is a common feature in early arthritis, was not included in that study. In addition, because no followup MRI were included, only the reliability of status scores was assessed. For change scores, one study has been performed by Ejbjerg, et al that assessed MRI-detected erosions only14. We therefore aimed to assess the inter- and intrareader reliability of status scores and the reliability of change scores applied to the MTP joints for the following MRI outcomes: bone marrow edema (BME), synovitis, tenosynovitis, and erosions. Because the focus in rheumatology is shifting from established erosive RA to early arthritis and even to patients with arthralgia that is suspected to progress to arthritis1, we performed our study in patients with early arthritis and also in patients with clinically suspect arthralgia without apparent arthritis upon physical examination. We added MRI data of wrist and MCP joints for comparison with data of MTP joints.
MATERIALS AND METHODS
Early arthritis cohort
This longitudinal inception cohort included patients with clinically confirmed arthritis and symptom duration < 2 years who were naive to disease-modifying antirheumatic drugs (DMARD). At baseline, questionnaires were completed, swollen joint counts were performed, and serum samples were obtained. Unilateral 1.5 Tesla (1.5T) MRI of the MTP, MCP, and wrist joints of the most painful side, or the dominant side in the case of equally severe symptoms on both sides, was made of patients who were consecutively included from June 2013 onward15. Before contrast administration, T1-weighted fast spin echo (FSE) sequences in the coronal plane were acquired for MCP and wrist joints. After intravenous injection of gadolinium contrast, T1-weighted FSE sequences with frequency selective fat saturation were acquired in coronal and axial planes of the MCP, wrist, and MTP joints. Patients were asked to stop nonsteroidal antiinflammatory drugs (NSAID) 24 h before the scan, and the MRI was made before the start of DMARD. Additional information on the scan protocol is provided in Supplementary File 1 (available with the online version of this article).
Consecutive patients included between June 2013 and April 2016 were studied for status scores. In the cohort, serial MRI were made of patients included until January 2015.
Clinically suspect arthralgia
This inception cohort included patients with clinically suspect arthralgia of the small joints with a symptom duration of < 1 year that, according to the clinical expertise of the rheumatologist, was expected to progress to RA over time. Per definition, clinically suspect arthralgia was not present if clinical arthritis was observed at physical examination or if another explanation for the arthralgia was more likely16. Patients consecutively included between July 2014 and February 2015 were studied, and they underwent MRI according to the same MRI protocol as patients with early arthritis.
Both cohorts were approved by the local Medical Ethical Committee (approval numbers Early Arthritis Cohort P10.108 and Clinically Suspect Arthralgia P11.210). All participants signed informed consent.
Readers
All readers were experienced with the OMERACT RAMRIS system and the method by Haavardsholm, et al for scoring tenosynovitis2,17. All readers scored > 400 MRI according to these systems during a training period of several months prior to evaluating the MRI that are part of our study.
MRI scoring
All readers evaluated the images independently and in the following order: first the MTP joints, next the MCP joints, and finally the wrist. The MRI were scored blinded to clinical data. Synovitis and erosions of the MTP, MCP, and wrist joints were scored in line with the OMERACT RAMRIS. BME was assessed on a contrast-enhanced T1-weighted fat-suppressed sequence18, because its use for depicting BME is recommended by the European Society of Musculoskeletal Radiology (ESSR), and previous studies have demonstrated that it has a strong correlation with the T2-weighted fat-suppressed sequence that is advised by the RAMRIS19,20,21,22. In MTP and MCP joints, erosions and BME were scored in the proximal and distal part of the joints. Tenosynovitis was scored as described by Haavardsholm, et al, applied to the flexor and extensor tendons of MTP 1–5, MCP 2–5, and the wrist17. Additional information on the method of scoring is provided in Supplementary File 1 (available with the online version of this article), in addition to an example of a score sheet with illustration of the scored tendons (Supplementary Figure 1).
A flowchart of scored patients and readers is presented in Figure 1. Interreader reliability was assessed based on 441 consecutive early arthritis patients. The first 215 patients were scored by readers 1 and 2, the remaining 226 MRI by readers 3 and 4. The MRI of 82 arthralgia patients were scored by readers 5 and 6. Of these 82 MRI, 40 were randomly selected and scored by 7 additional readers, resulting in a total of 9 readers (readers 2–10). For intrareader reliability, the baseline MRI of 15 early arthritis patients were randomly selected and rescored by readers 1 and 2 after an interval of 6 and 4 months, respectively.
Ninety-one early arthritis patients underwent MRI at baseline and at 12 months, and were all scored by reader 9. The reliability of change scores was determined using MRI of 30 patients, which were additionally scored by reader 10. These 30 patients were selected as follows: 15 randomly and 15 based on a high baseline MRI score by reader 9. The 15 patients with a high baseline score were scored as part of a bigger set of patients that also included patients with lower scores; thus, the images were scored by readers who were blinded to the MRI score. We added patients with high baseline scores because they were most prone to change over time. The MRI were scored in chronological order by both readers23,24.
Statistical methods
For scores of MTP, MCP, and wrist joints separately, ICC estimates and their 95% CI were calculated (2-way mixed-effects model, absolute agreement)25. The single measures were used for the intrareader ICC, and the average measures for interreader and change ICC. ICC values < 0.5 indicate poor reliability; between 0.5 and 0.75, moderate; between 0.75 and 0.9, good; and > 0.90, excellent25. In addition to calculating ICC values, Bland-Altman (BA) plots were drawn26.
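As an illustration of the ICC definitions used here, the following Python sketch computes single-measures and average-measures ICC for absolute agreement from a table of scores with one row per patient and one column per reader. It is a minimal sketch and not the SPSS procedure used in the study: the function names `icc_absolute_agreement` and `reliability_label` and the example scores are hypothetical, the formulas are the standard mean-square expressions for a 2-way model with absolute agreement, and the handling of values that fall exactly on a category boundary is an assumption.

```python
from statistics import mean


def icc_absolute_agreement(scores):
    """Return (single-measures ICC, average-measures ICC), absolute agreement.

    `scores` is a list of rows: one row per patient, one column per reader.
    """
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of readers
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares of the 2-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-patient mean square
    msc = ss_cols / (k - 1)                # between-reader mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


def reliability_label(icc):
    """Map an ICC to the categories quoted in the text.

    Assumption: boundary values (0.5, 0.75, 0.9) go to the higher category,
    except 0.9 itself, which is kept in "good".
    """
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical scores of 6 patients by 2 readers (not study data).
example = [[0, 0], [1, 2], [3, 3], [5, 4], [2, 2], [0, 1]]
single, average = icc_absolute_agreement(example)
print(round(single, 2), reliability_label(single))
print(round(average, 2), reliability_label(average))
```

The average-measures value is always at least as high as the single-measures value when agreement is positive, which is one reason the two should not be compared directly across the intrareader and interreader analyses.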
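For change scores, in addition to the ICC and BA plots, the smallest detectable change (SDC) was calculated. SDC expresses the smallest change between 2 dependently obtained measures that can be interpreted as “real,” that is, a change greater than the measurement error27. The SDC of each MRI feature was calculated as follows:
$$\mathrm{SDC} = \frac{1.96 \times SD_{\Delta}}{\sqrt{2} \times \sqrt{k}}$$
where $SD_{\Delta}$ is the SD of the differences between the change scores of the readers and $k$ is the number of readers whose scores are averaged.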
Here k = 2 because the SDC on the mean scores of both readers was used27.
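The SDC calculation can be sketched in a few lines of Python. This is a minimal sketch that assumes the commonly used definition from the cited method, in which the SD of the between-reader differences in change scores is multiplied by 1.96 and divided by √2 × √k; the function name `smallest_detectable_change` and the example change scores are hypothetical and are not study data.

```python
import math
from statistics import stdev


def smallest_detectable_change(change_reader_a, change_reader_b, k=2):
    """SDC from the change scores (followup minus baseline) of 2 readers.

    Assumed formula: 1.96 * SD(differences in change scores) / (sqrt(2) * sqrt(k)),
    with k the number of readers whose scores are averaged.
    """
    diffs = [a - b for a, b in zip(change_reader_a, change_reader_b)]
    sd_diff = stdev(diffs)  # sample SD of the between-reader differences
    return 1.96 * sd_diff / (math.sqrt(2) * math.sqrt(k))


# Hypothetical change scores of 5 patients for one MRI feature.
reader_a = [-2, -1, 0, -3, 1]
reader_b = [-1, -1, 0, -2, 0]
print(smallest_detectable_change(reader_a, reader_b))
```

With k = 2 the denominator equals 2, so the SDC reduces to 0.98 times the SD of the between-reader differences.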
The proportion of patients who showed change in the RAMRIS score was calculated in 3 ways: using a cutoff of > 0 and > 0.5 (of the mean score of 2 readers), and by using the SDC as a cutoff.
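The 3 cutoffs can be applied to the same list of mean change scores, as in the sketch below. It is a hedged illustration only: the function name `proportion_changed` and the data are hypothetical, and the use of the absolute value (so that decreases and increases both count as change) is an assumption that the text does not state explicitly.

```python
def proportion_changed(change_reader_a, change_reader_b, cutoff):
    """Proportion of patients whose mean change score exceeds `cutoff`.

    Assumption: change in either direction counts, so the absolute value
    of the mean of the 2 readers' change scores is compared to the cutoff.
    """
    mean_changes = [(a + b) / 2 for a, b in zip(change_reader_a, change_reader_b)]
    n_changed = sum(1 for c in mean_changes if abs(c) > cutoff)
    return n_changed / len(mean_changes)


# Hypothetical change scores of 5 patients for one MRI feature.
reader_a = [-2, -1, 0, -3, 1]
reader_b = [-1, -1, 0, -2, 0]
for cutoff in (0, 0.5, 0.82):  # 0.82 stands in for a hypothetical SDC
    print(cutoff, proportion_changed(reader_a, reader_b, cutoff))
```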
Subanalyses for the interreader reliability were performed within the subgroup of patients with the following diagnoses: RA, unclassified arthritis (UA), psoriatic arthritis or spondyloarthritis (SpA), and inflammatory osteoarthritis (OA).
Data were analyzed using SPSS version 23 (IBM Corp.).
RESULTS
Patient characteristics
Characteristics of patients with early arthritis and clinically suspect arthralgia are shown in Supplementary Table 1 (available with the online version of this article). In both cohorts, patients were predominantly female (61% and 84%, respectively) and had a mean age of 55 and 46 years, respectively. Characteristics of the 30 patients with followup MRI are also presented in Supplementary Table 1; they had a higher swollen joint count than the overall early arthritis group (6 vs 3). Of these 30 patients, 29 were prescribed DMARD after the baseline visit during the first year of followup; 1 received NSAID only.
Interreader reliability
The interreader ICC, and median and mean MRI scores for patients with early arthritis and with arthralgia, are presented in Table 1. The scores of the individual readers are depicted in Supplementary Table 2 (available with the online version of this article). For the MTP joints in patients with early arthritis, the mean scores varied from 0.6 (SD 0.9) for erosions, to 1.5 (SD 2.6) for BME. The corresponding ICC for BME ranged from 0.91 to 0.92 (95% CI 0.90–0.93 and 0.90–0.94), for synovitis from 0.90 to 0.92 (95% CI 0.84–0.94 and 0.88–0.94), for tenosynovitis from 0.80 to 0.85 (95% CI 0.69–0.86 and 0.78–0.90), and for erosions from 0.88 to 0.89 (95% CI 0.84–0.91 and 0.86–0.92). In arthralgia patients, the mean and median scores of MRI features were lower, but ICC were similar and all > 0.87, except for BME that had an ICC of 0.77 (95% CI 0.64–0.85) when read by 2 readers, and an ICC of 0.95 (95% CI of 0.93–0.97) when there were 9 readers. The BA plots indicated that systematic bias was low; in Figure 2, the middle line, depicting the mean, was located around 0. Only for tenosynovitis was there a tendency toward more random variation with higher scores (heteroscedasticity).
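The quantities read from a BA plot (the mean difference between 2 readers and the limits of agreement around it) can be computed as in the following sketch. It assumes the conventional limits of the mean difference ± 1.96 SD; the function name `bland_altman` and the example scores are hypothetical and are not study data.

```python
from statistics import mean, stdev


def bland_altman(reader_a, reader_b):
    """Return plot points, bias, and 95% limits of agreement for 2 readers."""
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    means = [(a + b) / 2 for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)   # the middle line of the plot
    sd = stdev(diffs)    # sample SD of the differences
    return {
        "points": list(zip(means, diffs)),  # x = pairwise mean, y = difference
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Hypothetical scores of 6 patients by 2 readers.
result = bland_altman([0, 1, 3, 5, 2, 0], [0, 2, 3, 4, 2, 1])
print(result["bias"], result["lower"], result["upper"])
```

A bias close to 0 corresponds to the middle line lying around 0, and a spread of the differences that widens with the pairwise mean on the x axis corresponds to the heteroscedasticity noted for tenosynovitis.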
The interreader reliability of the MRI features for MCP and wrist joints was similar to that for the MTP joints (Table 1).
In the sensitivity analyses, we looked at the reliability in the separate diagnoses: RA (n = 157), UA (n = 148), SpA (n = 45), and inflammatory OA (n = 23). The results of the sensitivity analyses of the separate diagnoses were similar to the results of the patients combined as presented above (Supplementary Tables 3 and 4, available with the online version of this article).
Intrareader reliability
The intrareader ICC, mean, and median scores are presented in Table 2. Mean scores of MRI features in the MTP joints varied from 0.4 (SD 0.6) for erosions, to 1.7 (SD 2.9) for tenosynovitis. The ICC for BME ranged from 0.96 to 0.98 (95% CI 0.89–0.99 and 0.95–0.99), for synovitis from 0.90 to 0.98 (95% CI 0.74–0.97 and 0.94–0.99), for tenosynovitis from 0.84 to 0.97 (95% CI 0.58–0.94 and 0.91–0.99), and for erosions from 0.71 to 0.92 (95% CI 0.35–0.89 and 0.78–0.97). BA plots indicated that systematic bias was low and are presented in Supplementary Figure 2 (available with the online version of this article).
The intrareader reliability of the MRI features for MCP and wrist joints was similar to that for the MTP joints (Table 2).
Reliability of change scores
The mean, median, and ICC of change scores after 1 year of followup of 30 early arthritis patients are presented in Table 3. The scores of the individual readers are depicted in Supplementary Table 5 (available in the online version of this article). The change in MRI scores over time in the MTP joints was small for erosions (mean 0.4, SD 0.6) and larger for the inflammatory MRI features (≥ −1.3). The ICC for change scores were ≥ 0.90 for BME, synovitis, and tenosynovitis, and 0.77 (95% CI 0.52–0.89) for erosions. The SDC was ≤ 1 for all MRI features, suggesting a high potential to detect changes. The number of patients with true change by using the SDC as a cutoff was similar to the number of patients where change > 0.5 was measured; then BME revealed change in 37% of patients, synovitis in 67%, tenosynovitis in 47%, and erosions in 17% (Table 3). BA plots indicated that systematic bias was low and are presented in Figure 3.
The same analyses were performed for the MCP and wrist joints, and are presented in Table 3; these results were similar to those of the MTP joints.
DISCUSSION
In RA research, the scoring of MR images is performed according to the RAMRIS. Validation of the RAMRIS as an outcome measure for trials has thus far focused on the hands2,3. In this study we investigated the reliability of the RAMRIS when applied to the MTP joints. Overall, we observed good to excellent intra- and interreader reliability for status and change scores. In particular, ICC for inflammatory features were generally > 0.90.
Previously, the reliability of status scores of BME, synovitis, and erosions as well as change score of erosions in MTP have been published and were found to be excellent13,14. Our study is the first, to our knowledge, to look at the reliability of scoring tenosynovitis at the MTP joints and change scores of inflammatory MRI features as measured by BME, synovitis, and tenosynovitis in an early arthritis setting. To further support our findings, we also analyzed data from the wrist and MCP joints to compare this to data of MTP joints. The current data showed that scoring of MTP joints was equally reliable. Finally, our findings obtained on hand joints are in concordance with previous MRI studies of the hands, which supports the validity of the present results28,29,30.
A pitfall of ICC is that they are sensitive to a lack of variability among sampled subjects25. We found the intrareader reliability of erosions to be moderate for reader 1 (0.71, 95% CI 0.35–0.89), but excellent for reader 2 (0.92, 95% CI 0.78–0.97). The mean score of erosions was low (0.4 ± SD 0.6), which corresponds to a lack of variability among these subjects that could have resulted in a moderate ICC. In addition, for change scores the reliability of erosions was lower than the inflammatory MRI features (0.77, 95% CI 0.52–0.89), but still good25. Also, here the mean change in the score of MRI features was lowest for erosions (0.4 ± SD 0.6) compared to ≥ −1.3 for the other features (Table 3). In addition to ICC, BA plots (and for change scores, SDC) are important to take into consideration as measures of reliability. BA plots visualize the data and illustrate that levels of agreement were acceptable in both cases (Figure 2 and Figure 3), and for change scores the low SDC suggests a good reliability31.
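The dependence of the ICC on between-patient variability can be made concrete with a small numerical sketch. The 2 hypothetical datasets below have identical between-reader differences for every patient, but one spans a wide range of scores and the other a narrow range, as is typical for erosion scores in early disease. The function name `icc_single` and all scores are illustrative assumptions, not study data, and the formula is the standard single-measures ICC for a 2-way model with absolute agreement.

```python
from statistics import mean


def icc_single(scores):
    """Single-measures ICC (2-way model, absolute agreement).

    `scores` holds one row per patient and one column per reader.
    """
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in scores)
    ss_cols = n * sum(
        (mean(row[j] for row in scores) - grand) ** 2 for j in range(k)
    )
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Same reader disagreement in both sets: differences of -1, 0, -1, 0, -1, 0.
wide_range = [[0, 1], [2, 2], [4, 5], [6, 6], [8, 9], [10, 10]]
narrow_range = [[0, 1], [0, 0], [1, 2], [1, 1], [0, 1], [1, 1]]

print("wide range:", icc_single(wide_range))
print("narrow range:", icc_single(narrow_range))
```

Although the readers disagree by exactly the same amounts in both sets, the narrow-range set yields a clearly lower ICC, which is why BA plots and the SDC are useful complements when scores cluster near 0.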
For change scores we selected patients with early arthritis who had high baseline scores and were thus most prone to changes over time, specifically a decrease in inflammation and possibly an increase in erosions. The mean change scores were low for all MRI features and for erosions in particular [at the MTP, the mean change score was 0.4 (SD 0.6)]. This is expected because 29 of the 30 patients received DMARD, inhibiting the occurrence or progression of erosions.
The focus in rheumatology is shifting from established erosive RA to early arthritis and even to patients with clinically suspect arthralgia. Therefore, we included patients with clinically suspect arthralgia and found the scoring of status scores to be reliable. In different stages of disease, MRI-detected lesions may be more or less frequent, which may influence the reliability of scoring32. MRI-detected inflammation was subclinical by definition, because there was no apparent arthritis upon physical examination. As expected, absolute MRI scores for arthralgia patients were lower than those of patients with early arthritis, but the reliability overall was good. This is encouraging for MRI studies in the pre-arthritis phase.
Radiographic studies have shown that erosive lesions occur more commonly in the feet than in the hands, and also in earlier phases of disease8,9. In our results, the scores of the MRI features were higher in the hands than in the MTP joints, especially in the wrist. This is in concordance with a recent study performed in patients with undifferentiated arthritis on the development of RA, where adding MRI of the foot did not improve predictive accuracy compared to MRI of the hand alone33. This was explained by the finding that inflammation in the foot was indeed an early phenomenon, but it almost never occurred without inflammation in the hands.
A strength of our study is that it included a large number of patients from 2 different cohorts, and scoring was performed by numerous readers with considerable experience with the RAMRIS. We studied an unselected group of patients with early arthritis, rather than a specific group of patients that met the stringent inclusion criteria of trials. Reliability studies in this patient population are infrequent, making the present data important for future studies in early arthritis. In sensitivity analyses, we looked at the reliability of scoring in the following diagnoses separately: RA, UA, SpA, and inflammatory OA. This was done for readers 1/2 and readers 3/4 separately, and thus resulted in small numbers of patients, especially in the SpA and inflammatory OA groups (SpA: n = 15 and 30, and OA: n = 12 and 11). For the RA and UA groups the reliability of scoring was overall good; for the latter 2 diagnoses, caution should be taken when interpreting the results.
The discrimination aspect of the OMERACT filter was not addressed in our study and should be the subject of further research11. In addition, whether the measured change is clinically relevant needs to be determined in studies evaluating the minimal clinically important difference. The scoring was not timed, and thus unfortunately it was not possible to make a statement concerning feasibility.
We applied the RAMRIS that was developed for the wrist and MCP joints to the MTP joints, and for tenosynovitis the commonly used score developed by Haavardsholm, et al2,17. The RAMRIS was recently updated, and now includes joint space narrowing and a slightly modified tenosynovitis score published by Glinatsi, et al10,34. The updated RAMRIS was not yet available at the start of our study and was therefore not used here. We applied the tenosynovitis score of Haavardsholm, et al to the flexor and extensor tendons of the MCP and MTP joints. Although the extensor tendons at the MTP and MCP joints seem to lack a synovial sheath, inflammation around the extensor tendons at the MCP joints has been described in RA35. Even though the characteristics of this inflammation are unclear, it is important to further study and validate the scoring of the inflammation observed around the extensor tendons, which includes assessing its reliability.
According to the RAMRIS method, T2-weighted fat-suppressed or short-tau inversion recovery sequences should be used to assess BME. Previous studies have demonstrated that a contrast-enhanced T1-weighted fat-suppressed sequence has a strong correlation with T2-weighted fat-suppressed sequences19,20,21. In addition, the ESSR Arthritis Subcommittee also recommends the use of contrast-enhanced T1-weighted fat-suppressed sequences for depiction of BME22. We therefore used the contrast-enhanced T1-weighted fat-suppressed sequence because it allowed a shorter scan time and has a higher signal-to-noise ratio. This did not influence the reliability of scoring BME, although we did not strictly follow the RAMRIS protocol for depicting BME.
The MTP images were acquired after gadolinium contrast was given for the acquisition of hand images. The time between contrast administration and imaging of the foot was about 12 min. Previously it was shown that small time variations are not of major importance to measured synovial membrane volumes, because during a 1-h postcontrast followup period, the measured volumes remained almost unchanged36. Therefore it is unlikely that time variations influenced our results.
Scoring of status and change scores of BME, synovitis, tenosynovitis, and erosions of the MTP joints according to the RAMRIS was reliable. This is encouraging for the use of the scoring system also for MTP joints in trials in early phases of RA.
Acknowledgment
The authors acknowledge all those who did the scoring of the MRI scans.
Footnotes
Supported by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Starting grant, agreement no. 714312) and by the Dutch Arthritis Foundation. The funding source had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
- Accepted for publication September 3, 2019.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.