Abstract
Objective. To develop and validate an enthesitis magnetic resonance imaging (MRI) scoring system for spondyloarthritis/psoriatic arthritis, using the heel as model.
Methods. Consensus definitions of key pathologies and 3 heel enthesitis multireader scoring exercises were done, separated by discussion, training, and calibration.
Results. Definitions for bone and soft tissue pathologies were agreed. In the final exercise, median pairwise single-measures intraclass correlation coefficients (ICC; patient-level) for entheseal inflammation status/change scores were 0.83/0.82 for all readers. For radiologists and selected rheumatologists, ICC were 0.91/0.84 and quadratic-weighted κ (lesion-level) 0.57–0.91/0.45–0.81.
Conclusion. The proposed definitions and Heel Enthesitis Scoring System (HEMRIS) are reliable among trained readers and promising for clinical trials.
Enthesitis — inflammation at insertion sites of ligaments, fasciae, tendons, and joint capsules to bone — is a central feature of spondyloarthritis (SpA), including psoriatic arthritis (PsA). Sensitive and objective assessment of enthesitis is important in SpA clinical trials. Conventional clinical methods have limited reliability, validity, and sensitivity1,2,3. Magnetic resonance imaging (MRI) is a sensitive method for detecting enthesitis in peripheral SpA and the only method allowing detection of perientheseal osteitis4,5,6. MRI studies have demonstrated decreased entheseal inflammation after anti-tumor necrosis factor (TNF) therapy, but no validated MRI scoring systems exist for evaluating enthesitis in clinical trials7. Our aim was to create consensus-based MRI definitions of key enthesitis pathologies and through multireader exercises to develop and validate an MRI score for assessing enthesitis in patients with SpA, focusing on the heel region.
MATERIALS AND METHODS
The Outcome Measures in Rheumatology (OMERACT) MRI in Arthritis Working Group initially performed a systematic literature review (SLR) of studies using MRI for assessment of enthesitis8. Based on this SLR, MRI sequences for optimal visualization of enthesitis were identified, and MRI definitions of key enthesitis pathologies were decided by consensus among group members through meetings/e-mails. The heel region (insertions of Achilles tendon and plantar fascia) was chosen for initial testing because of its frequent involvement. Three multireader exercises, with consensus discussion and calibration in between, were then performed. A graphical data entry schematic (Figure 1) was created, followed by a Web-based interface that simultaneously displayed DICOM (Digital Imaging and Communications in Medicine) images and the data entry schematic (Figure 2). In exercise 1, performed to identify challenges and pitfalls, sagittal T1-weighted (T1W) and sagittal and axial T2-weighted fat-suppressed (T2wFS) MR images of 10 ankles [4 inflammatory enthesitis (peripheral SpA), 4 mechanical enthesitis, and 2 normal controls] were scored for enthesitis at the Achilles tendon and plantar fascia insertions by 15 readers from 10 countries, with varying expertise in ankle MRI. This was followed by a Web-based calibration exercise leading to minor score sheet modifications. In exercise 2, 16 ankle MRI [8 inflammatory enthesitis (peripheral SpA), 3 mechanical enthesitis, and 5 normal controls; MRI sequences as above] were scored by 16 readers. In exercise 3, ankle MRI (sagittal T2wFS only) of 21 patients with SpA from a clinical trial, obtained before and after anti-TNF therapy, were scored for inflammatory pathologies by 10 readers, blinded to chronological order.
To assess the reliability of scores among the more experienced readers, agreement was analyzed separately among the participating radiologists and the 3 rheumatologists with the best overall intraclass correlation coefficient (ICC) for inflammatory pathologies in exercise 2.
Statistical analysis
Exercise 1 was mainly used for qualitative training and for understanding principles and pitfalls. For exercises 2–3, reliability statistics were calculated: pairwise single-measures and average-measures ICC by absolute agreement for sum scores (patient level), and quadratic-weighted Cohen’s κ for individual component scores (lesion level). In exercise 3, the standardized response mean (SRM) was also calculated.
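As an illustration of the patient-level reliability statistic, the following minimal sketch computes single-measures and average-measures ICC by absolute agreement from a patients-by-readers table of sum scores. It assumes a two-way ANOVA model, which the text does not state explicitly. The function name and the example scores are hypothetical and are not study data.

```python
from statistics import mean


def icc_absolute_agreement(scores):
    """Two-way ICC by absolute agreement (assumed model, see lead-in).

    scores: list of rows, one row per patient, one sum score per reader.
    Returns (single_measures_icc, average_measures_icc).
    """
    n = len(scores)        # number of patients
    k = len(scores[0])     # number of readers
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for patients (rows), readers (columns), and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-patient mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    average = (msr - mse) / (msr + (msc - mse) / n)
    return single, average


# Hypothetical total inflammation scores from 2 readers for 5 patients
example = [[4, 5], [0, 1], [7, 6], [2, 2], [9, 8]]
single_icc, average_icc = icc_absolute_agreement(example)
print(round(single_icc, 2), round(average_icc, 2))
```

For a pairwise analysis, each row holds the scores of one reader pair, and the median or mean is then taken across all pairs.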
RESULTS
Definitions of key pathologies
Key entheseal pathologies were selected and their definitions agreed upon by consensus within the OMERACT MRI in Inflammatory Arthritis Working Group (Table 1), based on knowledge from an SLR8 and on published OMERACT MRI definitions for comparable conditions9,10,11. The selected pathologies were intratendon hypersignal (entheseal tendonitis), peritendon hypersignal (entheseal peritendonitis), bone marrow edema (BME; entheseal osteitis), bursitis, tendon thickening, enthesophyte, entheseal bone erosion, and intratendon hypersignal on T1W sequences.
MRI sequences and planes
For evaluating inflammatory pathologies, it was agreed to include a fluid-sensitive sequence [short-tau inversion recovery (STIR) or T2wFS], and/or a fat-suppressed T1W sequence following intravenous gadolinium (Gd) injection (Figure 3). A T1W sequence prior to contrast injection (T1-pre-Gd) was considered helpful for determining the exact localization of inflammatory pathologies because of its high anatomical resolution, and essential for assessment of structural pathologies.
Scoring system
It was decided to score all assessed pathologies on a semiquantitative scale of 0–3 (none/mild/moderate/severe), following the principles from the RAMRIS (Rheumatoid Arthritis Magnetic Resonance Imaging Scoring system) and PsAMRIS (Psoriatic Arthritis Magnetic Resonance Imaging Scoring system)9,10,11, and to create a total entheseal inflammation score by summation of scores of all inflammatory variables (intratendon hypersignal on T2wFS/STIR sequences, peritendon hypersignal, BME, and bursitis). Similarly, a total entheseal structural damage score was developed by summation of structural scores (enthesophyte, bone erosion, tendon thickening). Intratendon hypersignal on T1W sequences was not included in sum scores. In the exercises described in the present paper, entheses of the heel region were scored, i.e., the calcaneal insertions of the Achilles tendon and plantar fascia.
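The summation principle can be sketched as follows for a single enthesis site. The component keys are hypothetical labels for the pathologies listed above. How site-level scores are combined across the Achilles tendon and plantar fascia insertions is not specified here, so the sketch deliberately stops at one site.

```python
# Hypothetical component labels; grades are 0-3 (none/mild/moderate/severe)
INFLAMMATORY = (
    "intratendon_hypersignal_t2",
    "peritendon_hypersignal",
    "bone_marrow_edema",
    "bursitis",
)
STRUCTURAL = ("enthesophyte", "bone_erosion", "tendon_thickening")


def total_score(grades, components):
    """Sum the 0-3 grades of the named components for one enthesis site."""
    total = 0
    for name in components:
        grade = grades[name]
        if grade not in (0, 1, 2, 3):
            raise ValueError(f"{name}: grade must be 0, 1, 2, or 3, got {grade!r}")
        total += grade
    return total


# Hypothetical grades for one site; T1W intratendon hypersignal is recorded
# but, as stated in the text, not included in either sum score.
site = {
    "intratendon_hypersignal_t2": 2,
    "peritendon_hypersignal": 1,
    "bone_marrow_edema": 3,
    "bursitis": 0,
    "enthesophyte": 1,
    "bone_erosion": 0,
    "tendon_thickening": 2,
    "intratendon_hypersignal_t1": 1,
}
print(total_score(site, INFLAMMATORY))  # 6
print(total_score(site, STRUCTURAL))    # 3
```

With 4 inflammatory components graded 0–3, the inflammation sum for one site ranges from 0 to 12; with 3 structural components, the damage sum ranges from 0 to 9.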
Exercise 1
Exercises 1 and 2 included single-timepoint images of the heel region, which were scored for the selected predefined pathologies. Exercise 1 was used for initial learning, calibration, and identification of pitfalls. Mean pairwise interreader single-measure ICC for inflammatory and structural variables, scored without prior calibration, were 0.40 and 0.41, respectively.
Exercise 2
In exercise 2, agreement between reader pairs varied from poor to very good for various lesion types and their sum scores (Table 2). When the analyses were limited to the 3 participating musculoskeletal radiologists and the 3 rheumatologists with the best ICC for inflammatory pathologies in exercise 2, reliability improved to moderate to very good. For this subset of readers, median single-measure ICC for total inflammation scores was 0.85, and that for total structural damage scores was 0.68. Median κ for different inflammatory pathologies varied from 0.60 to 0.89, and for individual structural pathologies from 0.41 to 0.78. Average-measure ICC based on 2 readers among the preselected 6 readers (median 0.92 for total inflammatory score, 0.81 for total damage scores) were better than the single-measure ICC.
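The lesion-level κ values are quadratic-weighted Cohen's κ between 2 readers. A minimal sketch of that statistic for 0–3 grades is given below; the function name and example grades are hypothetical, and the code is not taken from the study's analysis.

```python
def quadratic_weighted_kappa(reader_a, reader_b, n_categories=4):
    """Quadratic-weighted Cohen's kappa for two readers' ordinal grades.

    reader_a, reader_b: equal-length lists of integer grades in
    range(n_categories), one entry per scored lesion.
    """
    n = len(reader_a)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(reader_a, reader_b):
        observed[a][b] += 1

    row_totals = [sum(observed[i]) for i in range(n_categories)]
    col_totals = [
        sum(observed[i][j] for i in range(n_categories))
        for j in range(n_categories)
    ]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Disagreement weight grows with the squared grade difference
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both readers use a single grade")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical BME grades from 2 readers for 8 lesions
a = [0, 1, 2, 3, 1, 0, 2, 3]
b = [0, 1, 3, 3, 2, 0, 2, 2]
print(round(quadratic_weighted_kappa(a, b), 2))
```

Because the weights are squared grade differences, a disagreement of 2 grades is penalized 4 times as heavily as a disagreement of 1 grade.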
Exercise 3
This exercise included images from 2 timepoints, which were scored for inflammatory pathologies. Mean pairwise interreader ICC and lesion-wise κ agreement demonstrated moderate to good reliability when all readers were considered (Table 3). The subset of readers (3 rheumatologists with best agreement for inflammatory measures in exercise 2 and the participating radiologist in exercise 3) demonstrated good to very good reliability, both for baseline scores and for change in scores (Table 2). The median baseline single-measures ICC for total inflammation was 0.91, while it was 0.84 for change in score. Median average-measure ICC based on 2 readers [status: 0.95 (range 0.95–0.97), change: 0.92 (0.89–0.96)] were higher than single-measure ICC. Using 3 readers yielded numerically higher average-measure ICC [status: median 0.97 (0.97–0.97), change 0.94 (0.94–0.95)].
The Heel Enthesitis Scoring System (HEMRIS) showed moderate responsiveness, with SRM of 0.70 (95% CI 0.38–1.05) for all readers in exercise 3.
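The SRM is the mean change in score divided by the standard deviation of the change scores. A minimal sketch follows, assuming that change is taken as baseline minus follow-up (so that a decrease in inflammation is positive) and that the sample standard deviation is used; neither convention is stated in the text. The scores are hypothetical, and the confidence interval is not reproduced because its method is not described.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, followup):
    """SRM = mean of paired changes / sample SD of paired changes.

    Assumed sign convention: change = baseline - follow-up,
    so improvement (lower follow-up score) gives a positive SRM.
    """
    changes = [b - f for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


# Hypothetical total inflammation scores before and after therapy
baseline_scores = [4, 6, 5, 7]
followup_scores = [2, 3, 4, 3]
print(round(standardized_response_mean(baseline_scores, followup_scores), 2))
```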
DISCUSSION
To the best of our knowledge, our study is the first international consensus effort toward development of a comprehensive MRI scoring system, combined with MRI definitions and reader rules, for enthesitis in patients with SpA. The work was informed by an SLR8, which clarified knowledge gaps and the need for development of a validated MRI enthesitis scoring system to be used as an outcome measure in clinical trials. Enthesitis, often located at the heels, is a typical feature of SpA and is easily accessible for MRI12. Further, enthesitis in SpA may show changes both in inflammation (such as BME and perientheseal inflammation) and damage (such as erosion and new bone formation)13,14. Thus, both inflammatory and structural MRI findings were considered relevant to include in the scoring system. A series of multireader scoring exercises focused on the heel region, using an intuitive Web-based data entry and image display platform. The preliminary OMERACT-HEMRIS showed good interreader agreement for status scores and for change over time in inflammatory measures. Considering that baseline heel enthesitis was not mandatory in exercise 3, the moderate SRM (0.70) suggests that the responsiveness of the HEMRIS score would likely be good in trials with baseline enthesitis as an inclusion criterion. Thus, HEMRIS appears promising for further validation and future use in randomized controlled trials.
The strengths of this initiative include taking an SLR as the starting point to clarify unmet need, the involvement of experienced MRI researchers in the development of consensus-based definitions and scoring systems, and the participation of multiple readers with both radiological and rheumatological backgrounds in interactive Web-based exercises with a standardized image display and scoring module. Limitations include the varying experience and backgrounds of readers in the exercises, which need to be taken into consideration when interpreting the results. This was addressed by subanalysis of the scores of a subset of experienced readers who had shown high scoring proficiency in previous exercises. Longitudinal studies incorporating T1W images are needed for assessment of the sensitivity to change of structural variables. Future developments should also include an MRI enthesitis reference image atlas, and image sets for training and calibration. The definitions and scoring principle may be applicable to other entheses. Thus, validation of the definitions and scoring system in other anatomical regions is also suggested.
The heel enthesitis MRI score appears to be particularly reliable if the mean score of 2 readers (compared to 1) is used in the final study analysis; the average-measure ICC for 2 readers were markedly higher (0.92–0.95 for inflammation total status/change score in last exercise) than single-measure ICC. This will be relevant in real-life clinical trials where 2 independent readers generally score images.
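The gain from averaging readers can be checked against the reported values with the Spearman-Brown relation between single-measures and average-measures ICC. The sketch below is only an approximate consistency check: the reported figures are medians over reader pairs computed from the data, not values derived from this formula, and the function name is hypothetical.

```python
def average_measures_icc(single_icc, n_readers):
    """Spearman-Brown step-up: ICC of the mean score of n_readers readers."""
    return n_readers * single_icc / (1 + (n_readers - 1) * single_icc)


# Exercise 3 single-measures ICC from the text: 0.91 (status), 0.84 (change)
print(round(average_measures_icc(0.91, 2), 2))  # 0.95
print(round(average_measures_icc(0.84, 2), 2))  # 0.91
print(round(average_measures_icc(0.91, 3), 2))  # 0.97
print(round(average_measures_icc(0.84, 3), 2))  # 0.94
```

These values lie within 0.01 of the reported median average-measure ICC for 2 readers (0.95 and 0.92) and match those for 3 readers (0.97 and 0.94). The step from 2 to 3 readers adds far less than the step from 1 to 2.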
The increasing number of novel therapeutic options in SpA and PsA increases the potential utility of an objective and reproducible enthesitis outcome measure. The proposed OMERACT MRI heel enthesitis scoring system (HEMRIS) is a promising tool for further refinement and validation through the OMERACT filter and for future use in clinical trials15,16.
Acknowledgment
We thank the Canadian Research and Education (CaRE) Arthritis organization (www.carearthritis.com) for help with setting up online meetings and exercises and development of the Web-based scoring interface.
Footnotes
PGC is supported in part by the UK National Institute for Health Research (NIHR) Leeds Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the UK National Health Service, the NIHR, or the UK Department of Health.
- Accepted for publication January 9, 2019.