Abstract
Objective. To develop a whole-body magnetic resonance imaging (MRI) scoring system for peripheral arthritis and enthesitis.
Methods. After consensus on definitions/locations of MRI pathologies, 4 multireader exercises were performed. Eighty-three joints were scored 0–3 separately for synovitis and osteitis, and 33 entheses 0–3 separately for soft tissue inflammation and osteitis.
Results. In the last exercise, reliability was moderate-good for musculoskeletal radiologists and rheumatologists with previously demonstrated good scoring proficiency. Median pairwise single-measure/average-measure ICC were 0.67/0.80 for status scores and 0.69/0.82 for change scores; κ ranged 0.35–0.77.
Conclusion. Whole-body MRI scoring of peripheral arthritis and enthesitis is reliable, which encourages further testing and refinement in clinical trials.
Magnetic resonance imaging (MRI) allows objective assessment of inflammation in peripheral joints and entheses1,2,3,4,5,6,7. MRI scoring systems have until now focused on assessing parts of the musculoskeletal system in detail, e.g., the Rheumatoid Arthritis MRI Scoring System (RAMRIS), which is applied to the wrist and metacarpophalangeal joints and adjacent tendon sheaths8,9,10. The interest in a whole-body MRI (WBMRI) approach is growing because modern MRI scanners permit whole-body scanning within an acceptable time frame (< 1 h), and future improvements in MRI hardware and pulse sequences are expected to improve scan time and image resolution further.
WBMRI of patients with inflammatory arthritis has mainly been investigated in small cross-sectional2,5,6,7,11 or longitudinal studies3,4,12. To our knowledge, 2 randomized, controlled trials have used WBMRI as an outcome measure, applying different assessment systems3,4. To increase homogeneity, validity, and across-study comparability of WBMRI as an outcome measure, the Outcome Measures in Rheumatology (OMERACT) MRI in Arthritis Working Group decided to develop a scoring system for inflammation of peripheral joints and entheses. The system is intended for future phase II/III studies that aim to objectively document the effect of an intervention on the inflammatory load in peripheral joints and entheses.
The objective was to develop an MRI Whole-Body Score for Inflammation in Peripheral Joints and Entheses in Inflammatory Arthritis (MRI-WIPE) and to investigate its feasibility and reliability.
MATERIALS AND METHODS
Development of the scoring system through iterative multireader scoring exercises
In 2016, the OMERACT MRI in Arthritis Working Group decided on inflammation in peripheral joints and entheses as the primary focus for WBMRI development, and then agreed on consensus MRI definitions for arthritis and enthesitis, selected anatomical locations for assessment, and a core set of MRI sequences and imaging planes for the different regions, and proposed a preliminary scoring system. It was decided to test and further develop the system by iterative multireader exercises13,14,15.
In 2017–2018, 4 (3 cross-sectional and 1 longitudinal) Web-based multireader exercises were performed, separated by online training and calibration meetings. Schematics for recording the presence of lesions and their severity were drawn (SK and MØ; Figure 1). Subsequently, courtesy of CaRE Arthritis, a Web-based schematic data entry interface was created (JP) and used together with a DICOM (Digital Imaging and Communication in Medicine) image viewer (Figure 2) to conduct entirely Web-based scoring exercises. In Exercise 1, 9 readers (1 radiologist, 8 rheumatologists) tested a draft scoring system in 2 patients with axial spondyloarthritis (axSpA). Results were discussed, and the system was slightly modified. In Exercise 2, 14 readers (3 radiologists, 11 rheumatologists) assessed 5 patients with axSpA. Discrepant cases and potential difficulties in applying the scoring system were discussed online to obtain consensus, train inexperienced readers, and identify potential pitfalls.
Figure 1. Data entry schematics and scoring ranges. Osteitis of the sternoclavicular joint is assessed separately for sternum and clavicle. Osteitis of the manubriosternal joint is assessed separately for manubrium and body of sternum. Osteitis of the hip joint is assessed separately for acetabulum and femur. Osteitis of the knee joint is assessed separately for lateral femur, medial femur, lateral tibia, medial tibia, and patella. Osteitis of the pubic symphysis is assessed separately for left and right pubic bone. OST: osteitis; SYN: synovitis; STI: soft tissue inflammation. Shoulder/ACW (anterior chest wall): ACJ: acromioclavicular joint; SCJ: sternoclavicular joint; SST: supraspinatus tendon; CS: costosternal joint; MSJ: manubriosternal joint; Should: glenohumeral joint. Hands: DRU: distal radioulnar joint; RC: radiocarpal joint; IC-CMC: intercarpal and carpometacarpal joints; MCP: metacarpophalangeal joint; PIP: proximal interphalangeal; DIP: distal interphalangeal. Pelvis: PSIS: posterior superior iliac spine; Iliac C: iliac crest; ASIS: anterior superior iliac spine; G troch: greater trochanter; Isch t: ischial tuberosity; Symph: pubic symphysis. Knees: QFTP: quadriceps femoris tendon insertion into patella; PTP: patellar tendon insertion into patella; PTTT: patellar tendon insertion into tibial tuberosity; MFC: medial femoral condyle; LFC: lateral femoral condyle; F-L: femur-lateral; F-M: femur-medial; T-L: tibia-lateral; T-M: tibia-medial. Feet: ACH: Achilles tendon; PLF: plantar fascia; PTC: posterior talocalcaneal joint; Talocr: talocrural joint; TCN-CC: talocalcaneonavicular and calcaneocuboid joints; T-TMT: tarsal and tarsometatarsal joints; MTP: metatarsophalangeal.
Figure 2. Web-based DICOM (Digital Imaging and Communication in Medicine) image viewer (provided courtesy of CaRE Arthritis at www.carearthritis.com). Short-tau inversion recovery images of the left shoulder region from the same patient at 2 timepoints (left and middle) and the corresponding completed data entry schematics (right). White arrows: synovitis (score 3, severe) and osteitis (score 1, mild) of the left glenohumeral joint as assessed on the magnetic resonance images and entered in the corresponding data entry schematic.
In Exercise 3, MRI of 8 patients [4 rheumatoid arthritis (RA), 4 psoriatic arthritis (PsA)] were scored by 14 readers (4 radiologists, 10 rheumatologists). Because agreement between reader pairs varied widely (minimal-good), 2 online meetings were held to improve calibration before proceeding to Exercise 4. In Exercise 4, MRI at 2 timepoints of 6 patients with axSpA who started tumor necrosis factor (TNF) inhibitor treatment were assessed by 10 readers (3 radiologists, 7 rheumatologists) blinded to chronology. In all exercises, readers were aware of the patient groups involved (SpA or RA), but not of the diagnosis of individual cases.
Reader instructions were made available at www.copecare.dk and www.carearthritis.com. They contained definitions, image examples of normal findings (e.g., blood vessels) that could be mistaken for inflammation, and many examples of lesions of different grades. Exercises 1 and 2 were used solely for qualitative training and for understanding principles and pitfalls. For Exercises 3 and 4, reliability statistics were calculated [pairwise single-measure and average-measure intraclass correlation coefficients (ICC) by absolute agreement for sum scores, and Cohen’s κ with squared (quadratic) weights for individual scores].
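The two reliability statistics named above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the source does not state which software or ANOVA model was used, so the two-way absolute-agreement forms (often written ICC(A,1) and ICC(A,k)) are an assumption, and the function names and example data are hypothetical.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories=4):
    """Cohen's kappa with squared (quadratic) disagreement weights.

    rater_a, rater_b: integer grades (0..n_categories-1) for the same sites.
    Undefined (division by zero) if both raters use one identical grade only.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must score the same non-empty set of sites")
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n_categories))
                  for j in range(n_categories)]
    weighted_observed = weighted_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2  # normalising constant cancels in the ratio
            weighted_observed += weight * observed[i][j] / n
            weighted_expected += weight * row_totals[i] * col_totals[j] / n ** 2
    return 1.0 - weighted_observed / weighted_expected


def icc_absolute_agreement(scores):
    """Single- and average-measure ICC by absolute agreement (assumed two-way model).

    scores: one row per patient, one sum score per reader in each row.
    Returns (single_measure_icc, average_measure_icc).
    """
    n, k = len(scores), len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    patient_means = [sum(row) / k for row in scores]
    reader_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_patients = k * sum((m - grand_mean) ** 2 for m in patient_means)
    ss_readers = n * sum((m - grand_mean) ** 2 for m in reader_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_patients - ss_readers
    ms_patients = ss_patients / (n - 1)
    ms_readers = ss_readers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    single = (ms_patients - ms_error) / (
        ms_patients + (k - 1) * ms_error + k * (ms_readers - ms_error) / n)
    average = (ms_patients - ms_error) / (
        ms_patients + (ms_readers - ms_error) / n)
    return single, average


# Hypothetical data: grades at 6 sites from 2 readers, and sum scores of 4 patients.
print(quadratic_weighted_kappa([0, 1, 3, 2, 0, 1], [0, 2, 3, 2, 1, 1]))
print(icc_absolute_agreement([[4, 6], [12, 11], [20, 23], [7, 7]]))
```

For a pairwise analysis as described in the text, such functions would be applied to every pair of readers and the median of the resulting values reported.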
Approval was obtained from the Regional Committee on Health Research Ethics, Region Hovedstaden, Denmark (H-1-2013-118), and patients provided written informed consent.
Scoring methodology
Inflammation in joints (arthritis) and at entheses (enthesitis) is assessed separately for soft tissues (synovitis at joints, soft tissue inflammation at entheses) and bone (osteitis). See Østergaard, et al13 for exact MRI definitions.
Preferably, synovitis and soft tissue inflammation are assessed on T1-post-Gd images and osteitis on short-tau inversion recovery (STIR)/T2-weighted fat-sat (T2FS) images. However, if only STIR/T2FS images are available, synovitis and soft tissue inflammation can be assessed on these. Each component is scored on a semiquantitative scale of 0–3 (none/mild/moderate/severe), following the principles from the RAMRIS and PsAMRIS systems8,16. In total, 83 peripheral joints and 33 entheses are assessed. The MRI-WIPE score is derived by adding all scores together; the total range is 0–738 (joints 0–537; entheses 0–201; Figure 2 and Appendix 1).
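The summation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the site names, dictionary layout, and function name are hypothetical, and the full list of scored items is the one given in Appendix 1, not reproduced here. The stated maxima (537 and 201) correspond to 179 and 67 individually graded items at a maximum grade of 3, which is more than twice the number of joints and entheses because osteitis is graded at several bone sites in some locations.

```python
VALID_GRADES = (0, 1, 2, 3)   # none / mild / moderate / severe
MAX_JOINT_SCORE = 537         # stated in the text
MAX_ENTHESIS_SCORE = 201      # stated in the text


def mri_wipe_total(joint_items, enthesis_items):
    """Sum all graded items into joint, enthesis, and total scores.

    joint_items / enthesis_items: dict mapping a (site, component) key
    (hypothetical naming) to a grade of 0-3. Unscored items are simply absent.
    """
    def checked_sum(items):
        for key, grade in items.items():
            if grade not in VALID_GRADES:
                raise ValueError(f"grade for {key} must be 0-3, got {grade!r}")
        return sum(items.values())

    joints = checked_sum(joint_items)
    entheses = checked_sum(enthesis_items)
    return {"joints": joints, "entheses": entheses, "total": joints + entheses}


# Example: the left glenohumeral grades shown in the viewer figure (synovitis 3,
# osteitis 1) plus a hypothetical Achilles enthesis finding.
joints = {("glenohumeral, left", "SYN"): 3, ("glenohumeral, left", "OST"): 1}
entheses = {("Achilles tendon, left", "STI"): 2, ("Achilles tendon, left", "OST"): 0}
print(mri_wipe_total(joints, entheses))  # {'joints': 4, 'entheses': 2, 'total': 6}
```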
RESULTS
Readers from 10 different countries across the globe participated. Exercises 1 and 2 were used only for initial learning, calibration, and identification of pitfalls. In Exercise 3, agreement between readers varied from poor to good for the 4 lesion types and their sum scores (Table 1). Reliability varied between reader pairs depending on reader experience. When limiting the analysis to the 4 musculoskeletal radiologists, reliability improved to moderate-good.
Table 1. Interreader reliability.
The same pattern was observed in Exercise 4, where reliability was poor-good among all readers, but when restricted to the 3 musculoskeletal radiologists and 3 rheumatologists with the better reliability in the previous exercise, reliability was moderate-good. Thus, among the more trained readers, grading seemed reliable. MRI-WIPE reading time for 1 MRI was not measured but was estimated to be ≤ 60 min. Responsiveness of the MRI-WIPE score was good during TNF inhibitor treatment (mean change score −6.3, SD 6.5, and standardized response mean 1.0). Average-measure ICC based on 2 readers (status 0.80, change 0.82) were higher than single-measure ICC (status 0.67, change 0.69; Table 1). Using 3 readers, average-measure ICC were higher still (status 0.86, change 0.86).
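The standardized response mean is the mean change score divided by the SD of the change scores. The short Python sketch below shows the calculation and checks it against the summary values reported above. The function name and the example change scores are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, because the source does not state which SD was used.

```python
from statistics import mean, stdev


def standardized_response_mean(change_scores):
    """Mean change divided by the sample SD of the change scores."""
    return mean(change_scores) / stdev(change_scores)


# From the reported summary values: mean change -6.3, SD 6.5.
reported = abs(-6.3 / 6.5)
print(round(reported, 2))  # 0.97, consistent with the reported 1.0 after rounding

# Hypothetical per-patient change scores, for illustration only.
print(standardized_response_mean([-2, -4, -6]))  # -2.0
```

The sign simply reflects the direction of change (a decrease in inflammation score); the magnitude is what is usually interpreted as responsiveness.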
DISCUSSION
Definitions of key MRI pathologies and a scoring system (MRI-WIPE, MRI Whole-Body Score for Inflammation in Peripheral Joints and Entheses in Inflammatory Arthritis) were agreed upon by consensus in the OMERACT MRI in Arthritis Working Group. The scoring system was developed in analogy with the RAMRIS/PsAMRIS scoring systems but allows assessment of multiple peripheral joints and entheses and, in its current form, is not limited to a specific diagnosis. In small cross-sectional and longitudinal reading exercises, the system had moderate-good reliability for status scores and change scores when the analysis was limited to readers who were musculoskeletal radiologists or who had shown good scoring proficiency, in agreement with most readers, in the previous exercises. Potentially, WBMRI could provide high between-group discrimination in randomized controlled trials4. Thus, the scoring system appears promising for further validation and future use in randomized controlled trials.
A scan time of about 45 min for peripheral joints and entheses, and about 1 h if axial joints were included, was acceptable to the included patients. Thus, the approach was feasible, although no formal survey of patient satisfaction or discomfort was undertaken.
Subsequent steps may include tailoring/analyzing different joint combinations for different diseases (i.e., a modular approach, where only a selection of areas is imaged and scored, guided by the key questions in individual studies), because diseases such as RA, PsA, and axSpA have different patterns of joint and enthesis involvement. Analyzing different weighting of components, as recently attempted with the RAMRIS system17, e.g., by putting less weight on small joints, may also be considered. Currently, WBMRI image quality is lowest in small joints because of their size and limited image resolution (slice thickness 3–5 mm), but new MRI units and sequence types can provide better resolution.
Not all readers reached the same level of reliability. Several readers had minimal experience in reading certain areas, and as expected this could not be resolved by a few training exercises. Because of the complex anatomy and many regions to score, it is essential to use appropriate equipment, i.e., 1–2 large high-resolution monitors in an appropriately lit room, on which images from the required number of timepoints can be displayed at an adequate size without zooming. An online training and calibration module, potentially with a final test of the reader’s proficiency compared to expert readers, is a possibility. Investigating alternative MRI sequences or scanning protocols may also be an option.
Rather few cases were included in the exercises, but for the purposes of development, it was considered more important to understand and discuss potential discrepancies and try to calibrate readers. Higher patient numbers would have increased the certainty of the calculated reliability measures.
The MRI-WIPE score appears to be particularly reliable if the average score of 2 or 3 readers, rather than the score of only 1 reader, is used in the final analysis of a study, because the average-measure ICC for 2 or 3 readers were substantially higher than the single-measure ICC. With 3 readers, average-measure ICC for status scores and for change scores were both 0.86.
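The gain from averaging readers can be approximated with the Spearman-Brown relationship between single-measure and average-measure ICC. The sketch below is an illustration under that assumption, not the authors' calculation; the function name is hypothetical. Because the reported values are medians across reader pairs or sets, exact agreement with the formula is not expected.

```python
def average_measure_icc(single_measure_icc, n_readers):
    """Spearman-Brown projection of a single-measure ICC to n_readers averaged."""
    if n_readers < 1:
        raise ValueError("n_readers must be at least 1")
    return n_readers * single_measure_icc / (1 + (n_readers - 1) * single_measure_icc)


# Reported median single-measure ICC: status 0.67, change 0.69.
for label, single in (("status", 0.67), ("change", 0.69)):
    for n_readers in (2, 3):
        projected = round(average_measure_icc(single, n_readers), 2)
        print(label, n_readers, projected)
# status 2 0.8
# status 3 0.86
# change 2 0.82
# change 3 0.87
```

The projected values match the reported average-measure ICC for 2 readers (0.80 and 0.82) and for status scores with 3 readers (0.86). The projection for change scores with 3 readers (0.87) is slightly above the reported 0.86.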
The MRI-WIPE score is promising, because scoring was reliable between readers with previous good scoring proficiency. The system needs further validation in larger, longitudinal studies, but in its current form it could be of interest in trials striving for global measures of inflammation in peripheral joints and entheses.
Acknowledgment
Thanks to CaRE Arthritis (www.carearthritis.com) for software development of the Web-based scoring interface, use of the CaRE Web-based DICOM viewer, and for help in organizing the WebEx online meetings.
APPENDIX 1.
Further details on the scoring methodology, and list of sites assessed. OMERACT: Outcome Measures in Rheumatology; MRI: magnetic resonance imaging.
Footnotes
SK received research grants from The Danish Rheumatism Association and Rigshospitalet. PGC is supported in part by the NIHR Leeds Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the UK Department of Health.
- Accepted for publication January 24, 2019.