Abstract
Objective. There is an unmet need for reliable assessment of structural progression in the sacroiliac joints (SIJ) of patients with spondyloarthritis (SpA), but radiography is unreliable and lacks responsiveness. We aimed to develop and validate a new scoring method for structural lesions based on magnetic resonance imaging (MRI), the Spondyloarthritis Research Consortium of Canada (SPARCC) SIJ Structural Score (SSS).
Methods. The SSS method for assessment of structural lesions is based on T1-weighted spin echo MRI, validated lesion definitions, slice selection according to well-defined anatomical principles, and dichotomous scoring (lesion present/absent) of 5 consecutive slices through the cartilaginous portion of the joint. Scoring ranges are fat metaplasia (0–40), erosion (0–40), backfill (0–20), and ankylosis (0–20). We conducted 3 sequential validation exercises in which 2–4 readers, blinded to timepoint, scored baseline and either 2-year (exercises 1 and 2) or 1-year (exercise 3) scans from 147 patients with SpA. Interobserver reliability was assessed by the intraclass correlation coefficient (ICC) and the smallest detectable change (SDC).
Results. Interobserver reliability for status score was good to excellent for ankylosis (ICC 0.79–0.98), consistently good for fat metaplasia (ICC 0.71–0.78), moderate to good for erosion (ICC 0.58–0.62), and fair to good for backfill (ICC 0.35–0.66). Reliability for change scores was moderate to good for all structural lesions despite the relatively small changes in scores, and was highest for fat metaplasia when both ICC and SDC values were compared.
Conclusion. The new SPARCC MRI SSS method can detect structural changes in the SIJ with acceptable reliability over a 1–2-year timeframe, and should be further validated in patients with SpA.
Radiography of the sacroiliac joints (SIJ) continues to be an important diagnostic tool in patients with spondyloarthritis (SpA). However, it is unreliable for detecting change and lacks responsiveness, so it is not used to assess potential disease-modifying agents. There is, therefore, an unmet need for imaging tools to assess the potential disease-modifying effects of therapeutic agents early in SpA, when disease is still confined to the SIJ1,2,3.
Magnetic resonance imaging (MRI) represents a substantial advance in the field because it can visualize inflammation in soft tissue as well as in subchondral bone, where it appears as bone marrow edema (BME)4. BME is evident on fat-suppressed sequences such as short-tau inversion recovery (STIR). Recent MRI data also show that resolution of BME in subchondral bone marrow may be associated with the development of fat metaplasia on the T1-weighted spin echo (T1WSE) sequence1,5,6. Fat metaplasia is not visible on radiography and its histopathology is unknown, but it is frequently observed in the subchondral marrow of the SIJ and at spinal locations that are also typical for inflammation, i.e., vertebral corners, adjacent to vertebral endplates, and facet and costovertebral joints7. Moreover, fat metaplasia at vertebral corners has been shown in multivariate analysis to predict the subsequent development of new syndesmophytes1. We have previously hypothesized that the resolution of inflammation in SIJ erosions is followed by the development of a new tissue that, on T1WSE MRI, has the same signal intensity as fat metaplasia8. We have called this lesion “backfill” because it appears in the cavity of the erosion, whereas the term fat metaplasia is used when the lesion is located in the subchondral bone marrow.
Validation of MRI-based scores for structural lesions in the SIJ has been limited, is based primarily on cross-sectional data, and has focused on fat metaplasia6,8,9,10,11,12,13,14; it is unclear whether change in structural lesions can be reliably detected. The Outcome Measures in Rheumatology (OMERACT) filter is a well-accepted framework for validating new scoring instruments based on assessment of feasibility, truth, and discrimination15. In this report, we describe the development and preliminary validation, according to the OMERACT filter, of a new scoring method for structural lesions in the SIJ: the Spondyloarthritis Research Consortium of Canada (SPARCC) Sacroiliac Joint Structural Score (SSS). This method assesses a broader spectrum of structural lesions in the SIJ, including erosion, fat metaplasia, backfill, and ankylosis. Preliminary validation aimed to determine whether structural lesions in the SIJ could be reliably detected, whether change in these lesions could be reliably detected, and whether change could be reliably detected in less than 2 years, the time frame required before radiographic progression in the spine can be reliably detected16.
MATERIALS AND METHODS
Patients
We assessed MRI scans from patients meeting the modified New York criteria for ankylosing spondylitis who were recruited to the SPARCC prospective observational cohort17. Patients are evaluated systematically every 6 months according to a standardized protocol that includes clinical and laboratory variables. Imaging variables include SPARCC MRI SIJ inflammation scores18, which are recorded for each patient in the cohort by readers not involved in our study at baseline, at 3–6 months for patients starting anti-tumor necrosis factor-α (anti-TNF-α) therapy, and annually for all patients, as described17. A total of 147 patients had available baseline and 2-year followup scans and were included in our study. Our study was performed in accordance with the Declaration of Helsinki. Written informed consent was obtained from all study participants before inclusion in the observational cohort.
MRI protocol
The SPARCC SSS method was developed based on semicoronal T1WSE sequences of the SIJ. The scan parameters were as follows: 15–19 slices, 4-mm slice thickness, 0.4-mm interslice gap, field of view 280–300 mm, repetition time 423–450 ms, echo time 12–13 ms, and matrix 512 × 256 pixels. Although the original scans from all patients included STIR sequences, these were removed from the set of scans used in this validation process to keep readers blinded to timepoint and to changes in the distribution of active inflammation. In particular, the substantial reduction in BME that is typically observed in patients receiving anti-TNF-α therapy may unblind readers to time sequence.
Development of the SPARCC SSS method
Lesion definitions
We adopted standardized definitions of structural lesions of the SIJ on MRI that were developed by the Canada-Denmark MRI Working Group19 and later extended to include backfill8. For fat metaplasia to be scored in the SSS method, the lesion must show homogeneous signal and extend more than 1 cm in depth from the joint surface, because our previous work has shown that this enhances reliability (data not shown). Backfill is defined on T1WSE sequences as complete loss of the iliac or sacral cortical bone at its anticipated location together with increased signal that is clearly demarcated from adjacent normal marrow by irregular dark signal reflecting sclerosis at the border of the eroded bone (Figure 1).
We developed a training module comprising a detailed description of the SSS scoring method with examples, a Digital Imaging and Communications in Medicine (DICOM)-based reference image set of 45 cases with baseline and 2-year scans, and consensus reader scores for these DICOM images. This module is intended to facilitate calibration of non-expert readers and is available at www.carearthritis.com. Bone sclerosis and abnormalities of the synovial cavity are not addressed in the SSS method because of poor reproducibility in previous reading exercises (data not shown).
Scoring methodology
Evaluation of structural lesions in the SIJ proceeds sequentially in the following steps: (1) the transitional slice is identified by scrolling from anterior to posterior through the DICOM images depicting semicoronal slices through the SIJ. The transitional slice is defined as the first slice in the cartilaginous portion of the joint, viewed from anterior to posterior, in which part of the ligamentous portion of the joint is visible (Figure 2); (2) all timepoints are anatomically matched according to the transitional semicoronal SIJ slice. The link function in the DICOM viewing software allows simultaneous scrolling of anatomically matched images from the transitional slice anteriorly, which facilitates detection of change in lesions between timepoints; (3) five consecutive semicoronal slices are assessed, starting from the transitional slice and scrolling anteriorly. This number of slices was chosen because our preliminary work showed that the SIJ cavity and adjacent bone marrow are still clearly visible at the most anterior of these slices in virtually all patients (data not shown). Scoring slices posterior to the transitional slice is not appropriate because those slices are predominantly or entirely composed of the ligamentous portion of the joint, which often has irregular bone cortices resembling erosions. Additional anterior or posterior slices therefore do not provide useful information, depict areas that are frequently the hardest to interpret even in normal subjects, and reduced reader reliability when included in the exercise; and (4) the presence/absence of lesions is scored in SIJ quadrants (fat metaplasia, erosion) or halves (backfill, ankylosis) using a direct online data entry system based on a schematic of the SIJ.
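The slice-selection and matching steps above can be sketched in Python. This is a hypothetical illustration, not part of the SSS software: it assumes slices are numbered 1 to n from anterior to posterior (so scrolling anteriorly means decreasing the slice number), and the function names are invented for this example.

```python
N_SCORED_SLICES = 5  # transitional slice plus the 4 slices anterior to it


def scored_slices(transitional: int, n_slices: int) -> list[int]:
    """Return the slice numbers to score, from the transitional slice anteriorly.

    Assumes slices are numbered 1..n_slices from anterior to posterior.
    """
    if not 1 <= transitional <= n_slices:
        raise ValueError("transitional slice is outside the acquired slices")
    most_anterior = transitional - (N_SCORED_SLICES - 1)
    if most_anterior < 1:
        raise ValueError("fewer than 5 slices lie anterior to the transitional slice")
    # Step from the transitional slice towards the anterior end of the stack.
    return list(range(transitional, most_anterior - 1, -1))


def matched_slices(transitional_a: int, n_a: int,
                   transitional_b: int, n_b: int) -> list[tuple[int, int]]:
    """Pair slices from two timepoints by position relative to the transitional slice."""
    return list(zip(scored_slices(transitional_a, n_a),
                    scored_slices(transitional_b, n_b)))


# Baseline scan: transitional slice 9 of 15; followup scan: slice 11 of 17.
print(scored_slices(9, 15))               # [9, 8, 7, 6, 5]
print(matched_slices(9, 15, 11, 17)[:2])  # [(9, 11), (8, 10)]
```

Anchoring on the transitional slice rather than on absolute slice numbers means that scans with a different number of acquired slices, or a slightly shifted stack, are still compared at the same anatomical level.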
Scoring ranges are fat metaplasia (0–40), erosion (0–40), backfill (0–20), and ankylosis (0–20). Erosion and fat metaplasia are scored in SIJ quadrants because these abnormalities can occur on the iliac side, the sacral side, or both. Scoring of these lesions is therefore based on all 8 SIJ quadrants (4 in each joint) on each of the 5 slices, giving a maximum score of 40. Ankylosis extends from the iliac to the sacral side, while backfill fills the joint cavity. Scoring of these lesions is therefore based on SIJ halves (2 in each joint), giving a maximum score of 20. An example of the approach to scoring is provided in Figure 3.
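A minimal Python sketch of how the score ranges follow from the scoring units. The quadrant and half labels, the data layout, and the function names are assumptions made for illustration only; slice positions 1 to 5 count from the transitional slice anteriorly.

```python
JOINTS = ("left", "right")
N_SLICES = 5
# Assumed labels for the scoring units within each joint.
QUADRANTS = ("upper iliac", "lower iliac", "upper sacral", "lower sacral")
HALVES = ("upper", "lower")
UNITS = {
    "fat metaplasia": QUADRANTS,
    "erosion": QUADRANTS,
    "backfill": HALVES,
    "ankylosis": HALVES,
}


def max_score(lesion: str) -> int:
    """Slices x joints x units per joint, e.g. 5 x 2 x 4 = 40 for erosion."""
    return N_SLICES * len(JOINTS) * len(UNITS[lesion])


def lesion_score(lesion: str, positives) -> int:
    """Count distinct (slice, joint, unit) cells scored present for one lesion."""
    allowed = UNITS[lesion]
    cells = set()
    for slice_pos, joint, unit in positives:
        if not 1 <= slice_pos <= N_SLICES or joint not in JOINTS or unit not in allowed:
            raise ValueError(f"invalid scoring cell for {lesion}: {(slice_pos, joint, unit)}")
        cells.add((slice_pos, joint, unit))  # present/absent: duplicates count once
    return len(cells)


for lesion in UNITS:
    print(lesion, max_score(lesion))  # 40, 40, 20, 20

erosions = [(1, "left", "upper iliac"), (1, "left", "upper iliac"), (3, "right", "lower sacral")]
print(lesion_score("erosion", erosions))  # 2
```

Because each cell is either present or absent, the score is simply the number of positive cells, and the maximum is fixed by the number of slices and scoring units rather than by any estimate of lesion volume.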
Reading exercises
We conducted 3 formal reading exercises, preceded by a pilot exercise, to assess feasibility and reliability. Readings were conducted blinded to patient demographics, time sequence, and treatment, in the following sequence: (1) feasibility was first assessed in a pilot exercise in which 4 readers, blinded to timepoint, scored baseline and 2-year scans from 20 cases randomly selected from the cohort. The average time required to read a pair of scans (baseline, 2 yrs) per case was estimated. Because reliability for both baseline and 2-year change scores was at least good [intraclass correlation coefficient (ICC) ≥ 0.6] for most features (status/change ICC for fat metaplasia = 0.72/0.68, erosion = 0.60/0.59, backfill = 0.86/0.55, and ankylosis = 0.98/0.79), further validation was undertaken after debriefing of discrepant cases and clarification of lesion definitions for scoring purposes; (2) in reading exercise 1, 4 readers scored 45 cases with baseline and 2-year scans randomly selected from the cohort, to assess whether interobserver reliability, particularly for change scores, was consistently evident across the 4 readers when scans were assessed blinded to timepoint; (3) in reading exercise 2, 2 of the 4 readers independently scored an additional 102 cases with available baseline and 2-year scans, blinded to timepoint, to assess reliability for baseline and 2-year change scores across the spectrum of patients in the cohort; and (4) in reading exercise 3, we assessed whether sufficient reliability of change scores could be attained over a time frame as short as 1 year. Baseline and 1-year scans from 40 cases were assessed blinded to timepoint by 3 of the 4 readers. Cases were selected for this exercise only if the baseline STIR scan showed BME, as scored previously using the SPARCC SIJ inflammation score before any development work on the SSS method.
The rationale for this selection is that most current trials in nonradiographic axial SpA require a positive MRI of the SIJ, based on the presence of BME, as an inclusion criterion.
Statistics
We used descriptive statistics to assess the number (percentage) of patients with any change and the mean (SD) change in each of the 4 structural lesion scores over 2 years. Interobserver reliability for baseline status scores and for 1-year and 2-year change scores was assessed using ICC (3,1). A 2-way mixed effects model with patient as a random factor and observer as a fixed factor was used, and results are given as single measures. ICC values are presented as absolute agreement for individual reader pairs and for all readers together (overall ICC). An ICC value < 0.4 was designated fair; ≥ 0.4 but < 0.6 moderate; ≥ 0.6 but < 0.8 good; ≥ 0.8 but < 0.9 very good; and ≥ 0.9 excellent. We also calculated the smallest detectable change (SDC), which provides an absolute measure of agreement, using the Bland-Altman 80% limits of agreement20.
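The reliability statistics above can be sketched from the 2-way ANOVA mean squares. This Python sketch is illustrative and is not the software used in the study: it computes both single-measure forms described by McGraw and Wong, consistency (equivalent to Shrout-Fleiss ICC(3,1)) and absolute agreement (ICC(A,1)), and which of these corresponds exactly to the reported values is our assumption. The function names and the small score table are invented; for real analyses a validated statistics package is preferable.

```python
from statistics import fmean


def two_way_icc(scores: list[list[float]]) -> dict[str, float]:
    """Single-measure ICCs from an n-patients x k-readers table (2-way ANOVA)."""
    n, k = len(scores), len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between readers
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_error / ((n - 1) * (k - 1))

    consistency = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
    # Absolute agreement also penalizes systematic differences between readers.
    agreement = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
    return {"consistency": consistency, "absolute_agreement": agreement}


def reliability_category(icc: float) -> str:
    """Labels used in this study (all values < 0.4 are called 'fair')."""
    if icc < 0.4:
        return "fair"
    if icc < 0.6:
        return "moderate"
    if icc < 0.8:
        return "good"
    if icc < 0.9:
        return "very good"
    return "excellent"


# Two readers whose scores differ by a constant offset of 1 point.
table = [[1, 2], [2, 3], [3, 4]]
iccs = two_way_icc(table)
print(iccs)  # consistency 1.0, absolute agreement about 0.667
print(reliability_category(iccs["absolute_agreement"]))  # good
```

The example shows why the choice of model matters: a reader who systematically scores 1 point higher is perfectly consistent with the other reader but does not agree in absolute terms.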
RESULTS
Baseline characteristics and descriptive SPARCC SSS data
The majority of patients in all 3 exercises were men with longstanding disease receiving TNF-α inhibitor therapy (Table 1). A minority received only prescribed NSAID therapy, and some patients received no prescribed therapy but took over-the-counter antiinflammatory agents. At baseline, the majority of patients in each of the 3 exercises had fat metaplasia and erosion in at least 1 SIJ quadrant and backfill in at least 1 SIJ half, and about half in each exercise had ankylosis in at least 1 SIJ half. Mean SSS erosion score decreased over followup, while scores for the other structural lesions increased. At the patient level, for the 2 readers who read all 147 available pairs of baseline and 2-year scans, an increase in SSS fat metaplasia score was observed in a mean of 19.7% of patients and a decrease in 15.6%. Similarly, an increase in SSS erosion score was observed in a mean of 14.6% of patients and a decrease in 37.8%. An increase in SSS backfill score was observed in a mean of 23.1% of patients and a decrease in 16.7%. The SSS method detected change in the score for each structural lesion as early as 1 year in the majority of patients assessed in exercise 3.
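As we read the text, the patient-level percentages are computed separately for each reader and then averaged over the 2 readers. A hypothetical Python sketch of that calculation follows; the function names are invented, and the scores are made-up numbers, not study data.

```python
from statistics import fmean


def percent_increase_decrease(baseline: list[int], followup: list[int]) -> tuple[float, float]:
    """Percentage of patients whose score increased and decreased, for one reader."""
    if len(baseline) != len(followup) or not baseline:
        raise ValueError("need paired, non-empty baseline and followup scores")
    changes = [f - b for b, f in zip(baseline, followup)]
    n = len(changes)
    increased = 100 * sum(c > 0 for c in changes) / n
    decreased = 100 * sum(c < 0 for c in changes) / n
    return increased, decreased


def mean_over_readers(readings) -> tuple[float, float]:
    """Average per-reader percentages; readings is a list of (baseline, followup) pairs."""
    per_reader = [percent_increase_decrease(b, f) for b, f in readings]
    return fmean(p[0] for p in per_reader), fmean(p[1] for p in per_reader)


# Invented scores for 4 patients read by 2 readers (not study data).
reader_a = ([2, 3, 0, 5], [3, 3, 0, 4])  # changes: +1, 0, 0, -1
reader_b = ([2, 3, 0, 5], [4, 2, 1, 5])  # changes: +2, -1, +1, 0
print(mean_over_readers([reader_a, reader_b]))  # (37.5, 25.0)
```

Counting increases and decreases separately shows how often scores move in each direction, which a single group mean change score does not reveal.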
Feasibility and interobserver reliability
The time required to assess a pair of scans per case for all structural lesions ranged from 5 to 15 min. Selecting 5 slices from the transitional slice anteriorly allowed assessment of structural lesions in the bone marrow on both sides of the joint cavity in all patients. Interobserver reliability for status score assessed at baseline was good to excellent for ankylosis, consistently good for fat metaplasia, moderate to good for erosion, and fair to good for backfill (Table 2). Some reader pairs achieved very good to excellent reliability.
Despite the relatively small changes in scores that were recorded, interobserver reliability for change scores was moderate to good for all structural lesions in all exercises, with the exception of erosion in exercise 1 (Table 3). Moderate to good reliability was evident for scoring 1-year change in all structural lesions and was highest for fat metaplasia when both ICC and SDC values were compared. SDC values for fat metaplasia were consistently less than 5% of the maximum score in all 3 exercises. Backfill was the most difficult lesion to detect reliably, with SDC values between 5% and 10% of the maximum score in each of the 3 exercises.
DISCUSSION
We have developed a scoring method for structural lesions in the SIJ, based on the same scoring principles as the SPARCC MRI SIJ inflammation scoring method, and conducted its preliminary validation. Slice selection is anatomically defined, the majority of the cartilaginous portion of the joint is assessed on consecutive slices in the semicoronal plane, and scoring is dichotomous (present/absent), which simplifies assessment and improves reliability. Status and change scores for a pair of scans can be assessed in a feasible time frame. Interobserver reliability for status scores ranged from fair to excellent depending on the lesion, was highest for ankylosis and fat metaplasia, and can be achieved early in the calibration process. Moderate to good reliability can also be achieved for change scores at 1-year followup.
Scoring with this method begins with identification of the transitional slice and then proceeds through 5 consecutive slices ventrally through the cartilaginous portion of the joint. Scoring slices dorsal to the transitional slice is not appropriate because those slices are predominantly or entirely composed of the ligamentous portion of the joint. In addition, slices further ventral often fail to depict the joint clearly. In both cases, these additional slices do not provide useful information for scoring structural damage, depict areas that are frequently the hardest to interpret even in normal subjects, and reduced reader reliability when included in the exercise (data not shown). One factor that could affect application of this method is variation in the coronal tilt of slices through the SIJ between timepoints, which makes comparison of scores more challenging. This can be prevented by specifying appropriate methodological detail in imaging protocols for clinical trials and research. We recommend the following protocol for defining the tilted coronal plane. After a triplanar series of scout images, a dedicated small field-of-view sagittal scout series of the sacrum is performed. With reference to the midline sagittal image of the sacrum, the tilted coronal sequences are angled parallel to the posterior cortex of the second sacral vertebral body, which also represents the anterior border of the sacral spinal canal at the S2 level.
A defining principle of the SPARCC methodology for scoring inflammatory and structural lesions in SpA is the application of a dichotomous scoring method based on the simple presence or absence of a lesion per slice in an anatomically defined location such as an SIJ quadrant. The pathological abnormalities visible on MRI often include mixed lesions with complex appearances that may hinder scoring approaches based on estimates of percent volume of an anatomical region occupied by the lesion, especially in the SIJ. Several lesions may be evident concomitantly even in the same SIJ quadrant, and reliable assessment of the volume of the SIJ quadrant occupied by a specific lesion in a joint as morphologically complex as the SIJ may be exceptionally difficult. For example, a lesion visible only in anterior slices will occupy a greater percentage of an SIJ quadrant or the iliac/sacral portion of the joint as compared to the same-sized lesion present only in dorsal slices. For lesions extending across several slices from ventral to dorsal, scoring based on the percent volume occupied by the lesion requires that the reader attempt to mentally compile a 3-D estimate based on all slices assessed.
The SDC was calculated based on 80% limits of agreement20 and was higher than the mean change scores observed in all 3 exercises. This indicates a risk that patients may be misclassified as having progression when the observed change in fact reflects measurement error. In our dataset, however, change in structural lesions could occur in either direction (increase or decrease), and the mean change score is a summation at the group level. This limits conclusions about misclassification due to measurement error that are drawn from group mean data.
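The comparison between SDC and mean change can be illustrated with a Python sketch. The text does not state the exact SDC formula, so this sketch assumes the SDC is the half-width of the Bland-Altman 80% limits of agreement between two readers' change scores, i.e. z × SD of the between-reader differences with z ≈ 1.2816; other published SDC formulations (for example, ones dividing by √2 or by the number of readers) give different values. Function names and data are invented for illustration.

```python
from statistics import NormalDist, fmean, stdev


def limits_of_agreement(change_a, change_b, coverage=0.80):
    """Bland-Altman limits of agreement for two readers' change scores."""
    diffs = [a - b for a, b in zip(change_a, change_b)]
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.2816 for 80%
    bias, sd = fmean(diffs), stdev(diffs)
    return bias - z * sd, bias + z * sd


def smallest_detectable_change(change_a, change_b, coverage=0.80):
    """Assumed SDC: half-width of the limits of agreement (z x SD of differences)."""
    lower, upper = limits_of_agreement(change_a, change_b, coverage)
    return (upper - lower) / 2


# Invented 2-year fat metaplasia change scores from two readers (not study data).
reader_1 = [1, 0, 2, -1, 0, 3]
reader_2 = [0, 0, 1, -1, 1, 2]
sdc = smallest_detectable_change(reader_1, reader_2)
mean_change = fmean((a + b) / 2 for a, b in zip(reader_1, reader_2))
percent_of_max = 100 * sdc / 40  # fat metaplasia maximum score is 40
print(f"SDC {sdc:.2f}, mean change {mean_change:.2f}, SDC {percent_of_max:.1f}% of max")
```

In this toy example the SDC exceeds the mean change even though the readers agree closely, partly because one patient's score decreases and pulls the group mean toward zero.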
Several scoring methods have been developed for quantification of fat metaplasia, and reliability for status scores has been reported as very good, although none has yet been validated to show that change scores can be reliably detected6,9,10,11,14. One report assessed fat lesions dichotomously (present/absent) by SIJ quadrant, with a 0–8 scoring range based on a global evaluation of the SIJ rather than scoring of SIJ quadrants in individual slices6. In that report, development of new lesions correlated with resolution of inflammation, and a significant difference was noted between patients receiving etanercept and those taking sulfasalazine as early as 24 weeks. A second method grades fat metaplasia in both the cartilaginous and ligamentous compartments according to the extent of subcortical bone affected (0 = no fat, 1 = < 25%, 2 = 25–50%, 3 = > 50%)14, with a weighting of 1 added for fat metaplasia extending ≥ 1 cm beneath the joint surface. Fat metaplasia may be distributed very heterogeneously in the iliac and sacral bones and is typically located adjacent to the subchondral bone, where it may form a linear band of variably increased signal on the T1-weighted scan21. Our preliminary development work showed that reliable detection of fat metaplasia can be enhanced, without affecting responsiveness, by scoring only lesions that extend at least 1 cm in depth from the joint surface.
Erosion in the SIJ has been quantified according to a grading scheme based on the number of erosions per SIJ (1 = 1–2, 2 = 3–5, 3 = > 5 erosions per SIJ)6. Reliability for status score was very good, but data for change scores were not reported. In a second method, erosion was graded in both the cartilaginous and ligamentous compartments according to the extent of subcortical bone affected (0 = no erosion, 1 = < 25%, 2 = 25–50%, 3 = > 50%)14. Reliability for status score was good, but data for change scores were not reported. A limitation of scoring based on the number of erosions per SIJ is that erosion not infrequently affects the entire vertical height of the iliac or sacral bone on a coronal scan, so that erosions cannot be counted as discrete lesions. In addition, such a scheme cannot capture a substantial reduction in the size of an erosion that does not change the number of erosions.
The scoring methodology we have developed suggests that reliable detection of change in structural lesions in the SIJ may be possible as early as 1 year. Ankylosis is defined by the presence of bright marrow signal traversing the joint space between the iliac and sacral bones and is therefore easier to discern reliably on T1-weighted MRI than erosion, which requires loss of dark signal signifying a breach of cortical bone. Nevertheless, reliability for detecting change in erosion and ankylosis was comparable, suggesting that application of validated, standardized lesion definitions and calibration using DICOM images that illustrate the spectrum of abnormalities and changes over time can be an effective mode of knowledge transfer. Backfill has a more heterogeneous MRI appearance, with increased signal on the T1-weighted scan that is clearly demarcated from adjacent normal marrow by irregular dark signal reflecting sclerosis at the border of the eroded bone. This appearance is more difficult to define in a standardized manner, which is the likely reason why more extensive calibration is required to detect this lesion reliably. We have shown that resolution of inflammation and reduction in erosion are significantly associated with the development of backfill22, although further validation of this lesion using computed tomography would be helpful.
We have developed, and conducted preliminary validation of, a scoring methodology for the assessment of structural lesions on MRI in the SIJ of patients with SpA that is based on the same principles as the widely used SPARCC MRI SIJ inflammation score. These include selection of MRI slices according to well-defined anatomical landmarks, dichotomous (present/absent) scoring of lesions on consecutive oblique coronal slices through the cartilaginous portion of the joint, and calibration of readers using standardized definitions and DICOM-based reference cases. Further validation should now address responsiveness and the pathophysiological and prognostic significance of these lesions observed on MRI.
Footnotes
- Supported by a research fellowship grant from the SPARCC and a grant from the Danish Council for Independent Research in the Medical Sciences to Dr. Pedersen. Dr. Maksymowych is a Medical Scientist of Alberta Innovates Health Solutions.
- Accepted for publication September 2, 2014.