Abstract
Objective. Assessment of structural damage of sacroiliac joints (SIJ) in patients with axial spondyloarthritides (axSpA) has been discussed as a useful outcome measure in clinical trials. The aim of our study was to evaluate different magnetic resonance imaging (MRI) scoring methods and pulse sequences with a focus on fatty lesions and bony erosions.
Methods. Seventy-five patients with the diagnosis of axSpA underwent MRI at 3 timepoints as part of the ESTHER trial, which compared 2 groups of patients treated with etanercept or sulfasalazine. Two MRI sequences [unenhanced T1-weighted (T1w) turbo spin-echo (TSE) and unenhanced T1w opposed-phase gradient-echo sequences (opGRE)] and 2 different scoring systems (simple and comprehensive Berlin method) were used for the evaluation of fatty lesions and erosions of the SIJ. Differences between techniques and methods were evaluated by intraclass correlation coefficients (ICC) and standardized response means (SRM).
Results. Applying the simple Berlin method, mean fatty lesion scores for etanercept-treated patients were 4.59 and 5.19 at baseline and Week 48, respectively, while the comprehensive Berlin method revealed mean fatty lesion scores of 6.59 and 7.64, respectively. Corresponding SRM were 0.59 and 0.86 for simple and comprehensive methods, respectively, while ICC dropped from 0.76–0.77 to 0.58–0.62. Scoring of erosions on T1w opGRE images resulted in a higher interreader agreement (ICC of 0.65) compared to T1w TSE sequences (ICC of 0.18).
Conclusion. Better characterization of fatty lesion changes within 1 year was achieved by the comprehensive Berlin scoring method; however, more reader variation has to be taken into account. The delineation of erosions is markedly improved when using T1w opGRE pulse sequences.
- SACROILIAC JOINTS
- MAGNETIC RESONANCE IMAGING
- SACROILIITIS
- AXIAL SPONDYLOARTHRITIS
- ANKYLOSING SPONDYLITIS
- SCORING SYSTEM
Axial spondyloarthritides (axSpA) are a group of various rheumatic diseases with an overall prevalence of 1.9%, of which ankylosing spondylitis is the most important1,2,3. Recently, classification criteria of axSpA have been revised by the Assessment of SpondyloArthritis international Society (ASAS)4. These criteria incorporate (besides HLA-B27 testing) imaging of the sacroiliac joints (SIJ) as 1 of the key features of axSpA. While active inflammation is detected by magnetic resonance imaging (MRI), conventional radiographs are still used to characterize structural damage such as erosions, sclerosis, or ankylosis4,5. However, the progression of these structural osseous changes is very slow, indicating that conventional radiographs are useful neither to confirm the diagnosis of early axSpA nor as an outcome measure in clinical trials. Structural damage lesions of the SIJ on MRI have previously been defined by ASAS experts6, but there seems to be uncertainty about the reliability of their detection on MRI7,8,9,10. Erosions of the SIJ have been shown to be of high value for the diagnosis of axSpA11. The combined detection of erosions and periarticular osteitis seems to increase the sensitivity of MRI to some extent12,13.
Another focus of particular scientific interest in axSpA is the transition of inflammatory lesions into fatty lesions. Fatty lesions are not detectable on conventional radiographs or computed tomography, whereas MRI may even detect small areas of fat deposition14,15. However, it remains unclear which role fatty lesions play in the context of disease chronicity because they have been considered part of the reparative processes16,17, while a recent study suggests a positive correlation between fatty lesions and subsequent new bone formation18.
Altogether, reliable MRI sequences and suitable scoring systems for the detection and quantification of erosions and fatty lesions are needed to evaluate structural osseous changes in the course of disease or when evaluating new pharmaceutical agents; initial investigations are promising19. The aim of our study was to evaluate 2 different MRI scoring methods and pulse sequences with a focus on fatty lesions and bony erosions.
MATERIALS AND METHODS
Patients
Our study is part of a prospective, randomized multicenter trial that was approved by an independent ethics committee. All patients included had active axSpA with a symptom duration of ≤ 5 years and were randomly assigned to an etanercept or sulfasalazine (SSZ) treatment group20. The diagnosis of axSpA was made based on the presence of chronic low back pain for ≥ 3 months and symptom onset at < 45 years of age. All patients had to show active inflammation (osteitis/bone marrow edema) on whole-body MRI, either in the SIJ or in the spine, plus 3 established clinical criteria according to the ASAS classification4,6,20. A detailed description of inclusion and exclusion criteria has been published20.
All patients had a whole-body MRI examination at 3 timepoints (A: baseline; B: Week 24; C: Week 48). Altogether, 65 patients (etanercept group: n = 35; SSZ group: n = 30) completed the study; dropouts have been described16,20.
MRI examination
All patients underwent a whole-body MRI examination on a 1.5 Tesla MR scanner (Avanto TIM) according to an established examination protocol20,21,22. The examination was performed in a supine position using a whole-body surface coil system. For our study, unenhanced T1-weighted (T1w) turbo spin-echo (TSE) and opposed-phase gradient-echo (opGRE) sequences of the SIJ were analyzed. Both sequences were acquired in oblique coronal orientation along the long axis of the sacral bone with a matrix size of 512 × 512 pixels and a slice thickness of 3 mm. T1w TSE sequences were acquired with a field of view (FOV) of 310 mm, a repetition time (TR) of 790 ms, an echo time (TE) of 19 ms, and a flip angle of 150°; opGRE sequences were performed with an FOV of 260 mm, a TR of 180 ms, a TE of 7.5 ms, and a flip angle of 90°. T1w opGRE sequences were acquired in a subgroup of 37 patients at Week 48 only.
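As a rough illustration of what these acquisition parameters imply, the nominal in-plane pixel size can be estimated as FOV divided by matrix size. The following Python sketch assumes a square FOV and no interpolation, neither of which is stated in the protocol; the function name is hypothetical.

```python
# Nominal in-plane pixel size = field of view / matrix size.
# Assumptions (not stated in the protocol): square FOV, no interpolation.

def pixel_size_mm(fov_mm: float, matrix: int) -> float:
    """Return the nominal in-plane pixel edge length in millimetres."""
    if matrix <= 0:
        raise ValueError("matrix size must be positive")
    return fov_mm / matrix


MATRIX = 512  # matrix size reported for both sequences

tse_pixel = pixel_size_mm(310.0, MATRIX)    # T1w TSE, FOV 310 mm
opgre_pixel = pixel_size_mm(260.0, MATRIX)  # T1w opGRE, FOV 260 mm

print(f"T1w TSE:   {tse_pixel:.2f} mm nominal pixel size")
print(f"T1w opGRE: {opgre_pixel:.2f} mm nominal pixel size")
```

Under these assumptions, the opGRE sequence has the smaller nominal pixel size. The estimate is illustrative only and was not part of the analysis.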
Definition of MRI lesions
Fatty lesions and erosions were analyzed in the scope of this report. Fatty lesions were defined according to the ASAS definition6. Briefly, they are characterized by an increased signal on unenhanced T1w MR images in a paraarticular location within the bone marrow.
Erosions were scored on T1w TSE and opGRE images of the SIJ. According to ASAS definitions, erosions are of low signal intensity on T1w images and defined as bony defects at the joint margin6. As an extension of this ASAS definition, we determined that these bony defects had to be located at the subchondral plate of the joint, with a clearly visible cortical break on at least 2 adjacent slices. In our study, signal changes of surrounding bone marrow (e.g., fatty marrow changes or bony sclerosis) are not part of the definition of erosions. Further, as mentioned in the ASAS definition, confluence of erosions may be seen as pseudowidening of the SIJ6.
Fatty lesions
Fatty lesions were evaluated on T1w TSE images under application of 2 different scoring systems designated as the simple (SBM) and comprehensive Berlin scoring methods (CBM). The SBM evaluates presence or absence of fatty lesions (0: no fatty lesions present; 1: fatty lesions present), while the CBM quantifies fatty lesions on a 0–3 scale (0: no fatty lesions, 1: ≤ 33%, 2: 33% to 66%, 3: > 66% of the subchondral bone area in the respective quadrant). The sum score was calculated by addition of the 8 quadrant scores, ranging either from 0–8 for SBM or from 0–24 for CBM23,24.
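The 2 scoring rules and the resulting sum scores can be sketched in a few lines of Python. This is an illustration only: readers graded each quadrant visually, so the numeric `fraction` input, the function names, and the assignment of exactly 33% and 66% to the lower grade are assumptions made for the example.

```python
from typing import Sequence

N_QUADRANTS = 8  # 2 joints x (iliac, sacral) x (upper, lower)


def sbm_grade(fraction: float) -> int:
    """Simple Berlin method: 1 if any fatty lesion is present, else 0."""
    return 1 if fraction > 0.0 else 0


def cbm_grade(fraction: float) -> int:
    """Comprehensive Berlin method: grade 0-3 from the affected fraction (0.0-1.0).

    Assumption: values of exactly 0.33 and 0.66 fall into the lower grade.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 0.0:
        return 0
    if fraction <= 0.33:
        return 1
    if fraction <= 0.66:
        return 2
    return 3


def sum_score(grades: Sequence[int]) -> int:
    """Patient sum score: addition of the 8 quadrant grades."""
    if len(grades) != N_QUADRANTS:
        raise ValueError(f"expected {N_QUADRANTS} quadrant grades")
    return sum(grades)


# Hypothetical patient: affected fraction of subchondral bone per quadrant.
example_fractions = [0.0, 0.10, 0.40, 0.80, 0.0, 0.0, 0.50, 0.20]

sbm_total = sum_score([sbm_grade(f) for f in example_fractions])  # range 0-8
cbm_total = sum_score([cbm_grade(f) for f in example_fractions])  # range 0-24
print(f"SBM sum score: {sbm_total}, CBM sum score: {cbm_total}")
```

In this hypothetical patient, a quadrant in which fat deposition grows from 10% to 40% changes the CBM sum score but leaves the SBM sum score unchanged, which is the motivation for the finer scale.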
Erosions
The detection of erosions on T1w TSE and opGRE sequences was compared under application of the CBM, quantifying the number of erosions on a 0–3 scale (0: no erosions, 1: ≤ 33%, 2: 33% to 66%, 3: > 66% of the bony joint surface in the respective quadrant). For both MRI sequences, the sum score was calculated by addition of all quadrants, resulting in a patient sum score ranging from 0–24.
MRI scoring
Scoring was performed with the help of the open-source DICOM viewer software OsiriX (Pixmeo). For this analysis, T1w TSE and opGRE sequences of the SIJ were extracted from the whole-body MRI examination and were recoded and archived separately to ensure that readers could not correlate their findings to other sequences. Reading was performed by 2 radiologists with 5 years of experience in musculoskeletal imaging (MK and LB), following written instructions describing the different scoring systems. First, a training session was performed, comprising consensus discussions of typical findings for erosions and fatty lesions, as well as typical confounders such as anatomical variants, sclerosis, and indirect signs of osteitis (signal loss in T1w sequences). As a next step, a training set of 15 MRI examinations (not part of the study population) was scored independently by both readers, and results were reviewed and discussed (by KGH). After this calibration process, scoring was performed under application of specific assessment sheets (provided as supplementary download). To prevent recall bias, T1w TSE and opGRE sequences were scored in 2 separate scoring sessions, each consisting of 2 parts. Readers were blinded to examination timepoints and treatment groups. Session 1, part A, comprised scoring of fatty lesions according to the SBM. Afterward, part B comprised scoring of erosions using T1w TSE sequences. After an interval of 8 weeks, session 2 was conducted with scoring of fatty lesions on T1w TSE sequences according to the CBM in part A, and scoring of opGRE sequences for erosions as part B.
For the scoring process, each SIJ was divided into 4 quadrants, separating the iliac from the sacral part and the upper (anterosuperior) from the lower (posteroinferior) part of the SIJ. Quadrants were divided by a fictional horizontal line along the bottom of the first sacral neural foramina and these lines were virtually copied to other slices. Accordingly, per patient, 8 quadrants had to be evaluated (Figure 1)15,23,24. All slices depicting the SIJ cleft were taken into account for scoring.
Statistical analysis
Baseline characteristics of the 2 treatment groups were compared under application of the chi-square test and Mann-Whitney U test. P values < 0.05 were considered statistically significant. All calculations were performed using the patient sum scores, thus ranging from 0–8 (SBM) or 0–24 (CBM). Interreader agreement for the comparison of the 2 scoring methods (SBM vs CBM for fatty lesions) and the 2 MRI pulse sequences (T1w TSE vs opGRE for erosions) was determined by calculating intraclass correlation coefficients (ICC) for status scores at all 3 timepoints as well as for change scores. The agreement of the SBM and CBM was assessed by calculating Spearman’s correlation coefficients for fatty lesion scores. Correlation values were defined as follows: 0–0.2, poor; 0.3–0.4, fair; 0.5–0.6, moderate; 0.7–0.8, strong; > 0.8, excellent agreement25,26. Standardized response means (SRM) between baseline and Week 48 were calculated for fatty lesion scores to determine sensitivity to change of both scoring methods; higher SRM represent higher responsiveness. Additionally, Bland-Altman plots were generated to visualize systematic differences between both readers. All calculations were conducted by the statistical department of the German Rheumatism Research Centre (www.drfz.de/en).
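The 3 agreement and responsiveness measures named above can be sketched as follows. This is a minimal illustration, not the code used by the statistical department. The SRM is computed in its usual form, mean change divided by the sample standard deviation of change. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), is an assumption, as are the 1.96 SD limits of agreement, all function names, and the invented example scores.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def standardized_response_mean(baseline: Sequence[float],
                               followup: Sequence[float]) -> float:
    """SRM = mean of paired changes / sample SD of paired changes."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1); ratings[i][j] is the score of reader j for patient i."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Two-way ANOVA sums of squares: patients, readers, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def bland_altman(reader1: Sequence[float],
                 reader2: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of the reader differences."""
    diffs = [a - b for a, b in zip(reader1, reader2)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Invented CBM sum scores (0-24) for 6 patients, for illustration only.
reader1_baseline = [6, 2, 10, 4, 8, 5]
reader1_week48 = [8, 2, 11, 6, 8, 7]
reader2_week48 = [7, 3, 12, 5, 9, 6]

srm = standardized_response_mean(reader1_baseline, reader1_week48)
icc = icc_2_1(list(zip(reader1_week48, reader2_week48)))
bias, lower, upper = bland_altman(reader1_week48, reader2_week48)
print(f"SRM (reader 1): {srm:.2f}")
print(f"ICC(2,1) at Week 48: {icc:.2f}")
print(f"Bland-Altman bias {bias:.2f}, limits {lower:.2f} to {upper:.2f}")
```

Because the SRM divides by the variability of change, a method can show a larger SRM either through a larger mean change or through more uniform change across patients. The sketch does not handle the degenerate case in which all changes are identical.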
RESULTS
Descriptive statistics
All 75 enrolled patients had active axSpA, shown by clinical measures and positive MRI, and were randomly assigned to the etanercept (n = 40) or SSZ treatment groups (n = 35). There were no statistically significant differences between the 2 groups at baseline (Supplementary Table 1 available from the author on request).
Fatty lesions
For the evaluation of the SBM and CBM, fatty lesions were scored at baseline, Week 24, and Week 48. Applying the SBM, mean fatty lesion scores for both readers in the etanercept group were 4.59 and 5.19 at baseline and Week 48, respectively, while the CBM revealed mean fatty lesion scores of 6.59 and 7.64, respectively. In the SSZ group, mean fatty lesion scores dropped from 3.91 at baseline to 3.72 at Week 48 using SBM, and from 5.30 to 4.98 when applying the CBM (Table 1). Figure 2 gives an example of scoring results for fatty lesions.
The agreement between both scoring methods was strong to excellent (Spearman’s correlation coefficients of 0.79–0.90; Supplementary Table 2 available from the author on request).
Sensitivity to change of both methods was estimated by calculating SRM. Under application of the CBM, the SRM for both readers combined was 0.86 in patients treated with etanercept, although differences were evident between readers: SRM of 0.59 and 0.94 were calculated for reader 1 and reader 2, respectively. The application of the SBM resulted in an SRM of 0.59 for both readers combined, and SRM of 0.49 and 0.44 for reader 1 and reader 2, respectively. In the SSZ group, application of both methods revealed a decrease of fatty lesion scores, resulting in negative SRM of −0.27 to −0.60 for CBM, while for SBM, SRM of −0.08 to −0.36 were found (Table 1).
Application of the SBM resulted in a strong interreader agreement of status scores with ICC of 0.76–0.77, while the CBM showed ICC of 0.58–0.62 (Table 2). ICC for change scores were moderate for SBM (ICC of 0.42 and 0.48 for weeks 24 and 48, respectively), while under application of the CBM, ICC were 0.69 and 0.67, respectively (Table 3). Bland-Altman plots revealed smaller systematic differences between the readers under application of the CBM compared to the SBM (Supplementary Figure 1 available from the author on request).
Erosions
For the comparison of T1w TSE sequences with opGRE sequences, erosions were quantified under application of the CBM (Figure 3). Mean erosion scores were higher using the opGRE sequences compared to TSE sequences: the overall mean erosion score for T1w opGRE was 7.53 ± 4.55, while for TSE sequences a mean erosion score of 6.48 ± 3.70 was found. In detail, the mean erosion scores in the etanercept group were 8.22 ± 3.96 for opGRE and 7.10 ± 3.84 for TSE; the corresponding mean erosion scores in the SSZ group were 6.92 ± 5.01 and 5.77 ± 3.47, respectively. Interreader agreement for opGRE sequences was 0.65, while the ICC for T1w TSE sequences was 0.18 (Table 2). Bland-Altman plots showed fewer outliers and smaller systematic differences between scorers when using opGRE sequences (Supplementary Figure 1 available from the author on request).
DISCUSSION
The data presented are part of a prospective, randomized multicenter trial (ESTHER), comparing the effects of etanercept versus SSZ over a period of 48 weeks in patients with axSpA16,20. While the diagnostic value of MRI in the detection of active inflammation in axSpA is undisputed, the detection and quantification of structural osseous changes (e.g., fat deposition, erosions, sclerosis, and ankylosis) in the course of disease represents a new focus of pharmaceutical and imaging studies. These structural osseous changes have been reported to appear in a certain order, beginning with active inflammation, which may lead to erosive bone destruction and possible alteration of the joint space width because of confluent erosions6. When inflammation decreases, fatty lesions eventually appear. During the later stages of axSpA, bony bridging may lead to partial or complete ankylosis of the affected regions16,18,27. Although the pathophysiology of axSpA is not well understood, the analysis of fatty lesions and erosions, which appear to be early structural osseous changes in the course of axSpA, has been shown to be useful in the evaluation of disease progression and efficacy of medical treatment11,16.
As demonstrated by Song, et al, disappearance of inflammation and occurrence of fatty lesions within the course of the trial were significantly higher in the etanercept than in the SSZ treatment group16. However, in the reports by Song, et al and Althoff, et al, fatty lesions were evaluated as either present or absent per quadrant (according to SBM)16,24. This scale appeared inappropriate to show changes in the course of 1 year; therefore, the Berlin MRI scoring system was revised with a more comprehensive scale of 0–3 per quadrant (CBM).
We could show that both SBM and CBM can monitor the development of fatty bone marrow lesions, although the CBM is more sensitive to change. Mean fatty lesion scores during treatment with etanercept increased from 4.59 to 5.19 when applying the SBM (SRM of 0.59), and from 6.59 to 7.64 when applying the CBM (SRM of 0.86). However, a slight reduction of the interreader reliability has to be taken into account (ICC dropped from 0.76–0.77 to 0.58–0.62). Interestingly, in contrast to the etanercept group, fatty lesion scores decreased slightly in the SSZ group, most probably because of focal signal decrease due to newly occurring areas of inflammation. Depending on their number and extent, these lesions may reduce signal intensity to a degree that is represented by lower fatty lesion scores. Further, our analysis revealed moderate ICC for fatty lesion change scores under application of the SBM (ICC of 0.42 and 0.48 for weeks 24 and 48, respectively), while under application of the CBM, ICC were 0.69 and 0.67, respectively (Table 3), indicating that the more differentiated CBM is more easily applicable than the SBM, especially if only slight changes occur. Corresponding Bland-Altman plots further demonstrate that the systematic difference between both readers is lower for CBM compared to the SBM (Supplementary Figure 1 available from the author on request).
To date, only a few studies have introduced a scoring method for structural osseous changes of the SIJ on MR images14,15,28,29,30,31. The Berlin research group initially introduced an MRI scoring system that basically referred to the New York criteria, i.e., structural osseous changes were globally scored in a 0–4 grading system without distinguishing between fatty lesions, erosions, sclerosis, or ankylosis15,28,32. The MISS (MR Imaging of Seronegative Spondylarthropathy) working group proposed a similar method, with evaluation of the iliac and sacral part of each SIJ29,30. The Leeds group introduced a grading system with evaluation of sclerosis and ankylosis on a 0–3 scale per quadrant (0: absent, 1: ≤ 25%, 2: 25–75%, 3: > 75% of the articular surface)33. The Denmark group initially proposed an MRI grading system with separate evaluation of fat accumulation, erosions, joint space width, and sclerosis on a 0–3 scale and transformation into an overall score for joint destruction14. That study showed that the 0–3 grading method results in a good to excellent interreader agreement with regard to sclerosis, erosions, fat accumulation, and overall joint destruction14. More recently, this group proposed a similar grading system34,35 that scores fatty bone marrow deposition per quadrant on a 0–3 scale [0: normal, 1: < 25%, 2: 25% to 50%, 3: > 50% of the joint area; additional score for lesion depth (0: < 1 cm, 1: > 1 cm)]; erosions are scored per half of each SIJ, and ankylosis is assigned a score of 0–2 per joint (0 = no ankylosis, 1 = partial, 2 = complete ankylosis). These scores can be transformed into a total chronic score of 0–48 per patient34,35. 
Weber, et al adapted the Spondyloarthritis Research Consortium of Canada (SPARCC) scoring system (originally developed for assessment of active inflammation in SIJ, with dichotomous grading of each quadrant on 6 adjacent MRI slices and additional scoring of lesion intensity and depth) for quantification of fat deposition and erosions, while ankylosis is assigned a score for each half of both SIJ36. Wick, et al modified the SPARCC method for the evaluation of joint space width, erosions, and subcortical cysts11. Sensitivity to change of the proposed scoring systems has not yet been assessed.
The second aim of our study was the comparison of unenhanced T1w TSE sequences, one of the standard sequences for MRI of the SIJ, with T1w opGRE sequences regarding the detection and quantification of erosions. We could show that the interreader agreement was much higher for opGRE images (ICC 0.65) than for TSE images (ICC 0.18–0.37), indicating that T1w opGRE sequences may be more useful for the evaluation of erosions in patients with axSpA than the usually applied T1w TSE sequences. Corresponding Bland-Altman plots demonstrate that under application of opGRE sequences, fewer outliers and smaller systematic differences occur between scorers compared to TSE sequences. However, compared to other studies that have assessed interreader agreement in the detection of erosions on T1w SE sequences, ICC in our study (0.18–0.37) appear unusually low, although, as reported previously10,12,37, even more experienced readers may observe erosion ICC of only about 0.40–0.60. Additionally, low interreader agreement may occur because readers were blinded to other sequences of the same patient, such as STIR sequences, which was not the case in most of the other existing studies.
To date, only a few reports exist about the value of opGRE sequences in axSpA6,8,15,38,39,40. The authors of these reports mention the excellent contrast between the articular cartilage, subchondral plate, and adjacent bone marrow; however, the proposed imaging techniques have not been analyzed separately to demonstrate the superiority of 1 sequence compared to the other with regard to delineation of erosions. Further, GRE sequences are known to be susceptible to a variety of artifacts, as shown in several veterinary and human studies, leading to significantly greater subchondral bone thickness measures, underestimation of subchondral lesions, and false-negative scoring results41,42,43,44. In our study, by contrast, mean erosion scores and interreader agreement were higher for opGRE than for T1w TSE sequences. Moreover, one of those veterinary studies found high sensitivity of GRE sequences regarding the detection of osteochondral defects, while their specificity was shown to be rather low42.
Aside from T1w and T2*w GRE sequences, other noteworthy sequences in the detection of erosions include T2w sequences8,13 and contrast-enhanced T1w fat-saturated SE sequences13,15. So-called cartilage sequences such as T2w GRE, unenhanced fat-saturated T1w sequences, 3-dimensional (3-D) FLASH, and 3-D double-echo steady-state (3D-DESS) sequences have been discussed to facilitate the detection of erosions34,35,45,46. Proton-density weighted sequences play an important role in the depiction of cartilage, menisci, ligaments, and tendons, especially in larger joints, and are predominantly used in orthopedic-traumatological MR protocols43,47.
To our knowledge, our study is the first to assess the value of T1w opGRE in comparison to the usually applied T1w TSE sequences with regard to the detection and quantification of erosions in patients with rheumatic diseases. Unfortunately, T1w opGRE sequences were only acquired at 1 timepoint, so that their sensitivity to change was not assessable in our analysis. Comparison to healthy control subjects was not possible in the setting of this analysis of patients within the ESTHER trial20 but should be performed prospectively in a future study.
The CBM appears to be a feasible MRI scoring method for the quantification of fatty lesions and erosions, with higher sensitivity to change compared to the SBM. Additionally, we could show that for the detection of erosions, T1w opGRE sequences appear to be more reliable than the usually applied T1w TSE sequences. More studies need to be conducted to further evaluate feasibility and sensitivity of the CBM for the assessment of structural osseous changes on MRI and to determine the value of T1w opGRE sequences regarding a more reliable detection of erosions in patients with axSpA.
- Accepted for publication November 29, 2013.