Abstract
Objective. We tested the discriminatory capacity of diffusion-weighted magnetic resonance imaging (DWI) and its potential as an objective measure of treatment response to tumor necrosis factor inhibition in ankylosing spondylitis (AS).
Methods. Three cohorts were studied prospectively: (1) 18 AS patients with Bath Ankylosing Spondylitis Disease Activity Index > 4, and erythrocyte sedimentation rate > 25 and/or C-reactive protein > 10 meeting the modified New York criteria for AS; (2) 20 cases of nonradiographic axial spondyloarthritis (nr-axSpA) as defined by the Assessment of Spondyloarthritis international Society (ASAS) criteria; and (3) 20 non-AS patients with chronic low back pain, aged between 18 and 45 years, who did not meet the imaging arm of the ASAS criteria for axSpA. Group 1 patients were studied prior to and following adalimumab treatment. Patients were assessed by DWI and conventional magnetic resonance imaging (MRI), and standard nonimaging measures.
Results. At baseline, in contrast to standard nonimaging measures, DWI apparent diffusion coefficient (ADC) values showed good discriminatory performance [area under the curve (AUC) > 80% for Group 1 or 2 compared with Group 3]. DWI ADC values were significantly lower posttreatment (0.45 ± 0.433 before, 0.154 ± 0.23 after, p = 0.0017), but had modest discriminating capacity comparing pre– and posttreatment measures (AUC = 68%). This performance was similar to the manual Spondyloarthritis Research Consortium of Canada (SPARCC) scoring system.
Conclusion. DWI is informative for diagnosis of AS and nr-axSpA, and has moderate utility in assessment of disease activity or treatment response, with performance similar to that of the SPARCC MRI score.
- ANKYLOSING SPONDYLITIS
- SPONDYLOARTHRITIS
- ADALIMUMAB
- DIFFUSION MAGNETIC RESONANCE IMAGING
- SENSITIVITY
- SPECIFICITY
The development of magnetic resonance imaging (MRI) protocols for the diagnosis of axial spondyloarthritis (axSpA) has enabled detection of disease before the appearance of the radiographic abnormalities that define ankylosing spondylitis (AS). This has revolutionized clinical care of these patients and research into the early phases of axSpA, raising the possibility that through early detection and intervention, the natural history of this disease may be altered and progressive spinal fusion avoided. While MRI has proven sensitive for early disease detection, reading MRI scans remains at least partially subjective, and significant variability in scoring remains problematic1.
The paucity of data about MRI changes, particularly in spinal images, has raised questions about the specificity of currently used criteria for the diagnosis or assessment of axSpA by MRI2,3. Further, quantitative MRI measures of either inflammatory change or damage require complex scoring systems, and have therefore not become widely used in clinical practice, although they are used routinely in clinical trials where expert central reading is typical.
Diffusion-weighted imaging (DWI) is an MRI sequence that relies on the random Brownian motion of water molecules within tissues. This motion, or diffusion, is influenced by tissue composition and architecture, with reduced mean free water path being associated with greater signal. DWI has been shown to have advantages over standard MRI sequences in some clinical settings, including the early diagnosis of ischemic stroke and in staging some cancer types4,5. Different measures of diffusion have been proposed, with the apparent diffusion coefficient (ADC) measure the most widely used. Inflammation leads to higher ADC values through increased water in extracellular, less constrained, spaces. Several studies have investigated the clinical utility of DWI in AS, with suggestive evidence that this sequence has valuable discriminatory capacity between AS and noninflammatory back pain, and is sensitive to changes in disease activity in response to treatment with tumor necrosis factor-α inhibitors (TNFi)6,7,8,9,10,11.
Disease activity measurement in AS is challenging. Currently the main approaches are to use patient self-completed questionnaires [such as the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI)], blood tests [such as erythrocyte sedimentation rate (ESR)/C-reactive protein (CRP)], and combinations of questionnaires and blood tests [such as the Ankylosing Spondylitis Disease Activity Score (ASDAS)-CRP and ASDAS-ESR]. Patient self-completed questionnaires such as these are subjective and influenced by factors such as patient reporter biases, coexistent fibromyalgia, and concomitant medicines such as analgesics; they were not developed for use in assessing patients with noninflammatory back pain. In the current study we investigated the discriminatory capacity of DWI in comparison with standard classification criteria for axSpA, and assessed its usefulness as an objective measure of disease activity both cross-sectionally and longitudinally in response to treatment with the TNFi adalimumab, in comparison with standard clinical measures used for treatment response assessment.
MATERIALS AND METHODS
Subjects
The study protocol was approved by the Metro South Hospital and Health Service (approval HREC/11/QPAH/479) and The University of Queensland Research Ethics Committees (approval 20130000333), and all patients gave written informed consent. Three groups of patients were recruited and studied prospectively. Group 1 consisted of 18 patients diagnosed with AS defined by the modified New York criteria12, all with active disease (BASDAI > 4, and CRP > 10 mg/dl and/or ESR > 25 mm/h). These patients underwent an MRI immediately prior to commencing 13 weeks of TNFi treatment (adalimumab 40 mg s/c fortnightly) and then again after 3 months of treatment. Group 2 consisted of 20 patients who met the Assessment of Spondyloarthritis international Society (ASAS) classification for nonradiographic axSpA (nr-axSpA)13. Group 3 consisted of 20 patients who had been referred to our AS clinic for diagnostic purposes with chronic low back pain for more than 3 months, and who did not meet the imaging arm of the ASAS criteria for axSpA. They were considered on clinical grounds (history and physical examination) not to have axSpA, although 10 of these patients did meet the ASAS clinical criteria for axSpA. In some analyses, Group 1 and 2 patients, classified as having inflammatory spinal arthritis, were pooled (active) and compared with Group 3 patients who were classified as having noninflammatory spinal arthritis (controls).
Excluded from participation were pregnant women, people under 18 years of age, people with an intellectual or mental impairment, and those with a contraindication for undergoing an MRI. Patients taking corticosteroids were also excluded, and nonsteroidal antiinflammatory drugs and analgesic medications were kept stable during the followup study of Group 1 patients. Standard exclusion criteria for use of TNFi therapies were applied to participants in Group 1.
For all groups, at each scan timepoint, disease activity was fully assessed by nonimaging methods — blood tests (full blood count, CRP, and ESR), patient-reported disease activity (BASDAI), functional score (Bath Ankylosing Spondylitis Functional Index; BASFI), mobility measure (Bath Ankylosing Spondylitis Metrology Index; BASMI), and the ASDAS (Ankylosing Spondylitis Disease Activity Score).
MRI methods
All patients underwent MRI scans of the entire spine and sacroiliac joints (SIJ) without contrast agent. The scans were performed on a 1.5-Tesla platform (Magnetom Avanto, Siemens) using up to 32 channels (spine array coil, 24 channels; and body array coil, 6 elements).
Three regions were scanned using the following protocols:
(1) Cervicothoracic spine from the foramen magnum to the superior border of T10 with sagittal fast spin-echo T1-weighted imaging (T1WI; TR/TE: 425/14) and sagittal short-tau inversion recovery (STIR; TR/TE: 4250/55), field of view (FOV) 38 cm, 3 mm thick, 20 slices;
(2) Thoracolumbar spine from T9 to S3 with sagittal fast spin-echo T1WI (TR/TE: 425/14) and sagittal STIR (TR/TE: 4250/55), FOV 38 cm, 4 mm thick, 20 slices; and
(3) SIJ with (a) oblique coronal images parallel to the long axis of the sacrum: fast spin-echo T1WI (TR/TE: 521/14) and STIR (TR/TE: 5272/50), FOV 30 cm, 4 mm thick, 20 slices; (b) oblique axial images perpendicular to the long axis of the sacrum: fast spin-echo T1WI (TR/TE: 652/14) and STIR (TR/TE: 5140/61), FOV 24 cm, 4 mm thick, 0.4 mm intersection gap, 20 slices; and (c) oblique coronal DWI using a single-shot echo planar sequence (TR/TE: 4100/87), b-value/diffusion gradients of 0, 50, 400, and 800 s/mm2, FOV 30 cm, 4 mm thick, 20 slices, 8 excitations.
Image interpretation and analysis
Nonblinded review of the scans was performed by a musculoskeletal radiologist (NS) on an Agfa Impax 6.1 picture archiving and communication system (PACS). Images were initially assessed for presence or absence of bone marrow edema (BMO) on the STIR sequence. The intensity of BMO was graded following visual inspection using: grade 0 = absent BMO, grade 1 = minor BMO less intense than adjacent vessels, grade 2 = marked BMO as intense as adjacent vessels. The Spondyloarthritis Research Consortium of Canada (SPARCC) MRI scores were determined according to protocol14, several months after ADC mapping and BMO scoring.
ADC maps were linked to the corresponding DWI and STIR images using an image-linking tool on PACS to ensure accurate localization of the edema/region of interest (ROI). Because the ROI had to be reproduced on pre- and posttreatment scans, it was not possible to conduct a blinded analysis. The ADC value was measured twice and then averaged, both in each SIJ and in the mid-body of the S2 vertebral level. The latter, presumed to be normal reference bone, is termed “sacral ADC” in analyses. A circular ROI with a constant area of 0.44 cm2 was placed on selected areas of BMO on the ADC map. Adjacent cortical bone, sclerosis, or fat metaplasia were excluded from the ROI to avoid volume averaging and erroneous ADC values. In patients where no BMO was detected, the ADC values were measured at identical points on both SIJ. Background reference signal was defined as the marrow signal along the midline of the sacrum at S2 level, which was also measured twice and averaged.
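As a rough illustration of the ROI geometry described above, the following Python sketch converts the stated constant ROI area (0.44 cm2) into the radius of the equivalent circle and averages duplicate ADC readings. The function names and the example readings are hypothetical and are not taken from the study.

```python
import math

def roi_radius_cm(area_cm2):
    """Radius of a circular ROI with the given area (area = pi * r^2)."""
    return math.sqrt(area_cm2 / math.pi)

def mean_of_duplicates(first_read, second_read):
    """Average two repeated ADC measurements taken at the same site."""
    return (first_read + second_read) / 2

radius = roi_radius_cm(0.44)              # ROI area stated in the text
print(f"ROI radius ~ {radius:.2f} cm")    # about 0.37 cm, i.e. roughly 7.5 mm diameter

# Hypothetical duplicate readings at one site (illustrative values only)
example_mean = mean_of_duplicates(1.10, 1.14)
```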
Statistical method
Between-group comparisons were performed using Student t tests and ANOVA, with the exception of Group 1 pre– and posttreatment values, for which paired t tests/ANOVA were used. Correlation between values was assessed using either Pearson correlation coefficient (ADC values) or Spearman’s correlation coefficient (clinical criteria). Discriminatory capacity was tested using receiver-operator characteristic (ROC) analysis. Nonparametric Mann-Whitney U tests (comparing independent groups) or Wilcoxon tests (comparing the paired pre– vs posttreatment in Group 1) assessed whether there was a significant difference between the ROC curves. AUC and thresholds for the best sensitivities/specificities were reported. Responsiveness was reported as the effect size using Glass’s delta method, and standardized response mean (SRM) for the paired Group 1 pre- and posttreatment values. A significant difference was declared when p < 0.05.
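The two responsiveness statistics named above can be sketched as follows. This is a minimal Python illustration, not the authors' analysis code: it assumes Glass's delta is scaled by the standard deviation of the reference group (controls, or pretreatment values) and that the SRM is the mean paired change divided by the standard deviation of the paired changes. The function names and sign convention are assumptions.

```python
from statistics import mean, stdev

def glass_delta(group, reference):
    """Glass's delta: difference in means scaled by the SD of the
    reference group only (assumed here to be the control/baseline group)."""
    return (mean(group) - mean(reference)) / stdev(reference)

def standardized_response_mean(pre, post):
    """SRM for paired data: mean of the paired changes (post - pre)
    divided by the sample SD of those changes."""
    changes = [after - before for before, after in zip(pre, post)]
    return mean(changes) / stdev(changes)
```

With this convention a fall in a score after treatment gives a negative SRM; only the magnitude matters when comparing the responsiveness of different measures.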
RESULTS
Demographics and clinical values
Clinical and demographic details of the patients studied are provided in Table 1. Most disease activity measures were higher in Group 1 patients pretreatment than either Group 2 or 3 patients, including CRP, ESR, ASDAS-CRP, and ASDAS-ESR. BASDAI was higher in Group 1 than Group 2, but not higher than Group 3. BASFI values were also higher in Group 1 than Group 2 and 3, though not achieving statistical significance compared with Group 3 (p = 0.078). BASMI was not significantly different between the groups. Posttreatment, Group 1 patients had significantly lower disease activity values than pretreatment and either Group 2 or Group 3 patients. BASFI values fell significantly posttreatment, and were not different compared with either Group 2 or 3 patients. BASMI values did not change significantly with treatment. No significant differences were noted between Group 2 and 3 patients.
Strong correlation was noted between disease activity measures, both in the overall dataset (r2 > 0.3), and considering active patients alone (Supplementary Figures 1 and 2, available with the online version of this article). Correlation between disease activity measures and BASFI was moderate (Spearman correlation coefficient 0.28–0.69).
Comparing the capacity of nonimaging measures to distinguish between active cases and controls, no significant discriminatory capacity was found for most measures (BASDAI, BASFI, BASMI, ASDAS-ESR; AUC = 0.51–0.64, p > 0.05), with moderate discriminatory capacity observed for CRP and ESR alone (AUC = 0.74 and 0.69, and p = 0.002 and 0.02, respectively; Supplementary Figure 3, available with the online version of this article). For CRP, a threshold of 6 to define active disease gave a sensitivity of 80% and specificity of 68%.
Considering the discriminatory capacity of pretreatment measures in Group 1 patients in relation to posttreatment values, disease activity values all had high discriminatory capacity (AUC = 0.88–0.99, p < 0.05), with BASDAI and ASDAS-ESR and ASDAS-CRP having nearly equivalent high discriminatory capacity (AUC = 0.97–0.99, p < 0.05; Supplementary Figure 4, available with the online version of this article).
ADC values
DWI ADC values for each group are presented in Table 2. Because ADC values were similar between duplicate measures (Pearson correlations between reads were 0.976–0.993), these were averaged for further analyses. Of note, no Group 3 subjects displayed signs of inflammation by MRI.
Comparisons of ADC values at matched sites are presented in Table 3. Sacral ADC values were significantly higher in Group 1 patients pretreatment compared with Group 3 patients (p = 0.0089), but not different between any other groups (p > 0.05). Sacroiliac ADC values were significantly higher in Group 1 patients pretreatment and Group 2 patients compared with either Group 1 posttreatment or Group 3 patients. No difference was noted between Group 1 posttreatment and Group 3 patients, or between Group 1 pretreatment and Group 2 patients (p > 0.05).
For Group 1 pretreatment and Group 2, ADC values were higher over each SIJ than the reference sacral measurement (p < 0.005 all analyses; Supplementary Table 1, available with the online version of this article). In Group 1 posttreatment cases, this observation was less notable and significant only for the left but not right SIJ (p = 0.0047 and p = 0.083 respectively); no differences were observed in Group 3 patients.
Significant variation in ADC values was observed even among control subjects, with strong correlation between sacral and sacroiliac values within that group (r2 between sacral and left and right ADC values = 0.84 and 0.80, respectively), suggesting that underlying tissue properties vary between individuals. Therefore, for further analyses, right and left sacroiliac values were corrected for sacral values by subtracting sacral values from the sacroiliac values, which were then averaged to produce a single ADC value (corrected) for each individual (Table 2).
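The correction described above amounts to the following calculation, shown here as a short Python sketch. The function name and the input values in the example are hypothetical; only the subtract-then-average logic comes from the text.

```python
def corrected_adc(left_sij, right_sij, sacral):
    """Subtract the sacral reference ADC from each sacroiliac ADC,
    then average the two corrected values into one per-patient score."""
    left_corrected = left_sij - sacral
    right_corrected = right_sij - sacral
    return (left_corrected + right_corrected) / 2

# Hypothetical patient: left SIJ 1.0, right SIJ 0.8, sacral reference 0.5
score = corrected_adc(1.0, 0.8, 0.5)
```

A patient whose sacroiliac values match the sacral reference scores 0, so the corrected value expresses the excess over that individual's own background marrow signal.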
Comparing groups, significant differences were observed between corrected values for Group 1 pretreatment and posttreatment, and between Group 3 and any of the groups. A 2.9-fold reduction in mean corrected scores between pre– and posttreatment groups was observed [mean corrected ADC score Group 1 pretreatment 0.45 (SD 0.433), posttreatment 0.154 (SD 0.23), p = 0.0017]. While no difference was observed between Group 1 pretreatment and Group 2 values, a borderline significant reduction in Group 1 posttreatment values was noted compared with Group 2 values.
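The fold reduction quoted above can be checked directly from the two reported means. The short Python sketch below uses only those two values; the variable names are illustrative.

```python
pre_mean = 0.45     # Group 1 pretreatment mean corrected ADC (from the text)
post_mean = 0.154   # Group 1 posttreatment mean corrected ADC (from the text)

fold_reduction = pre_mean / post_mean
percent_reduction = 100 * (pre_mean - post_mean) / pre_mean

print(f"{fold_reduction:.1f}-fold reduction")    # 2.9-fold reduction
print(f"{percent_reduction:.0f}% lower")         # 66% lower
```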
BMO and SPARCC scores correlate with ADC values
ADC values were substantially higher for increasing BMO scores (Supplementary Figure 5, available with the online version of this article). No Group 3 subject had a BMO score > 0, whereas 58% (11/19) of Group 1 subjects and 71% (15/21) of Group 2 subjects had a BMO score > 0. While BMO scores did decline in the Group 1 posttreatment versus pretreatment comparison, this was not significant on either side (left side p = 0.0024, right side p = 0.068).
Pretreatment SPARCC scores correlated well with ADC values (Group 1 pretreatment Spearman correlation coefficient = 0.79, p = 0.00010; Group 2 Spearman correlation coefficient = 0.86, p = 0.00001). Correlations were weaker posttreatment in Group 1 (Spearman correlation coefficient = 0.43, p = 0.073).
Correlation between ADC values and clinical disease activity measures
No consistent correlations were noted between right and left sacroiliac ADC values and any clinical disease activity measures. While nominally significant correlations for some measures were observed with individual sides, in each case these were either unilateral or with opposite directions of correlation with either side.
Discriminatory capacity and responsiveness of ADC measures
Excellent discriminatory capacity (AUC > 0.8, p < 0.05) was noted comparing Group 1 pretreatment, Group 2, and active cases with Group 3 controls (Figure 1; and Supplementary Figure 6, available with the online version of this article). Comparing active cases with controls, the optimal ADC threshold (0.054) resulted in a specificity of 95% and sensitivity of 87% (AUC = 0.91, p = 1.5 × 10^−8).
In contrast, no significant discrimination was observed in comparisons of Group 1 pretreatment and Group 2 cases (AUC = 0.45, p = 0.65). Moderate though statistically significant discriminatory capacity was observed comparing Group 1 pre– and posttreatment values (AUC = 0.67, p = 2.8 × 10^−3). For an ADC value threshold of 0.53, the test had a specificity of 0.94 and sensitivity of 0.56.
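The ROC quantities reported above (AUC, and sensitivity and specificity at a chosen threshold) can be illustrated with a small Python sketch. It is not the authors' analysis: the AUC is computed through its rank interpretation (the probability that a randomly chosen case has a higher value than a randomly chosen control), and the "optimal" threshold is chosen by Youden's J statistic, which is an assumption because the text does not state the selection rule. All function names and data values are hypothetical.

```python
def roc_auc(cases, controls):
    """AUC as the fraction of case/control pairs in which the case has
    the higher value; tied pairs count as one half."""
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def sensitivity_specificity(cases, controls, threshold):
    """Treat values >= threshold as test-positive."""
    sensitivity = sum(v >= threshold for v in cases) / len(cases)
    specificity = sum(v < threshold for v in controls) / len(controls)
    return sensitivity, specificity

def youden_threshold(cases, controls):
    """Pick the observed value maximizing sensitivity + specificity - 1
    (an assumed criterion for the 'optimal' threshold)."""
    def youden_j(threshold):
        sens, spec = sensitivity_specificity(cases, controls, threshold)
        return sens + spec - 1
    return max(sorted(set(cases) | set(controls)), key=youden_j)

# Hypothetical corrected ADC values, for illustration only
active = [0.3, 0.5, 0.7, 0.9]
control = [0.0, 0.1, 0.2, 0.6]
auc = roc_auc(active, control)
cutoff = youden_threshold(active, control)
sens, spec = sensitivity_specificity(active, control, cutoff)
```

The pairwise loop is quadratic in sample size, which is acceptable for cohorts of this size; the same AUC is obtained from the Mann-Whitney U statistic divided by the product of the two group sizes.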
SPARCC scores performed similarly to ADC measures in these analyses, with AUC ≥ 0.8 and p < 0.05 for each of the comparisons of Group 1 pretreatment, Group 2, and active cases with Group 3 controls. No difference was noted between Group 1 pretreatment and Group 2 cases (AUC = 0.52, p = 0.88). Moderate though statistically significant discriminatory capacity was observed comparing Group 1 pre– and posttreatment values (AUC = 0.7, p = 2.5 × 10^−3).
ADC measure and SPARCC score effect sizes were similar when comparing active cases with Group 3 controls, and far greater than for all other measures (Table 4). In assessment of change in response to treatment (Group 1 pre- vs posttreatment), again ADC measures performed similarly to SPARCC scores, with similar effect sizes and SRM. In contrast to the cross-sectional comparison of active cases with Group 3 controls, in this longitudinal comparison both MRI scores showed much less responsiveness than any other score except metrology (BASMI), with which they performed similarly (Table 5).
DISCUSSION
Our study confirms that DWI has excellent capacity to distinguish active AS cases and nr-axSpA cases from control cases with noninflammatory back pain. Its performance exceeded that of widely used measures of disease activity (including BASDAI, ESR, CRP, and ASDAS-ESR and ASDAS-CRP) in distinguishing active disease from noninflammatory back pain controls. In our current study, the semiautomated DWI ADC scoring method performed similarly to manual SPARCC scoring in both baseline comparisons and in comparison of pre– with posttreatment imaging.
Our study demonstrated that following TNFi treatment, DWI ADC values at a group level did decrease significantly, with posttreatment values being 66% lower than pretreatment values. This is comparable with other quantitative MRI measures, such as the SPARCC sacroiliac score, for which the effect size and SRM in our study were very similar to those for ADC values. In the EMBARK study of etanercept (ETN) treatment of nr-axSpA, SPARCC sacroiliac scores fell at Week 12 by 48% compared with baseline15 and by Week 48 to 58% of baseline values16. In a further study of ETN treatment in axSpA (radiographic status not stated), Berlin MRI scores fell by 60% at Week 24 and by 69% at Week 48 compared with baseline17.
However, considering discriminatory capacity at an individual level, ADC measures had only moderate capacity to distinguish cases pre– and posttreatment (AUC = 0.68), and the clinical disease activity measures studied performed significantly better (AUC 0.88–0.99). The performance of SPARCC scoring was similar, and both scores showed lower responsiveness (as assessed by effect size or SRM) than all other scores except BASMI. There is a significant need for methods to identify axSpA cases that respond well to TNF inhibition. ESR, CRP, and MRI have previously been shown to have utility in this setting, largely in posthoc analysis of clinical trial data. It is possible that DWI may perform better after a longer period of treatment, and further trials are warranted.
DWI also did not correlate well in cross-sectional analyses with disease activity measures, either patient self-reported (BASDAI), acute-phase reactants (ESR or CRP), or combined measures (ASDAS-CRP, ASDAS-ESR). DWI images of the spine have a very limited spatial resolution and thus do not allow accurate localization or ADC measurement of active disease. This in turn precludes evaluation of patients with predominant active spinal disease. MRI has previously also been shown to have only minor correlations with disease activity measures. In established AS, neither the Berlin sacroiliac nor spinal scores correlated with BASDAI, morning stiffness, global pain, patient’s or physician’s global scores, CRP, ESR, or BASFI18. The Berlin sacroiliac score was also shown not to correlate with baseline patient global scores, ESR, or CRP in AS, and to have only modest correlations with ESR (ϱ = 0.31, p = 0.016) and CRP (ϱ = 0.38, p = 0.004), and not with patient global scores, in undifferentiated SpA19. Similarly, in a study of patients with SpA (radiographic status unknown), baseline Berlin sacroiliac scores did not correlate with ASDAS, BASDAI, or CRP levels, and change in score with treatment did not correlate with change in BASDAI or CRP20. In studies of TNF inhibition in axSpA cases (of uncertain radiographic status), reduction in MRI scores was shown to correlate with reduction in BASDAI (ϱ = 0.37) and CRP (ϱ = 0.52 among those with an elevated CRP), although this correlation was only observed in those with short disease duration (< 4 yrs)21. Performance of the SPARCC score is similar, with no correlation noted between SPARCC scores and self-reported measures (variously BASDAI, nocturnal pain, patient’s global assessment, and total back pain)22,23,24, and with 1 study showing correlation between change in SPARCC score and change in CRP with treatment24.
Thus, at least cross-sectionally, current MRI quantitative scores do not consistently correlate with either self-reported or objective (CRP, ESR) measures of axSpA activity.
While we did not assess interreader reliability in this study, in 3 previous studies of DWI this has been good to excellent (κ = 0.62–0.93 in the first study10, ICC = 0.89–0.98 in the second25, and κ = 0.89 in the third26). Intrareader reliability in the current study was also excellent, with correlation between duplicate readings ranging from 0.976 to 0.993. These compare favorably with reported reliability statistics for Berlin and SPARCC MRI scores (intra- and interreader ICC, respectively, 0.93–0.97 and 0.5–0.97 for the Berlin score, and 0.92–0.97 and 0.78–0.98 for the SPARCC score27).
Different approaches for quantifying DWI measures have been developed, including the ADC and intravoxel incoherent motion (IVIM), which may improve the performance of the methodology28. DWI reflects the random motion of water protons and is performed using a range of diffusion weightings (b values), usually between 0 and 1000 s/mm2. Water diffusion, tissue interfaces, membranes, and microcapillary perfusion all influence in vivo DWI. The ADC value is derived from a monoexponential fit that represents the relationship between the log (tissue signal intensity/attenuation) and the b value. Signal attenuation at low b values (≤ 100 s/mm2) is strongly influenced by microcapillary perfusion, while signal attenuation at higher b values (100–1000 s/mm2) reflects the diffusion component. IVIM is a more complex technique based on this concept, which separately quantifies microcapillary perfusion and pure tissue diffusion coefficient from DWI data, allowing more accurate characterization of the biexponential behavior of tissues as a result of microcapillary perfusion. Software required for the IVIM analysis is not widely available across MRI platforms, and for that reason IVIM analysis was not performed in our study. However, it has theoretical advantages over the ADC measurements, and comparisons of the 2 approaches in axSpA would be valuable.
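The monoexponential ADC fit mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scanner vendor's algorithm: it assumes the standard model S(b) = S0 × exp(−b × ADC), so that ln S is linear in b with slope −ADC, and fits that line by ordinary least squares. The b values are those of the SIJ DWI protocol in this study; the signal intensities are synthetic and the function name is hypothetical.

```python
import math

def fit_adc(b_values, signals):
    """Least-squares line through (b, ln S).
    Returns (ADC, S0), where ADC = -slope and S0 = exp(intercept)."""
    log_signals = [math.log(s) for s in signals]
    n = len(b_values)
    b_mean = sum(b_values) / n
    log_mean = sum(log_signals) / n
    sxx = sum((b - b_mean) ** 2 for b in b_values)
    sxy = sum((b - b_mean) * (ls - log_mean)
              for b, ls in zip(b_values, log_signals))
    slope = sxy / sxx
    intercept = log_mean - slope * b_mean
    return -slope, math.exp(intercept)

b_values = [0, 50, 400, 800]   # s/mm2, as in the SIJ DWI protocol
true_adc = 1.0e-3              # hypothetical ADC in mm2/s, used to simulate data
signals = [1000.0 * math.exp(-b * true_adc) for b in b_values]

adc, s0 = fit_adc(b_values, signals)   # recovers the simulated ADC and S0
```

Because the low b values (0 and 50 s/mm2) are included, a fit of this kind mixes perfusion and diffusion effects into a single coefficient, which is the limitation that the biexponential IVIM model is intended to address.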
This study shows that DWI has excellent performance as a diagnostic tool in distinguishing axSpA from noninflammatory causes of back pain, but performs only modestly in detecting treatment response at 13 weeks and correlates poorly with standard disease-activity measures. Its performance is very similar to the manual SPARCC scoring system. Further studies are indicated to investigate the performance of DWI in screening back pain patients for axSpA, and in distinguishing the subset of patients with nr-axSpA who go on to develop AS.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Acknowledgment
We thank the patients who took part in this study.
Footnotes
The study was funded by a grant from Abbvie Australia. MAB is funded by an Australian National Health and Medical Research Council (NHMRC) Senior Principal Research Fellowship. KALC is supported by an NHMRC Career Development Fellowship (APP1087415). The funding bodies had no role in the design, collection, analysis, or interpretation of this study.
- Accepted for publication November 7, 2017.