Abstract
Objective. To evaluate the influence of low-dose infliximab (IFX) on spinal inflammation scored by magnetic resonance imaging (MRI). The dose recommended for rheumatoid arthritis (3 mg/kg) is also clinically effective for ankylosing spondylitis (AS), although effects on spinal inflammation as defined by MRI have yet to be described in a placebo-controlled trial.
Methods. In a 12-week double-blind period, patients were randomized 1:1 to receive either IFX 3 mg/kg at 0, 2, and 6 weeks, or placebo. Spinal inflammation in discovertebral units (DVU) was measured by the Spondyloarthritis Research Consortium of Canada (SPARCC) MRI Index at baseline and 12 weeks by 3 readers blinded to timepoint and treatment allocation. We also compared reliability and discrimination of the SPARCC MRI index based on evaluation of the entire spine (23 DVU score) compared to assessment of only the 6 most severely affected DVU (6 DVU score).
Results. At Week 12, patients treated with IFX experienced mean reductions of 55.1% and 57.2% in the 6 DVU and 23 DVU SPARCC scores, respectively, compared with a mean increase of 5.8% and decrease of 3.4% in 6 DVU and 23 DVU scores, respectively, for patients taking placebo (p < 0.001). A large treatment effect (Guyatt’s effect size ≥ 1.7) and high reliability were evident and comparable between the 6 DVU and 23 DVU scoring methods.
Conclusion. Treatment with low-dose IFX leads to a large treatment effect on spinal inflammation as measured by MRI. Scoring for inflammation of only the most severely affected regions of the spine by MRI is comparable to assessment of the entire spine.
Ankylosing spondylitis (AS) is a chronic inflammatory disorder of the axial skeleton that may result in substantial disability and reduction in quality of life. Two major advances in the evaluation and management of this disease have been the introduction of magnetic resonance imaging (MRI) for the objective assessment of inflammation, and the substantial efficacy of anti-tumor necrosis factor-α (anti-TNF) agents. Inflammation on MRI is detected using fat-suppressed sequences, such as short-tau inversion recovery (STIR), which suppresses the signal from bone marrow fat, allowing visualization of the free water signal associated with underlying bone marrow edema. Inflammation in the spine is then readily detected within bone marrow at anterior vertebral corners and adjacent to the vertebral endplate remote from the vertebral corner. The thoracic spine is most frequently affected. Imaging of the spine is conducted in the sagittal plane of view and the spine is imaged in cervicothoracic and thoracolumbar portions. A particular advantage of MRI for assessment of spinal inflammation is its ability to visualize lesions in 3 planes of view by assessing consecutive sagittal slices through the spine. Systematic study has also shown that it is important to assess lesions in the lateral segments of the spine where inflammation is particularly frequent within and adjacent to the costovertebral joints1.
Several scoring systems have been developed to permit quantification of the extent of inflammation in the spine, particularly for the objective evaluation of therapeutic agents in clinical trials2. These differ in several respects, a principal difference being whether the entire spine or only a limited number of the most severely affected spinal units are assessed3,4. Each method scores inflammation within a single spinal unit, termed a discovertebral unit (DVU), an area defined by 2 imaginary horizontal lines through the middle of adjacent vertebrae on a sagittal plane of view that contains the disc and the 2 vertebral endplates with adjacent bone marrow. Scoring the entire spine is laborious and may result in scoring of artifacts and/or lesions with limited anatomical resolution, which may reduce reliability and discrimination5. The Spondyloarthritis Research Consortium of Canada (SPARCC) MRI index was developed to specifically address these concerns by quantifying lesions in only the 6 most severely affected DVU6. This number was chosen because a prior study7 had demonstrated that the median number of affected DVU per patient with AS was 3–4. Subsequent research showed that reliability and responsiveness were comparable when either 6 DVU or all 23 spinal DVU were assessed, although discrimination was not examined4.
Several anti-TNF-α agents have now been shown to be efficacious in patients with AS. These include infliximab (IFX) when administered in a dose of 5 mg/kg every 6 to 8 weeks8,9. Open-label studies have shown that a lower dose of 3 mg/kg every 8 weeks may also be efficacious10,11. A Canadian randomized placebo-controlled trial has now verified the safety and efficacy of low-dose IFX (3 mg/kg) for AS12. Discordance of clinical and radiographic outcomes has been noted in previous studies, and it was therefore important to evaluate the effect of low-dose IFX on MRI outcomes as well as clinically defined outcomes. MRI was used at 2 institutions participating in our study to evaluate the effect of low-dose IFX on objective features of spinal inflammation. This also allowed comparative study of the relative performance of a scoring method limited to only the most severely affected regions of the spine compared to mandatory assessment of the entire spine.
MATERIALS AND METHODS
Patients
Patients were adults (≥ 18 years of age) with a diagnosis of AS defined by the modified New York criteria13 who were not responsive to or were intolerant of ≥ 1 nonsteroidal antiinflammatory drug (NSAID). Patients who had failed 1 or more disease-modifying antirheumatic drugs (DMARD; e.g., methotrexate, sulfasalazine) were also allowed to enroll. Active AS at baseline was defined by a Bath AS Disease Activity Index (BASDAI)14 score ≥ 4. Patients could continue sulfasalazine (≤ 3 g/day), methotrexate (≤ 25 mg/wk), hydroxychloroquine (≤ 400 mg/day), prednisone, and/or prednisone equivalents (≤ 10 mg/day), and/or NSAID as long as these doses had remained stable before baseline (14 days for NSAID, 30 days for DMARD/steroid). All patients were evaluated for latent tuberculosis infection at baseline using a purified protein derivative skin test and chest radiograph; patients with evidence of latent tuberculosis infection were allowed to participate if a documented history of antituberculous treatment was available or if prophylactic antituberculous treatment was initiated before the first dose of IFX. Patients were excluded from the study if they had a history of chronic/recurrent infectious disease, including tuberculosis, hepatitis B, or human immunodeficiency virus, and/or a diagnosis of malignancy or lymphoproliferative disease currently or within the past 5 years. Patients who had previously received TNF-α antagonist therapy or 1 or more intraarticular joint injections with corticosteroids within 4 weeks before the baseline visit were excluded. Patients with radiologic evidence of total spinal ankylosis (bamboo spine) were excluded.
Our study was approved by an independent ethics committee at the 2 study centers that conducted MRI examination, the University of Alberta and the University of Toronto. Our study was performed in accord with the ethical principles that originate in the Declaration of Helsinki. Written, informed consent was obtained from each patient.
Study design
This Phase IIIb, randomized, multicenter, double-blind, placebo-controlled study was conducted at 8 investigational centers in Canada, and all patients enrolled at 2 sites (University of Alberta, University of Toronto) were invited to have MRI examination at baseline and Week 12. Patients were randomly assigned in a 1:1 ratio to receive infusions of either placebo or IFX 3 mg/kg at Weeks 0, 2, and 6, with evaluation for primary and secondary endpoints at Week 12. The primary endpoint of this study was the proportion of patients achieving a 20% improvement in ASsessment in AS (ASAS) International Working Group criteria (ASAS20) after 12 weeks of treatment15. Secondary endpoints at Week 12 included the change from baseline in BASDAI, Bath AS Functional Index16 (BASFI), C-reactive protein (CRP; mg/l), and erythrocyte sedimentation rate (ESR), as well as the ASAS40, ASAS50, ASAS70, and ASAS 5/6 responses17.
An MRI of the spine was performed at baseline and Week 12 using appropriate surface coils for systems operating at 1.5 Tesla. STIR sequences were obtained in sagittal orientation, with 12 to 15 slices, 3–4 mm thick, acquired using the following parameters: repetition time 2720–3170 ms; echo time 38–61 ms; time to inversion 140 ms. Imaging of the spine was divided into 2 parts: (1) the entire cervical spine and most of the thoracic spine, and (2) the lower portion of the thoracic spine and the entire lumbar spine. T1 spin-echo images of the entire spine were also obtained for use as anatomical references.
Scoring of MRI lesions
The SPARCC scoring method is based on an abnormal increased signal on the STIR sequence, representing bone marrow edema (defined as an increased concentration of “free water” relating to a bone lesion). Examples of the scoring method for the spine have been published4–6. Details of each scoring method are provided at the website www.arthritisdoctor.ca. The scoring method designed for use in clinical trials requires that the entire spine be assessed first and the 6 most severely affected DVU then selected for formal scoring of bone marrow edema. A single DVU has a scoring range of 0–18, so this results in a total scoring range of 0–108. However, it is also possible to evaluate and score bone marrow edema at all 23 DVU.
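The relation between the two scoring ranges can be illustrated with a minimal sketch. It assumes that a reader has already assigned a score of 0–18 to each of the 23 DVU, and it approximates "most severely affected" by the 6 highest per-DVU scores. The function names and the example scores are hypothetical and are not taken from the SPARCC materials; the upper limit of 414 for the 23 DVU score is simply 23 × 18.

```python
from typing import Sequence

MAX_DVU_SCORE = 18   # scoring range of a single DVU is 0-18 (from the text)
N_SPINAL_DVU = 23    # number of DVU when the entire spine is scored
N_SELECTED_DVU = 6   # number of most severely affected DVU that are scored


def _validate(dvu_scores: Sequence[int]) -> None:
    """Check that there are 23 per-DVU scores, each within 0-18."""
    if len(dvu_scores) != N_SPINAL_DVU:
        raise ValueError(f"expected {N_SPINAL_DVU} DVU scores")
    if any(s < 0 or s > MAX_DVU_SCORE for s in dvu_scores):
        raise ValueError(f"each DVU score must lie in 0-{MAX_DVU_SCORE}")


def score_23_dvu(dvu_scores: Sequence[int]) -> int:
    """Sum over all 23 DVU (possible range 0-414)."""
    _validate(dvu_scores)
    return sum(dvu_scores)


def score_6_dvu(dvu_scores: Sequence[int]) -> int:
    """Sum over the 6 highest-scoring DVU (possible range 0-108).

    Assumption: the 6 most severely affected DVU are taken to be the
    6 with the highest per-DVU scores.
    """
    _validate(dvu_scores)
    return sum(sorted(dvu_scores, reverse=True)[:N_SELECTED_DVU])


# Hypothetical patient with inflammation in 7 of 23 DVU.
example = [0, 0, 3, 0, 0, 0, 12, 9, 7, 0, 4, 0,
           0, 2, 0, 0, 0, 5, 0, 0, 0, 0, 0]
print(score_6_dvu(example), score_23_dvu(example))
```

In this hypothetical example the two scores differ only by the single mildly affected DVU that falls outside the 6 selected units; when 6 or fewer DVU are affected, the two scores coincide.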
Viewing conditions
All scans were reviewed on workstations with 2–4 large screens and image-manipulation software. This system permitted simultaneous display of all MRI sequences (T1 and STIR for upper and lower spine for both timepoints) at original (life-size) dimensions. Scores were recorded electronically on an additional screen by entering data into an online scoring sheet that comprised a schematic of 3 DVU divided into quadrants to allow scoring of each DVU in 3 consecutive sagittal slices; the reader was able to see all scores and all scans simultaneously before committing to the final score. Each image was rated by 3 independent readers to allow calculation of interreader scoring variability. Two of the readers were qualified, fellowship-trained musculoskeletal radiologists and one was a rheumatologist who is a codeveloper of the SPARCC method. Readers were blinded to patients’ identities, treatments, and imaging timepoints.
Statistical analyses
Because the SPARCC scores were secondary endpoints of the overall study, there was no a priori power calculation for the MRI subanalysis. All statistical tests were 2-sided and comparisons were performed with α = 0.05. The changes in SPARCC scores from baseline to Week 12 were compared between the 2 treatment groups using an analysis of covariance (ANCOVA) model with the baseline score as a covariate. Interreader reliability of baseline and 12-week scores, as well as of the change from baseline to 12 weeks, was determined by calculating intraclass correlation coefficients (ICC) using an ANOVA model with SPARCC scores as the dependent variable, and patient and reader (fixed factor) as independent variables. ICC values > 0.6, > 0.8, and > 0.9 indicate good, very good, and excellent reproducibility, respectively. The association between change in selected clinical variables and the change in SPARCC scores from baseline to Week 12 was calculated using an ANCOVA model adjusted for baseline SPARCC score, treatment, and change in each of the following measures: BASDAI, BASFI, ESR, and CRP. Guyatt’s effect size for the change in SPARCC scores from baseline to Week 12 was calculated by dividing the mean change in the IFX group by the SD of the change in the placebo group. Effect sizes of at least 0.2, 0.5, and 0.8 are considered small, moderate, and large, respectively18.
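The two summary statistics described above can be sketched in a few lines. This is an illustration under stated assumptions, not the analysis code used in the study: the function names and example values are hypothetical, the SD is taken as the sample SD, and the ICC is computed as the two-way, single-measure consistency form (often labeled ICC(3,1)), which is one plausible reading of an ANOVA model with reader as a fixed factor. The text does not state which ICC form was used.

```python
from statistics import mean, stdev
from typing import Sequence


def guyatt_effect_size(change_treated: Sequence[float],
                       change_placebo: Sequence[float]) -> float:
    """Mean change in the treated group divided by the SD of the change
    in the placebo group. The sign follows the direction of change, so a
    reduction in score gives a negative value; its magnitude is what is
    compared with the 0.2 / 0.5 / 0.8 thresholds."""
    return mean(change_treated) / stdev(change_placebo)


def icc_consistency(scores: Sequence[Sequence[float]]) -> float:
    """ICC from a two-way ANOVA on a patients x readers table.

    Assumption: single-measure consistency form,
    (MS_patients - MS_error) / (MS_patients + (k - 1) * MS_error).
    """
    n = len(scores)       # number of patients (rows)
    k = len(scores[0])    # number of readers (columns)
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical changes in SPARCC score from baseline to Week 12.
es = guyatt_effect_size([-10, -14, -12], [1, 3, 5])

# Hypothetical scores for 3 patients (rows) by 3 readers (columns);
# the readers differ only by a constant offset.
icc = icc_consistency([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(es, icc)
```

Because the consistency form ignores constant offsets between readers, the hypothetical table above yields perfect agreement; an absolute-agreement ICC would penalize such offsets and give a lower value.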
RESULTS
Patient characteristics
A total of 36 patients were enrolled at the 2 study sites that participated in the MRI portion of the study, of which 18 were randomized to placebo and 18 to IFX. Two patients in each group withdrew prior to 12 weeks, so that 16 in each treatment group had baseline and 12-week MRI scans. Demographic and baseline clinical characteristics were similar in both treatment groups and consistent with a typical AS population that is refractory to standard therapy (Table 1).
Clinical outcomes
The mean (SD) change in BASDAI was −2.3 (1.9) in patients treated with IFX, and −0.8 (2.2) in patients given placebo (p = 0.05). Five of 16 (31.3%) patients who received IFX had a BASDAI 50 response, compared to 1 of 16 (6.3%) patients receiving placebo (p = nonsignificant). ASAS20, 40, 50, and 70 responses were noted in 68.8%, 56.3%, 50%, and 12.5% of patients receiving IFX, and 37.5%, 12.5%, 12.5%, and 0% of those receiving placebo, respectively (p = 0.08, 0.009, and 0.02 for ASAS20, 40, and 50, respectively; p = nonsignificant for ASAS70).
SPARCC MRI spinal inflammation scores
There were no significant treatment group differences in mean SPARCC MRI 6 DVU or 23 DVU scores at baseline (Tables 2 and 3). No spinal inflammation was evident at baseline in 2 patients, both of whom received IFX, as confirmed by all 3 readers. Five patients (4 IFX, 1 placebo) had no spinal inflammation at baseline as recorded by at least 1 reader. The 6 DVU score was equivalent to the 23 DVU score in 21 of the 32 patients (65.6%), in whom it would have identified all of the inflammation in the anterior spine. The 23 DVU score was greater than the 6 DVU score by a mean (SD) of 5.8 (12.4) and a median (range) of 0 (0–55).
At Week 12, patients treated with IFX experienced a significantly (p ≤ 0.002) greater reduction in the mean spinal 6 DVU and 23 DVU SPARCC score, compared with patients taking placebo, as recorded by all 3 readers. A reduction in score was recorded by each of the 3 readers in every patient treated with IFX where inflammation was recorded at baseline. A significant reduction in 6 DVU score was also recorded in ASAS20 nonresponders when patients treated with IFX [n = 7; mean (SD) change in 6 DVU score = −14.4 (13.2)] were compared to patients taking placebo [n = 10; mean (SD) change in 6 DVU score = +1.7 (4.8)] (p = 0.02). Patients treated with IFX experienced a 55.1% mean reduction in the 6 DVU SPARCC score, compared with a 5.8% mean increase in score for patients treated with placebo (p < 0.001) at Week 12. A mean reduction of 57.2% was recorded for the 23 DVU score in patients treated with IFX compared to a mean reduction of 3.4% in patients given placebo (p < 0.001). Readers agreed that 6/16 (37.5%) patients who received IFX still had residual inflammation on MRI when all 23 spinal units were assessed.
Guyatt’s effect size for the change in SPARCC MRI spinal inflammation scores at Week 12 was well over 1 for both the 6 DVU and 23 DVU scores as recorded by each of the 3 readers, indicating a large treatment effect of IFX in reducing spinal inflammation (Table 4). A substantially greater effect size was noted for the 23 DVU method by 1 reader, although the second reader recorded the converse.
Reliability of SPARCC 6 DVU and 23 DVU scoring methods
The SPARCC 6 DVU and 23 DVU scoring systems demonstrated excellent and comparable reproducibility (Table 5). In particular, interreader ICC for the change in spinal inflammation from baseline to 12 weeks was 0.89 and 0.93 for the 6 DVU and 23 DVU scores, respectively, when recorded by the same 2 readers.
Association between SPARCC scores and clinical data
A reduction of acute-phase reactants from baseline to 12 weeks was significantly associated with change in spinal inflammation recorded by MRI, particularly when the 6 DVU method was used (Table 6). No significant association was noted when change in other clinical measurements was analyzed, in particular the BASDAI.
DISCUSSION
We show that low-dose IFX therapy is highly effective in reducing spinal inflammation observed on MRI. This effect was recorded in all patients who had inflammation at baseline. We also show that methods for quantifying spinal inflammation by MRI are equally and highly discriminatory, whether assessment is limited to only the most severely affected regions of the spine (6 DVU score) or includes the entire spine (23 DVU score). The data also reflect the discordance between MRI-defined and clinically defined outcome measures, which has been observed4,19.
The degree of reduction in spinal inflammation using the SPARCC method is comparable to that reported for adalimumab at 12 weeks, with reduction from baseline scores of 54% compared to an increase in the placebo group of 9% using the SPARCC 6 DVU method19. This is similar to reports assessing patients on higher doses of IFX (5 mg/kg) but using a different scoring method and assessing patients at 24 weeks20. A decrease in spinal inflammation has also been reported with etanercept after 24 weeks, but the largest study of etanercept included MRI of only the lumbar spine and lower thoracic spine and MRI was limited to only 40 patients21. Open-label extension of these pivotal trials has shown that the reduction in spinal inflammation is persistent19,20.
We have reported the lack of association with clinical measurements of disease activity but have consistently shown significant correlations with acute-phase reactants4,19,22. In particular, change in MRI acute lesions correlated significantly with change in CRP in patients receiving adalimumab among 82 patients recruited to a placebo-controlled trial19. A report that examined the effects of IFX in a 20-patient subgroup analysis of the ASSERT trial described a correlation between change in BASDAI score and change in acute lesion score as measured by the ASspiMRI method3, but did not report correlations with acute-phase reactants. A second report of 20 patients that formed a subgroup from another controlled trial of IFX demonstrated only borderline significance for the correlation with the BASDAI after 12 weeks of treatment and no correlation with either clinical or laboratory measurements after 2 years of followup23. Similarly, improvement of the MRI scores did not show a significant correlation with changes in clinical or laboratory measurements in 2 controlled trials of etanercept, although the analyses were limited to a subgroup of only 40 patients in the first study21 and only 15 patients in the second24. In addition, the MR scan of the spine was limited to the lower thoracic and lumbar regions in the former study21.
Our data therefore suggest that symptoms are unrelated to the presence of bone marrow edema in the vertebral bodies, which is the primary feature assessed by MRI in currently used scoring methods. Other sources of symptoms may relate to inflammation in other structures such as the posterior elements and entheses, or noninflammatory causes of pain, such as secondary mechanical factors. These data also show that MRI is highly sensitive in the detection of inflammation, and the lack of sensitivity of acute-phase reactants and their relatively modest correlation with MRI indicates that they cannot be substituted for MRI examination. This in turn brings into question the clinical significance of MRI and whether the lesions observed on MRI reflect disease process and clinical outcomes. This is highlighted in the comparison of the MRI changes in the responder and nonresponder groups in our study. Limited histopathological data have shown that lesions observed on STIR sequences correlate modestly with histopathological grades of inflammation25. Moreover, the presence of increased STIR signal at vertebral corners is associated with the subsequent development of syndesmophytes on radiography after 2 years of followup26. Consequently, assessment of spinal inflammation using MRI should continue to be a requirement for the development of new therapeutics for AS.
Our comparative analysis of the 6 DVU and 23 DVU methods has also demonstrated equally high interobserver reliability and the ability to discriminate between treatment groups. This consistency in reliability and responsiveness between readers has been reported4, and is particularly noteworthy because this was observed in readers with no prior experience with any MRI-based scoring system2. The high reliability may be attributed to the dichotomous scheme for recording bone marrow edema, a clear definition of what constitutes abnormal STIR signal, and the availability of reference images that visually outline the measurements of an abnormal STIR signal. This clearly outweighs any potential for diminished reliability that might be anticipated by the requirement to select the 6 most severely affected DVU. Conversely, the requirement to score all 23 DVU mandates the scoring of regions that may be questionable with respect to size and presence of increased STIR signal5. In addition, certain MRI artifacts may resemble increased STIR signal. Phase-encoding artifacts due to blood flow in the major vessels of the abdomen are a common source of misinterpretation as inflammation in the anterior portion of the lumbar vertebral bodies. The high responsiveness and ability to discriminate between treatment groups reflect both the efficacy of treatment on local inflammation and the approach taken by the SPARCC method to record the extent of the lesion in 3 dimensions by requiring assessment of the lesion in consecutive sagittal slices. This capability is unique to MRI among imaging modalities. Moreover, the SPARCC method also applies an added weighting for the depth and intensity of the lesion. The minimal difference between the 6 DVU and 23 DVU methods may reflect the observation that scoring 6 DVU identifies all the detectable inflammation in the anterior spine in two-thirds of patients.
The smaller lesions that might be present in the remaining DVU are likely to demonstrate much less relative change following treatment. Since scoring all 23 DVU is time-consuming, particularly when evaluating questionable lesions, we continue to recommend the scoring of only the most severely affected spinal regions when using the SPARCC MRI method to assess therapeutics in clinical trials. The assessment of all 23 DVU may be more appropriate in observational cohort studies, where questions beyond therapeutic efficacy are being addressed. These data indicate that the SPARCC 23 DVU method can be used reliably for observational studies and demonstrates a high degree of responsiveness.
We have shown that low-dose IFX is highly effective in reducing spinal inflammation recorded by MRI. Inflammation does not need to be scored in the entire spine using MRI. Discrimination between treatment groups and reliability between observers is equally high when scoring focuses on only the most severely affected regions of the spine. Feasibility is also improved by the use of appropriate viewing conditions and the simultaneous availability of electronic data entry on user-friendly schematic Web pages designed to depict the anatomical region scored on MRI.
Footnotes
- Supported by Schering-Plough Canada. Dr. Maksymowych is a Scientist of the Alberta Heritage Foundation for Medical Research.
- Accepted for publication February 12, 2010.