Abstract
Objective. To compare the efficacy of 6 tumor necrosis factor–α inhibitors (TNFi) in treatment of ankylosing spondylitis (AS) at 12 weeks and 24 weeks.
Methods. We performed a systematic literature review of randomized controlled trials of TNFi in patients with active AS. We included trials that reported efficacy at 10 to 16 weeks (12-week analysis) and at 24 to 30 weeks (24-week analysis). We used Bayesian network metaanalysis (NMA) to compare the relative efficacy of the TNFi in improving the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), Bath Ankylosing Spondylitis Functional Index (BASFI), and C-reactive protein (CRP) level.
Results. We included 20 trials of 6 TNFi, with 43 treatment arms and 3220 participants. All TNFi were significantly better than placebo in reducing BASDAI and BASFI at 12 weeks and 24 weeks; all but certolizumab pegol (CZP) were significantly better than placebo in reducing CRP at 12 weeks; all but CZP and infliximab-dyyb (IFX biosimilar) were significantly better than placebo in reducing CRP at 24 weeks. Infliximab (IFX) was superior to the other TNFi in decreasing BASDAI at 12 weeks, but not at 24 weeks. When 1 open-label trial was excluded, there were no significant differences among TNFi.
Conclusion. Based on this NMA of clinical trials, IFX was superior to other TNFi in reducing BASDAI at 12 weeks, but this finding was sensitive to inclusion of an open-label trial, and the advantage was diminished at 24 weeks. The analysis was limited by few direct comparison trials. Further study of relative safety and longterm effectiveness will help inform the choice of TNFi in treating active AS.
Tumor necrosis factor-α inhibitors (TNFi) have been widely used as a second-line therapy when patients with ankylosing spondylitis (AS) have persistent symptoms despite treatment with nonsteroidal antiinflammatory drugs (NSAID)1. Six different TNFi have been approved for the treatment of AS, including adalimumab (ADA), certolizumab pegol (CZP), etanercept (ETN), golimumab (GOL), infliximab (IFX), and IFX-dyyb (IFX biosimilar). Although they share the same mechanism of action, they are structurally different and have varying efficacy in other conditions in the spondyloarthritis family, including uveitis and inflammatory bowel disease2. It remains unclear whether all TNFi are equally efficacious in relieving the symptoms and signs of active AS. In clinical practice, physicians and patients may favor a particular TNFi over others based on convenience, comorbidities, or cost, rather than a comparison of relative efficacy.
To date, only 2 head-to-head trials of TNFi in AS have been conducted: 1 of ETN versus IFX, and the other of IFX versus IFX-dyyb3,4. In the absence of direct comparisons, indirect comparisons of ≥ 2 medications can be made through a common comparator using network metaanalysis (NMA). Previous NMA that used the Assessments in Ankylosing Spondylitis 20% response criteria (ASAS20) as the outcome did not detect any difference in efficacy among TNFi5,6,7,8. However, dichotomous measures such as the ASAS20 are less sensitive than continuous measures in detecting a difference among medications, in part because such measures ignore any differences in efficacy beyond the ASAS20 threshold9. In our study, we used 3 continuous measures: Bath Ankylosing Spondylitis Disease Activity Index (BASDAI)10, Bath Ankylosing Spondylitis Functional Index (BASFI)11, and C-reactive protein (CRP) level as primary outcomes to compare the relative efficacy of 6 TNFi in treatment of active AS.
MATERIALS AND METHODS
Literature search
The study protocol was registered at PROSPERO (registration number CRD42014014228). We searched PubMed, EMBASE, Scopus, and the Cochrane Database for published randomized controlled trials (RCT) of TNFi in AS through March 31, 2016, in all languages. Searches were performed by a medical informatician, and search terms are summarized in Supplementary Table 1 (available with the online version of this article). We further manually searched reference lists of review articles. Two authors (RW and MMW) reviewed the search results for eligible studies based on selection criteria. Disagreement was resolved by discussion.
Our study was exempted from ethics review by the US National Institutes of Health Office of Human Subjects Research.
Selection criteria
We included RCT that evaluated the efficacy of TNFi in adult patients with AS, compared to placebo or to a different TNFi, at 10 to 16 weeks, or at 24 to 30 weeks. AS was defined in the trials by the modified New York criteria12. To enhance homogeneity, we excluded studies of axial spondyloarthritis, unless a subgroup analysis of patients with AS was reported. TNFi include ADA, CZP, ETN, GOL, IFX, and IFX-dyyb. We included studies irrespective of whether they allowed concomitant use of NSAID, corticosteroids, and disease-modifying antirheumatic drugs (DMARD). We excluded studies that were reported only as an abstract.
Data extraction and assessment of bias
Data extraction was performed independently by 2 reviewers (RW and MMW). Any discrepancies were resolved by discussion. We extracted features of the study design, characteristics of participants, and relevant outcome measures. The primary efficacy measures were changes from baseline in the BASDAI, BASFI, and CRP. We extracted the mean change score and its SD, or calculated the change from baseline and final scores. When only medians and ranges were reported, we imputed means and SD using standard methods13. Authors of the original articles or study sponsors were contacted for additional data when needed. Missing SD were imputed using SD of other included studies13. Intention-to-treat data were collected whenever available.
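The two derived quantities described above can be illustrated with a short Python sketch (the analyses in this article were run in R, and the exact imputation rules used are not stated here). The sketch assumes the commonly used range-based approximations of Hozo, et al for a mean and SD, and the usual formula for the SD of a change score, which needs a baseline-to-final correlation `r` that trials seldom report. The function names, the value r = 0.5, and all input numbers are hypothetical.

```python
import math


def mean_sd_from_median_range(median, low, high, n):
    """Approximate a mean and SD from a median and range (Hozo-style rules).

    ASSUMPTION: this is one widely used approximation, not necessarily
    the method applied by the authors.
    """
    mean = (low + 2 * median + high) / 4
    spread = high - low
    if n <= 15:
        # Small samples: variance formula that uses the median as well.
        sd = math.sqrt(((low - 2 * median + high) ** 2 / 4 + spread ** 2) / 12)
    elif n <= 70:
        sd = spread / 4  # moderate samples: range / 4
    else:
        sd = spread / 6  # large samples: range / 6
    return mean, sd


def change_from_baseline(mean_base, sd_base, mean_final, sd_final, r=0.5):
    """Mean change and its SD; r is an ASSUMED baseline-final correlation."""
    mean_change = mean_final - mean_base
    variance = sd_base ** 2 + sd_final ** 2 - 2 * r * sd_base * sd_final
    return mean_change, math.sqrt(variance)


# Hypothetical BASDAI values for one trial arm (not taken from any trial).
mean, sd = mean_sd_from_median_range(median=6.0, low=4.0, high=9.0, n=40)
change, sd_change = change_from_baseline(6.2, 1.5, 3.4, 2.0)
print(f"imputed mean={mean:.2f}, SD={sd:.2f}")
print(f"change={change:.2f}, SD of change={sd_change:.2f}")
```

Because the SD of the change score depends on the assumed correlation, rerunning the calculation over a range of plausible `r` values is a simple way to gauge how much an imputed SD could shift the weight of a trial.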
To assess study quality, we used the Cochrane Collaboration tool for assessment of risk of bias13. Each study was evaluated on 6 domains (i.e., sequence generation, allocation concealment, blinding, incomplete outcome data, selective reporting, and other sources of bias), and rated as low, unclear, or high risk.
Statistical analysis
We performed Bayesian NMA to quantify the relative efficacy of each drug using a random effects model under the assumption of consistency. Bayesian NMA allows the indirect comparison of 2 drugs based on the observed direct effects. For example, the relative effect of drug A versus drug B can be estimated as the difference between the relative effect of drug A versus drug C and the relative effect of drug B versus drug C, if these direct comparisons are available. We grouped studies that reported outcomes at 10 to 16 weeks for the 12-week analysis, and studies that reported outcomes at 24 to 30 weeks for the 24-week analysis. Bayesian NMA was performed for each outcome at these 2 timepoints. The relative effect size was presented as the mean difference (MD) with 95% credible intervals (CrI). Outcomes of open-label trials may differ from those of blinded trials; on the other hand, open-label trials closely mimic the real-life experience. Therefore, we performed 2 analyses: the first included a single open-label study, and the second excluded this study.
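The indirect comparison through a common comparator can be made concrete with a small Python sketch. It uses the simple frequentist adjusted indirect comparison (often attributed to Bucher), which is a deliberate simplification of the Bayesian random effects model described above: it produces a confidence interval under a normal approximation rather than a credible interval, and it ignores between-trial heterogeneity. The function name and all numbers are hypothetical.

```python
import math


def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect estimate of A versus B through common comparator C.

    d_ac, d_bc: mean differences of A vs C and B vs C.
    se_ac, se_bc: their standard errors (assumed independent).
    """
    d_ab = d_ac - d_bc
    # Variances add because the two estimates come from separate trials.
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    ci = (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)
    return d_ab, se_ab, ci


# Hypothetical mean differences in BASDAI change versus placebo (C).
d_ab, se_ab, ci = indirect_comparison(d_ac=-2.5, se_ac=0.3, d_bc=-1.5, se_bc=0.4)
print(f"A vs B: MD={d_ab:.2f}, SE={se_ab:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

The indirect estimate is always less precise than either direct estimate it is built from, which is one reason a single head-to-head trial can carry substantial weight in a sparse network.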
We assessed the absolute model fit by the overall residual deviance (Dbar)14. Dbar of each model should approximate the total number of trial arms included in the metaanalysis when the model fits the data well. We assessed heterogeneity among the trial results using Higgins I2, which measures the percentage of variability in effect estimates that is a result of heterogeneity rather than sampling error15. Lower I2 indicates less heterogeneity. To estimate the effect of heterogeneity as a result of differences in initial AS activity among trial participants, we performed metaregression that adjusted for the weighted mean baseline value of each outcome; the BASDAI and BASFI analyses were also adjusted for mean baseline CRP. Because TNFi trials were performed over a span of 15 years, we examined whether there was a drift in placebo responses over time (owing to possibly greater expectations of benefit in later trials), which could affect the direct and indirect comparisons among TNFi.
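As an illustration of the heterogeneity statistic, the following Python sketch computes Cochran Q and Higgins I2 for a set of trials that address one pairwise comparison. This is the textbook pairwise definition; how the statistic was obtained across the whole network in this analysis is not shown here. The function names and the trial estimates are hypothetical.

```python
def cochran_q(effects, std_errors):
    """Cochran Q and the inverse-variance pooled estimate."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, pooled


def i_squared(q, df):
    """Higgins I2 (percent): share of variability beyond sampling error."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical mean differences versus placebo from 3 trials.
effects = [-3.0, -1.0, -2.0]
std_errors = [0.5, 0.5, 0.5]
q, pooled = cochran_q(effects, std_errors)
i2 = i_squared(q, df=len(effects) - 1)
print(f"pooled MD={pooled:.2f}, Q={q:.2f}, I2={i2:.1f}%")
```

Truncating at zero means that I2 is reported as 0 whenever Q falls below its degrees of freedom, so a value of 0 indicates no detectable excess variability rather than proof of homogeneity.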
All analyses were performed using R (version 3.3.1), the R package gemtc (version 0.8.1), and the Markov Chain Monte Carlo engine JAGS (version 4.2)16,17,18.
RESULTS
Literature review
We identified 402 articles through a systematic literature search, and 3 additional articles from the reference lists of previous reviews. There were 20 studies included after applying the inclusion and exclusion criteria, consisting of 18 placebo-controlled trials and 2 head-to-head comparison trials. A flow diagram that illustrates the study selection process is in Supplementary Figure 1 (available with the online version of this article). A total of 43 trial arms and 3220 patients were included. A summary of study characteristics is presented in Table 1 (references 3,4,15–32). The sample sizes ranged from 40 to 356. Mean age of study participants ranged from 27.4 to 48.0 years, and the mean duration of AS ranged from 6.8 to 23.0 years. The proportion of patients who were HLA-B27–positive ranged from 72% to 96.2%. Mean baseline BASDAI scores ranged from 5.5 to 6.9 cm (possible range 0–10) on the visual analog scale, mean baseline BASFI scores ranged from 3.2 to 6.7 cm (possible range 0–10) on the visual analog scale, and mean baseline CRP values ranged from 11 mg/l to 33 mg/l. Fourteen studies reported concomitant use of DMARD, including methotrexate (MTX). Five studies did not permit concomitant use of DMARD, all of which were trials of IFX. One study did not report information on use of DMARD.
The overall study quality was moderate to high (Supplementary Figure 2, available with the online version of this article). One study (5%) was an open-label trial, and therefore was graded as high risk for bias in blinding of participants, personnel, and outcome assessment. Five studies (25%) had high risk of bias as a result of selective reporting because of missing data, although most of the missing data were provided by trial investigators or sponsors on inquiry. One study (5%) reported covariate-adjusted mean values instead of raw means, and was considered unclear risk for other bias.
Networks of evidence and comparison to placebo
Eighteen trials (including 39 arms, 2900 participants) were included in the analysis of relative efficacy at 12 weeks. The networks of comparisons for the BASDAI, BASFI, and CRP are presented in Figure 1A–C. Model fit was good (Supplementary Table 2, available with the online version of this article). All TNFi were significantly more efficacious than placebo in reducing BASDAI and BASFI scores [relative effect sizes for BASDAI reduction ranged from −2.66 to −1.45 mean difference (MD); for BASFI reduction from −1.99 to −1.05 MD], and all but CZP were significantly better than placebo in decreasing CRP (relative effect sizes from −1.57 to −0.71 MD; Figure 2). CRP results were not available for IFX-dyyb at 12 weeks. I2 values were 2.25, 4.53, and 7.46 for the BASDAI, BASFI, and CRP models, respectively, indicating low heterogeneity.
Eleven trials (including 24 arms, 2083 participants) were included in the analysis of relative efficacy at 24 weeks. The network of each comparison is presented in Supplementary Figure 3A–C (available with the online version of this article). The fit of the 24-week models was also good (Supplementary Table 2). All TNFi were significantly more efficacious than placebo in reducing BASDAI and BASFI scores (relative effect sizes for BASDAI ranged from −3.04 to −1.48 MD, for BASFI from −1.96 to −1.23 MD), and all TNFi except CZP and IFX-dyyb were superior to placebo in decreasing CRP (relative effect sizes from −1.30 to −0.69 MD; Figure 3). I2 values were 7.39, 2.64, and 2.35 for the BASDAI, BASFI, and CRP models, respectively, again indicating low heterogeneity.
There was no evidence of substantial drift in placebo responses over calendar time (data not shown).
Comparisons among TNFi at 12 weeks
We conducted 2 analyses for paired comparisons between TNFi at 12 weeks: 1 that included a single open-label study, and a second that excluded this study. In the analysis that included the open-label study, IFX was significantly more efficacious in reducing BASDAI than ADA (relative effect size −1.1 MD, 95% CrI −2 to −0.1), CZP (relative effect size −1.2 MD, 95% CrI −2.3 to −0.02), ETN (relative effect size −1.2 MD, 95% CrI −1.8 to −0.4), and GOL (relative effect size −1.1 MD, 95% CrI −2 to −0.1; Table 2A, below diagonal). IFX was also significantly better in reducing BASFI than CZP (relative effect size −1.0 MD, 95% CrI −1.7 to −0.03; Table 2B, below diagonal). However, there were no significant differences among TNFi in the paired comparison of changes in CRP at 12 weeks (Table 2C, below diagonal). Biosimilar IFX-dyyb had an MD similar to that of IFX in reducing BASDAI, consistent with the result of the head-to-head trial between the 2 drugs. However, it had a wider CrI, likely because it was assessed in only 1 trial.
In the analysis that excluded the open-label trial, IFX was not significantly more efficacious than other TNFi in decreasing BASDAI (Table 2A–C, above diagonals).
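The reading of these intervals can be checked mechanically. The Python sketch below assumes that a comparison is called significant when its 95% CrI excludes zero, which is the usual convention for Bayesian NMA output but is stated here as an assumption. The MD and CrI values are the BASDAI comparisons of IFX with each drug reported above; the dictionary and function names are hypothetical.

```python
# IFX versus each comparator for BASDAI at 12 weeks (open-label trial
# included): (MD, lower 95% CrI, upper 95% CrI), as reported in the text.
ifx_vs = {
    "ADA": (-1.1, -2.0, -0.1),
    "CZP": (-1.2, -2.3, -0.02),
    "ETN": (-1.2, -1.8, -0.4),
    "GOL": (-1.1, -2.0, -0.1),
}


def excludes_zero(lower, upper):
    """True when the interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


flags = {drug: excludes_zero(lo, hi) for drug, (_, lo, hi) in ifx_vs.items()}
for drug, (md, lo, hi) in ifx_vs.items():
    print(f"IFX vs {drug}: MD {md}, 95% CrI {lo} to {hi}, "
          f"excludes zero: {flags[drug]}")
```

Several of the upper bounds sit very close to zero (for example −0.02 for CZP), which is consistent with the finding that these differences were sensitive to the inclusion of a single trial.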
In the metaregression model, when adjusted for baseline BASDAI and baseline CRP, IFX remained superior to CZP, ADA, and ETN in reducing BASDAI (Supplementary Table 3, available with the online version of this article). In addition, IFX-dyyb was significantly more efficacious than ETN in reducing BASDAI. When adjusted for baseline BASFI and baseline CRP, IFX was superior to CZP and ETN in BASFI reduction. No significant difference was detected in CRP changes.
Comparisons among TNFi at 24 weeks
The advantage of IFX seen at 12 weeks was not present in the 24-week analysis. At 24 weeks, no TNFi was significantly more efficacious than the others in reducing BASDAI, BASFI, or CRP (Table 3A–C). IFX-dyyb had a numerically greater reduction in BASDAI and BASFI compared to other TNFi, and ADA had a numerically greater reduction in CRP compared to other TNFi. However, these differences were not statistically significant.
In the metaregression model, when adjusted for baseline BASDAI and CRP, IFX-dyyb remained numerically better than other TNFi in reducing BASDAI (Supplementary Table 4, available with the online version of this article). When adjusted for baseline BASFI and baseline CRP, ADA was numerically better than other TNFi in BASFI and CRP reduction. None of these differences were statistically significant.
DISCUSSION
In our systematic review and Bayesian NMA, we compared the relative efficacy of 6 different TNFi in the treatment of active AS, using BASDAI, BASFI, and CRP as outcome measures. We found that at 12 weeks, IFX was superior to ADA, CZP, ETN, and GOL in reducing BASDAI, and superior to CZP in reducing BASFI. These differences persisted in analyses that adjusted for baseline values of BASDAI, BASFI, and CRP, indicating that differences among trials in the activity of AS at enrollment did not account for this association. Qualitatively similar results were present for IFX-dyyb. We did not find differences among TNFi other than IFX in these outcomes, and found no differences among TNFi in reducing CRP levels. Responses were not different among TNFi in the 24-week analysis. We did not find differences among TNFi when we excluded an open-label trial.
The apparent earlier response to IFX and IFX-dyyb than to other TNFi may relate to the use of loading doses or to their intravenous method of administration, both of which are unique to these 2 medications. However, the advantage of IFX at 12 weeks in BASDAI responses should be interpreted cautiously. A statistically significant difference does not necessarily translate into a clinically important difference. Further, given that AS is a chronic condition, early symptom improvement may not be viewed as being as important as intermediate or longterm effects. Additionally, this early advantage was not evident in the analyses that excluded an open-label study, indicating sensitivity of this association to the results of Giardina, et al3, which directly compared IFX and ETN. This sensitivity reflects the influence of trials with direct head-to-head comparisons in NMA. Nevertheless, these results may assist in the choice of TNFi in clinical situations when prompt symptom responses are needed, although these situations are rare in AS. A report on comparisons of biological agents in treatment of severely active ulcerative colitis also concluded that IFX was more effective than ADA in induction therapy14.
We attempted to address many potential biases. First, we chose continuous outcome measures rather than dichotomous outcomes, to maximize the potential to differentiate effects among medications. Second, in the process of data extraction, we contacted principal investigators and/or study sponsors for additional data, increasing the completeness of the dataset and decreasing the risk of bias as a result of incomplete reporting of data. Third, to address the heterogeneity among the studies, we tested several factors that could potentially influence the relative effects of different TNFi, including examination of changes in placebo responses over time, and estimation of associations with metaregression using baseline disease activity measures as covariates. In the analysis that excluded the open-label trial, IFX was numerically (but not statistically) more efficacious than other TNFi in reducing BASDAI, indicating that this aspect of study design may influence treatment effects on patient-reported outcomes. Given its low use in these trials, concomitant MTX was unlikely to influence the comparisons among TNFi.
The major limitation of our study is the small number of head-to-head trials. Two such trials were identified, and we could construct only 1 closed loop in the evidence network with both direct and indirect comparisons of 2 drugs available (IFX vs ETN). The other TNFi were compared either to placebo directly or through another TNFi, forming a star network. Because of this, we had to assume consistency in the analysis, which reduces confidence in the estimation19.
Our study showed that IFX was somewhat more efficacious in reducing the BASDAI than several other TNFi in the short term, but this advantage was sensitive to the inclusion of an open-label trial and diminished at 24 weeks. IFX (or IFX-dyyb, which had similar effects) may therefore be conditionally preferred in the uncommon case in which a prompt symptom response is needed. The choice of TNFi in patients with AS may also be guided by the presence of specific comorbid conditions, such as inflammatory bowel disease or recurrent iritis. Without these considerations, more information on the relative safety and longterm effectiveness of TNFi will provide critical guidance on the choice of TNFi in the treatment of AS in clinical practice.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Acknowledgment
The authors thank study investigators, Amgen, Pfizer, and UCB for providing additional data upon request.
Footnotes
Funding provided by Intramural Research Program, NIAMS, and NIH. RW is a recipient of the Rheumatology Research Foundation Scientist Development Award. The content is the responsibility of the authors and does not necessarily represent the official views of the NIH. Portions of the data were obtained from the Pfizer Clinical Data Set through a data-use agreement, and from Amgen Inc. through a data-sharing agreement. In addition, the study, carried out under YODA Project #2014-0291, used data obtained from the Yale University Open Data Access Project, which has an agreement with Janssen Research & Development LLC. The interpretation and reporting of research using these data are solely the responsibility of the authors and do not necessarily represent the official views of the Yale University Open Data Access Project or Janssen Research & Development LLC. Amgen Inc., Pfizer Inc., and the Yale Open Data Access Project were provided copies of the manuscript before submission.
Accepted for publication September 29, 2017.