Abstract
Objective. To evaluate the influence of inclusion criteria used in rheumatoid arthritis (RA) trials with adalimumab on clinical outcome and response.
Methods. The different inclusion criteria of published trials of adalimumab in RA were separately applied to a large prospective cohort of patients with RA treated with adalimumab (AdRA cohort), thereby mimicking patient selection for a clinical trial. Clinical response and outcome in the resulting 11 projection groups were compared using the 28-joint Disease Activity Score (DAS28) and time-averaged DAS28 as outcome measures of efficacy.
Results. Thirteen trials (n = 54–799) with 11 different sets of entry criteria were identified, resulting in 11 projection groups (n = 22–168). The DAS28 at baseline was similar in the original trial and each projection group based on this trial (5.1–6.4, total AdRA cohort 5.1). After 28 weeks, the efficacy varied substantially among the 11 projection groups (change from baseline DAS28: −1.65 to −2.65; time-averaged DAS28: 3.67–4.53). Expressed as outcome (DAS28 at 28 weeks), the efficacy was much more similar for almost all projection groups (3.5–4.0) and thus appeared to be mostly independent of disease activity at baseline.
Conclusion. We observed that different inclusion criteria for clinical trials can have a marked effect on the expected response, i.e., improvement from baseline. A novel finding is that final disease activity appeared much less dependent on initial disease activity. Our study suggests that for daily practice, one can assume that adalimumab treatment will on average result in a DAS28 between 3.5 and 4.0 after 28 weeks of treatment, regardless of baseline disease activity.
There have been many efforts to standardize the classification and clinical outcome of rheumatic diseases1. Efficacy in clinical trials is measured with standardized disease activity scales based upon a core set of disease activity standards. The American College of Rheumatology (ACR) and the European League Against Rheumatism (EULAR) response criteria and the Disease Activity Score (DAS) response2,3,4,5,6,7 are widely used. Recently, more attention has been paid to a minimal disease activity state or remission as an important outcome8. Trials of the efficacy of adalimumab in patients with RA show different outcomes9,10. These differences in efficacy between trials may be caused by differences in patients, geographic location, and ethnicity. Another important source of differences in outcome may be the use of different inclusion criteria11.
Many different inclusion criteria are used in clinical trials of adalimumab in RA. However, the effect of inclusion criteria on clinical outcome and clinical response is not completely clear. Zink, et al12 reported that patients in randomized controlled trials (RCT) differ from those seen in daily clinical practice and that clinical outcomes also differ between RCT and daily practice. Zink, et al compared the characteristics of eligible and ineligible patients in trials of various biologic agents and reported that ineligible patients benefited less from treatment than eligible patients had in the original trials. In our study, we compare patients who would be eligible under different sets of inclusion criteria, focusing on adalimumab trials only. In our cohort we mainly use the DAS28 as an outcome measure.
The objective of the current investigation is 2-fold. The first aim is to assess the influence of different inclusion criteria on efficacy. This is done by “projecting” the inclusion criteria used in clinical trials onto a cohort of patients with RA treated with adalimumab in daily clinical practice. The second aim is to compare the outcome of the clinical trials with the actual outcome in daily clinical practice, by comparing the efficacy reported in each original trial with the efficacy found in the projection group based on the same inclusion criteria.
MATERIALS AND METHODS
A literature search was performed using Medline and the Cochrane library. Articles reporting clinical trials on adalimumab treatment in patients with RA were selected from 1997 until September 2006. Search terms were “adalimumab,” “Humira,” “anti-tumour necrosis factor,” “rheumatoid arthritis,” and “RA”. Clinical outcome measurements had to include either a DAS/DAS28 score2,4,5,6, EULAR response criteria, or an ACR response (ACR20, ACR50, ACR70, or ACR-N13,14).
Articles were selected if the inclusion criteria for the study were clearly stated. If different trials from the same authors used identical inclusion criteria, those trials were considered 1 group.
The groups of inclusion criteria were sorted by strictness, ranging from light to strict. The strictness of the inclusion criteria is based upon an arbitrary scale determined by the effect of the separate variables on the DAS28 score and the ACR core set of disease activity measurements at baseline. Subsequently, 2 experienced rheumatologists decided on the order of the trials, which were then classified as having light, medium, or strict inclusion criteria, based on the DAS28 score as well as the number of swollen and tender joints.
To evaluate the influence of different inclusion criteria on clinical outcome and response without the confounding effect of differences between study populations in clinical trials, the projection method was used11. This method mimics the patient selection process by projecting the set of inclusion criteria onto a clinical practice cohort of a large outpatient clinic. This results in 1 projection group per set of inclusion criteria, containing the patients who would have been eligible for a trial using those criteria. This procedure is performed for all sets of inclusion criteria. Because the patients selected for each of the projection groups originate from the same general cohort, differences in outcome found between the projection groups are most likely caused by differences in inclusion criteria.
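The projection step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: each set of inclusion criteria is modeled as a predicate applied to every patient of one cohort, and the `Patient` fields, the `project` helper, the thresholds, and the example patients are all hypothetical. Because every group is drawn from the same cohort, a patient can appear in several projection groups.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Patient:
    pid: int
    das28: float      # baseline DAS28
    tender28: int     # tender joint count (28 joints)
    swollen28: int    # swollen joint count (28 joints)
    esr: float        # erythrocyte sedimentation rate, mm/h


Criteria = Callable[[Patient], bool]


def project(cohort: List[Patient], criteria: Dict[str, Criteria]) -> Dict[str, List[int]]:
    """Return, for each criteria set, the IDs of cohort patients who would be eligible."""
    return {name: [p.pid for p in cohort if rule(p)] for name, rule in criteria.items()}


# Hypothetical thresholds for illustration only; not the criteria of any cited trial.
criteria: Dict[str, Criteria] = {
    "light": lambda p: p.das28 > 3.2,
    "strict": lambda p: p.swollen28 >= 6 and p.tender28 >= 9 and p.esr >= 28,
}

# Hypothetical patients, not AdRA data.
cohort = [
    Patient(1, 4.1, 4, 3, 20.0),
    Patient(2, 6.3, 12, 8, 35.0),
    Patient(3, 5.6, 9, 6, 30.0),
    Patient(4, 3.9, 5, 2, 12.0),
]

groups = project(cohort, criteria)
print(groups)  # {'light': [1, 2, 3, 4], 'strict': [2, 3]}
```

Patients 2 and 3 fall into both groups, which is the overlap that later requires a bootstrap rather than an ordinary two-group test.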
Our study is based on a prospective observational cohort of consecutive patients with RA who were treated with adalimumab therapy at the Department of Rheumatology of the Jan van Breemen Institute: the AdRA cohort. All patients with at least 28 weeks of followup were included. The patients had active disease at the time of inclusion, defined as a DAS28 > 3.2 despite earlier treatment with at least 2 disease-modifying antirheumatic drugs (DMARD), including methotrexate (MTX), at a dosage of 25 mg weekly or at the maximum tolerable dosage. All patients received adalimumab 40 mg subcutaneously every other week. Patients were excluded if they had contraindications for adalimumab use, as specified in the guidelines of the European Medicines Agency15. The medical ethics committee at the Slotervaart Hospital in Amsterdam, The Netherlands, approved our study, which was carried out in compliance with the Helsinki Declaration.
We used 2 different outcome measures: the DAS28 and the DAS28 AUC (area under the curve). A third outcome measure, the ACR response, was used to compare the original trial, its projection group, and the total AdRA cohort whenever the published original trial reported it.
Primary outcome measures are the change in DAS28 from 0 to 28 weeks and the DAS28 AUC. The DAS28 AUC is not often used; it measures the patient’s cumulative disease activity over time.
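The text does not state which DAS28 variant was used. As a sketch under that uncertainty, the code below assumes the widely used ESR-based DAS28 (tender and swollen 28-joint counts, ESR in mm/h, and patient global health on a 0–100 mm visual analog scale) and the conventional cutoffs (> 5.1 high, > 3.2 to 5.1 moderate, 2.6 to 3.2 low, < 2.6 remission). The function names and example values are hypothetical.

```python
import math


def das28_esr(tender28: int, swollen28: int, esr: float, global_health: float) -> float:
    """ESR-based DAS28; global_health is the patient's 0-100 mm visual analog scale."""
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr)          # natural logarithm of ESR
            + 0.014 * global_health)


def das28_category(score: float) -> str:
    """Conventional DAS28 disease activity categories."""
    if score > 5.1:
        return "high"
    if score > 3.2:
        return "moderate"
    if score >= 2.6:
        return "low"
    return "remission"


# Hypothetical patient: 9 tender and 6 swollen joints, ESR 30 mm/h, global health 60 mm.
score = das28_esr(tender28=9, swollen28=6, esr=30.0, global_health=60.0)
print(round(score, 2), das28_category(score))  # 5.59 high
```

If the CRP-based DAS28 was used instead, the coefficients differ, so these numbers would not carry over directly.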
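The DAS28 AUC can be computed with the trapezoidal rule over the visits, and dividing it by the length of followup gives a time-averaged DAS28. The sketch below assumes that construction. The visit schedule and scores are hypothetical, since the document does not list visit times. The ranges reported in the Results appear consistent with this relation (103/28 ≈ 3.68 and 127/28 ≈ 4.54, close to the reported 3.67–4.53), but that is an inference, not a stated method.

```python
from typing import Sequence, Tuple


def das28_auc(weeks: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Trapezoidal AUC of DAS28 over time, and the time-averaged DAS28."""
    if len(weeks) != len(scores) or len(weeks) < 2:
        raise ValueError("need matching weeks/scores with at least 2 visits")
    auc = 0.0
    for (t0, y0), (t1, y1) in zip(zip(weeks, scores), zip(weeks[1:], scores[1:])):
        auc += (t1 - t0) * (y0 + y1) / 2.0   # area of one trapezoid between two visits
    duration = weeks[-1] - weeks[0]
    return auc, auc / duration                # time-averaged DAS28 = AUC / followup length


# Hypothetical visits at weeks 0, 4, 12, and 28.
auc, avg = das28_auc([0, 4, 12, 28], [6.0, 4.5, 4.0, 3.8])
print(round(auc, 1), round(avg, 2))  # 117.4 4.19
```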
Statistical analysis
The decrease in DAS28 score from baseline to 28 weeks within each projection group was analyzed by a paired-samples t-test. An additional independent-samples t-test was used to compare the decrease in DAS28 score between patients in the AdRA cohort with a higher baseline DAS28 (> 5.1) and those with a lower baseline DAS28 (≤ 5.1).
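Both tests reduce to simple t statistics. The sketch below computes them with the Python standard library on hypothetical data; the independent-samples version assumes equal variances (pooled SD), which the text does not specify. P values would then come from the t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`, which compute the same statistics.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Paired-samples t statistic for the within-patient decrease (before - after)."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def pooled_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Independent-samples t statistic assuming equal variances (pooled SD)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny)), nx + ny - 2


# Hypothetical DAS28 values at baseline and at week 28 (not AdRA data).
baseline = [4.0, 4.8, 3.9, 5.6, 6.2, 6.5]
week28 = [3.1, 3.6, 3.0, 4.2, 4.1, 4.4]
t_paired, df_paired = paired_t(baseline, week28)

# Split at the baseline cutpoint: DAS28 > 5.1 versus <= 5.1.
decrease = [b - a for b, a in zip(baseline, week28)]
high = [d for b, d in zip(baseline, decrease) if b > 5.1]
low = [d for b, d in zip(baseline, decrease) if b <= 5.1]
t_split, df_split = pooled_t(high, low)
print(round(t_paired, 2), df_paired, round(t_split, 2), df_split)
```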
For the comparisons between the projection groups it is important to realize that the projection groups overlap, i.e., they (partially) contain the same patients. Therefore, to assess the statistical significance of the difference in outcome between the projection groups, a nonparametric bootstrapping approach was used18. A random sample was drawn with replacement from the total AdRA cohort, and the criteria were projected onto this sample. The projection groups thus obtained were used to calculate the difference in the outcome measure. This sampling procedure was repeated n times, yielding a vector of n differences. From this vector, the mean difference was calculated, and a confidence interval was determined nonparametrically from its quantiles19. In our study, n was 100,000. To limit the inflation of the type I error rate due to multiple testing, we used a 99% CI instead of a 95% CI to test for statistical significance. Further, we reduced the number of comparisons by choosing 3 trials (1 randomly selected trial from each of the light, medium, and strict criteria groups) to be compared with each other and with the AdRA cohort, instead of comparing all trials with each other.
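A minimal sketch of this bootstrap, under stated assumptions: the synthetic cohort, the two hypothetical inclusion rules, and the crude two-visit AUC are invented for illustration, and 2,000 resamples are used instead of the study's 100,000 to keep the run short. Each resample redraws patients from the whole cohort and reapplies both rules, so the overlap between projection groups is preserved in every replicate; the 99% CI is read from the 0.5th and 99.5th percentiles of the bootstrap differences.

```python
import random
from statistics import mean, quantiles
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
Rule = Callable[[Patient], bool]


def bootstrap_group_difference(cohort: List[Patient], rule_a: Rule, rule_b: Rule,
                               outcome: str, n_boot: int = 2000,
                               seed: int = 1) -> Tuple[float, Tuple[float, float]]:
    """Mean difference (group B minus group A) and a nonparametric 99% percentile CI."""
    rng = random.Random(seed)
    diffs: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(cohort, k=len(cohort))          # resample with replacement
        group_a = [p[outcome] for p in sample if rule_a(p)]  # re-project criteria set A
        group_b = [p[outcome] for p in sample if rule_b(p)]  # re-project criteria set B
        if not group_a or not group_b:                       # skip resamples with an empty group
            continue
        diffs.append(mean(group_b) - mean(group_a))
    cuts = quantiles(diffs, n=200)                           # cut points at 0.5%, 1.0%, ..., 99.5%
    return mean(diffs), (cuts[0], cuts[-1])


# Hypothetical cohort of 168 patients (synthetic values, not AdRA data).
gen = random.Random(42)
cohort: List[Patient] = []
for _ in range(168):
    base = gen.uniform(3.3, 7.5)            # baseline DAS28
    final = 3.7 + gen.gauss(0.0, 0.6)       # week-28 DAS28
    cohort.append({"das28": base, "auc": 28 * (base + final) / 2})  # two-visit trapezoid


def light_rule(p: Patient) -> bool:         # hypothetical light criteria
    return p["das28"] > 3.2


def strict_rule(p: Patient) -> bool:        # hypothetical strict criteria
    return p["das28"] > 5.5


diff, (lo, hi) = bootstrap_group_difference(cohort, light_rule, strict_rule, "auc")
print(round(diff, 1), round(lo, 1), round(hi, 1))
```

A difference is called significant at the 1% level when the interval excludes 0.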
In addition, to compare the efficacy reported in the trial with the efficacy obtained in clinical practice, we compared the outcome of the trial with the outcome in the projection group and the outcome in the total AdRA cohort. This was performed for 2 trials, 1 with light inclusion criteria and 1 with strict inclusion criteria. For the comparison, we used the efficacy outcome used in the original trial (DAS, DAS28, or ACR response).
RESULTS
Until September 2006, 13 eligible RA trials were published9,10,20,21,22,23,24,25,26,27,28,29,30, resulting in 11 different groups of inclusion criteria. One trial9 was excluded because it was a continuation of a primary trial that was already included, with no change in the patient population26; in another case, the continuation trial29 included more patients than the original trial30.
The main differences in inclusion criteria were based on differences in DAS score, tender joint count, swollen joint count, C-reactive protein, erythrocyte sedimentation rate, morning stiffness, previous DMARD used (including MTX and previous biological agents), rheumatoid factor, and erosions present.
In Table 1 the most important inclusion criteria of the different trials are shown. The trial groups are ordered according to the strictness of the inclusion criteria, increasing in strictness from left to right. All patients had RA according to the 1987 ACR criteria.
From 2004 to 2006, 237 patients were included in the AdRA cohort; all 168 of them who were treated for at least 28 weeks were included in our study. The baseline characteristics of the population are presented in the first column of Table 2. The mean age of the patients was 56 years, and there were more women than men in the cohort. The mean baseline DAS28 was 5.1, which represents high disease activity.
The projection of the 11 sets of inclusion criteria onto the AdRA cohort resulted in 11 projection groups (trial A to trial K). The baseline characteristics of the patients in those groups are presented in Table 2. All groups had similar percentages of women (78%–87%) and whites (78%–85%), and the mean disease duration (12–16 years) and mean age (56–58 years) were also similar.
As the inclusion criteria become more stringent, the number of eligible patients decreases. All 168 patients of the AdRA cohort would be eligible according to the criteria of trial A, whereas only 22 AdRA patients would be eligible under the strict criteria of projection group K. Mean baseline DAS28 was 5.20 in projection group C (light inclusion criteria), 6.26 in group F (medium criteria), and 6.38 in group K (the most stringent criteria).
Table 3 shows the efficacy outcomes of each of the 11 projection groups: the changes in DAS28 and the time-averaged DAS28 from baseline until 28 weeks of treatment. The mean baseline DAS28 varied between 5.1 and 6.4, whereas the mean DAS28 at 28 weeks showed remarkably less variation, ranging from 3.5 to 4.0. The DAS28 AUC ranged from 103 to 127, and the time-averaged DAS28 from 3.67 to 4.53. When the AdRA cohort was divided into 2 subgroups based on baseline DAS28 (cutpoint 5.1), the mean DAS28 of patients at or below the cutpoint decreased from 4.1 to 3.0 at 28 weeks, while that of patients above the cutpoint decreased from 6.1 to 4.0. These results are similar to those in the projection groups with comparable initial DAS28 scores (Table 3).
For the statistical comparison of the outcomes in the projection groups we used the AdRA cohort and every third group thereafter (C, F, and I). All patients in projection group F were also included in group C (but not vice versa). Of the patients in projection groups F and I, 27 were eligible for F but not for I, 30 for both F and I, and 6 for I but not for F. This overlap necessitates the bootstrapping approach. Groups F and I had a significantly higher DAS28 AUC than group C and the total AdRA cohort. The difference in DAS28 AUC between groups F and I was not significant, nor was the difference between group C and the total AdRA cohort (Table 4). Hence, projection group C and the total cohort can be considered to contain the less severely ill patients, whereas groups F and I seem to focus on the more severely ill patients with RA.
To illustrate differences between the efficacy observed in the trials and that obtained in daily clinical practice, we compared the outcome of 1 trial with light and 1 trial with strict inclusion criteria with the outcome in their projection groups and in the entire AdRA cohort (Figures 1 and 2). The outcome measure used in the original trial served as the basis for comparison.
Figure 1 shows the comparison between the original trial, that is, Barrera, et al22, the projection group C (the AdRA projection group based on the Barrera inclusion criteria), and the entire AdRA cohort. The inclusion criteria of this trial were classified as light. There is a small difference between the 3 curves in Figure 1. The original trial reaches a lower DAS28 over time than the projection group and the AdRA cohort. The lines of projection group C and the AdRA cohort are nearly identical, a finding in accord with the findings presented in Tables 2 and 4.
Figure 2 shows the comparison between the original trial (van de Putte, et al29), projection group K, and the AdRA cohort. In contrast to the Barrera trial, this trial used strict inclusion criteria. The ACR20, ACR50, and ACR70 response rates over time were much higher in projection group K than in the original trial and in the total AdRA cohort.
DISCUSSION
Our simulation study shows that different levels of initial disease activity used as inclusion criteria lead to different levels of clinical response, i.e., improvement of DAS28 from baseline. A novel finding is that, when the effect of adalimumab is expressed as outcome, final disease activity appears mostly independent of initial disease activity. Thus the differences in DAS28 improvement can mostly be attributed to differences in DAS28 scores at the start of treatment.
Further, there may also be differences between the outcome of the original trial, the outcome of its projection group in the AdRA cohort, and the outcome of the whole AdRA cohort (Figures 1 and 2), which appear to depend on the strictness of the inclusion criteria. However, the statistical significance of these differences could not be assessed because we did not have individual patient data from the original trials. For the comparison we used the efficacy outcome reported in the original trial (DAS28 or ACR response). This comparison is valid because the treatment was the same in the original trial, its projection group, and the AdRA cohort.
A limitation of our study is that some projection groups, such as projection group K (n = 22), contain few patients. This may explain the large differences in Figure 2.
In most clinical trials, the ACR response is based on a 66/68-joint count. Because we based the ACR response on a 28-joint count, baseline tender and swollen joint counts are lower, so a change in a single joint produces a larger percentage change and the ACR response rises or falls more rapidly. This could be another reason why the results of projection group K in Figure 2 are much higher than those of the original trial.
Our study shows that different inclusion criteria for clinical trials can have a marked effect on the expected response, i.e., improvement from baseline. As a result, patients in daily clinical practice may show a greater or smaller clinical response than patients in the trials. However, the importance of this effect may diminish if our finding that final disease activity is much less dependent on initial disease activity is confirmed. Most patients reached a similar level of disease activity, regardless of their initial disease activity. In that case, clinicians could turn to (or calculate) the end result of clinical studies to aid treatment decisions in daily clinical practice.
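The joint-count argument can be made concrete with a little arithmetic. ACR20 requires at least 20% improvement in both tender and swollen joint counts (plus 3 of the 5 remaining core set measures); the sketch below looks only at the joint-count part. The baseline counts of 8 (28-joint count) and 12 (66/68-joint count) are hypothetical, chosen only to show that a smaller baseline count makes each joint weigh more.

```python
def percent_per_joint(baseline_count: int) -> float:
    """Percentage improvement contributed by one joint, relative to the baseline count."""
    return 100.0 / baseline_count


def joints_needed_for(baseline_count: int, percent: int = 20) -> int:
    """Smallest number of joints that must improve to reach `percent` improvement."""
    return (percent * baseline_count + 99) // 100   # integer ceiling, avoids float rounding


# Hypothetical baseline counts for the same patient under two joint-count systems.
for label, count in [("28-joint count", 8), ("66/68-joint count", 12)]:
    print(label, round(percent_per_joint(count), 1), joints_needed_for(count))
# 28-joint count 12.5 2
# 66/68-joint count 8.3 3
```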
This novel finding suggests that for daily practice one can assume that adalimumab treatment will, on average, result in a DAS28 between 3.5 and 4.0 after 28 weeks of treatment, regardless of baseline disease activity.
Footnotes
- Supported by Abbott Laboratories.
- Accepted for publication April 5, 2011.