Threats to the Validity of Clinical Trials Employing Enrichment Strategies for Sample Selection

https://doi.org/10.1016/S0197-2456(97)00118-9

Abstract

Subject selection and exclusion criteria employed in typical clinical effectiveness trials of investigational new drugs have two fundamental aims: (1) to ensure that patients entering a study are truly suffering from the condition the drug is intended to treat and (2) to maximize the likelihood that the study will detect an effect of the drug if, in fact, one exists. Typical protocol selection criteria not only specify exacting procedures for establishing and documenting the diagnosis of those recruited for a study but also seek to increase, relative to the prevalence in the general population, the proportion of individuals in the sample likely to respond to pharmacological treatment. Because it is ordinarily impossible to learn prior to extensive clinical experience with a new drug which, if any, patient characteristics reliably predict a consistent treatment response, strategies for sample “enrichment” typically operate by excluding patients (for example, those with very advanced and/or complicated illness, those with serious concomitant illness, those at the extremes of age, those with very mild illness, and so forth) in whom a dependable response to treatment seems unlikely on logical and/or generic grounds.

Some studies use positive strategies for sample “enrichment.” In studies evaluating drugs intended to treat recurrent episodes of psychiatric illnesses, many protocols recommend selective recruitment of patients with a history of meaningful positive responses to antipsychotic treatment during prior episodes. Sample selection procedures of these kinds impose limits on the generalizability of a study's results (i.e., external validity), but the use of nonrandom patient samples is ordinarily held to have no effect on the internal validity of the results. In short, studies employing highly selected patient samples are, despite their limited external validity, regularly accepted as valid sources of evidence bearing on a drug's effectiveness.

There are exceptions, however; this paper describes one in which the use of a seemingly innocuous sample enrichment maneuver proved highly damaging to the ultimate credibility of an important multicenter trial. In particular, exposure to an experimental treatment during an open qualification phase may invalidate drug-placebo comparisons made during a later randomized, blinded, controlled phase. Our review of the trial also reveals that the enrichment maneuver employed probably failed to accomplish its intended aims, selecting patients whose improvements on the outcome variable may be as reasonably ascribed to chance as to drug effect. This is all the more surprising because the method of sample enrichment employed has much in common with those long recommended in the clinical trial literature.
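The claim that an open qualification phase can select patients whose improvement is as attributable to chance as to drug can be illustrated with a small simulation. The sketch below is not the authors' analysis and uses no data from the trial discussed here. The score scale, the noise level, the qualification threshold, and the function name `simulate_open_phase_selection` are all hypothetical, and the true drug effect is deliberately set to zero. Each simulated patient has a stable underlying score that is measured three times with independent error: at baseline, at the end of the open phase, and at a later retest.

```python
import random
import statistics


def simulate_open_phase_selection(n_patients=20000, noise_sd=4.0,
                                  threshold=4.0, true_drug_effect=0.0,
                                  seed=0):
    """Select open-phase 'responders' and retest them (all values hypothetical)."""
    rng = random.Random(seed)
    open_gain = []    # baseline-to-open-phase change among selected patients
    retest_gain = []  # baseline-to-retest change among the same patients
    for _ in range(n_patients):
        # Stable underlying score; higher means better on this made-up scale.
        stable = rng.gauss(50.0, 10.0)
        # Three measurements with independent measurement error.
        baseline = stable + rng.gauss(0.0, noise_sd)
        on_drug = stable + true_drug_effect + rng.gauss(0.0, noise_sd)
        retest = stable + true_drug_effect + rng.gauss(0.0, noise_sd)
        # Enrichment rule: keep only patients whose observed gain meets the threshold.
        if on_drug - baseline >= threshold:
            open_gain.append(on_drug - baseline)
            retest_gain.append(retest - baseline)
    return {
        "fraction_selected": len(open_gain) / n_patients,
        "mean_open_gain": statistics.mean(open_gain),
        "mean_retest_gain": statistics.mean(retest_gain),
    }


result = simulate_open_phase_selection()
print(result)
```

Under these assumptions a sizable fraction of patients qualifies as "responders" even though the simulated drug does nothing, and their average gain shrinks on retest. That pattern is what selection on a noisy change score alone would be expected to produce, which is why an open-phase response cannot by itself be read as evidence of drug effect.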

Introduction

A new drug product may not be marketed legally in the United States until it is the subject of an approved New Drug Application (NDA). To gain approval, a sponsor must, among other requirements of the Federal Food, Drug, and Cosmetic Act (the Act), submit “substantial evidence” of the product's effectiveness to the Food and Drug Administration (FDA). Such evidence should allow an expert clinician to conclude that the drug will have the effects the sponsor claims it will have. To obtain the required evidence of effectiveness, sponsors typically conduct trials employing parallel group, randomized, placebo, and/or dose-comparison concurrent control designs. If such a clinical study detects a statistically significant difference that favors the investigational drug over the control condition on a valid measure of clinical outcome, the study is ordinarily accepted as a source contributing to the substantial-evidence requirement of the Act.

The Act, importantly, makes no demands in regard to the minimum size of a treatment effect or the minimum proportion of the patient population that must respond to a drug for it to be declared effective. In addition, no requirements are imposed regarding the “representativeness” of a study sample vis-à-vis the population of patients from which it is drawn. The typical commercial drug effectiveness trial, therefore, is not unlike what Schwartz and Lellouch [1] call an “explanatory” study. Its primary aim is to provide proof that a drug has a therapeutic effect in at least some patients. Such trials are contrasted with those of “pragmatic” design that seek to obtain valid estimates of the treatment's expected effect under conditions of actual use in the population.

Subject selection criteria employed in clinical trials of commercial drugs, therefore, are designed primarily to recruit cohorts containing patients who are cooperative, compliant, and likely to respond to the investigational drug if it is, in fact, efficacious. Thus, definitive effectiveness trials typically exclude patients with extremely mild or advanced disease, those at the extremes of age, those in poor health, and those with bad habits (such as smokers and those who use illicit drugs or consume excessive amounts of alcohol). These exclusions are acceptable because, despite the limitations they impose on the generalizability of a study's results, they ordinarily pose no threat to a study's internal validity. In sum, if a randomized controlled clinical trial conducted in a sample of patients reliably known to suffer from a disease detects a statistically significant between-treatment difference favorable to an investigational drug on a valid measure of clinical outcome, the study will ordinarily be accepted as a source of evidence documenting that the investigational drug has a beneficial effect in at least some patients in the population, provided that fraud and/or systematic bias can be reasonably excluded as alternative explanations for the difference [2].

Unfortunately, some methods of sample enrichment may undermine the internal validity of a study. This paper reviews some of the issues and illustrates them with findings from a large multicenter consortium trial [3] of the cholinesterase inhibitor tacrine as a treatment for dementia.

Section snippets

A brief history of the enrichment/rerandomization design and its applications

Although the execution of any set of sample selection criteria technically can be considered to constitute a sample enrichment maneuver, the label “enrichment design” typically is reserved for those clinical trial designs that select subjects for participation in a randomized comparison phase of a study on the basis of their prior response, often during a preliminary open titration phase of the same study, to one or more of the investigational treatments being evaluated. The first use of the

The tacrine consortium study

The Summers et al. [13] report on tacrine's effects on dementia in 17 patients appeared in November 1986. Within days, the lay media heralded tacrine as being capable of doing for dementia patients what levodopa had done for individuals with Parkinson's disease: it was not a cure, but it was a breakthrough just short of a miracle. Consequently, the public placed enormous pressure on the FDA to release tacrine for early widespread use. In this milieu of heightened expectations, a consortium of

Concerns that the treatment blind may have been broken

If an investigational drug has a set of characteristic untoward effects, there is often concern that the treatment blind may have been broken. In the typical randomized placebo-controlled trial, the risk of blind-breaking is lessened because each subject is exposed to only one of the treatments being evaluated. Designs that expose subjects to all the treatments being compared, in contrast, would appear to increase the risk that a subject will correctly guess his/her actual treatment assignment

Conclusions

Although their use has been advocated in the archival literature [e.g., 5, 7, 18], so-called enrichment or rerandomization designs in which subjects are selected for entry into randomized, blinded, controlled phases of experiments on the basis of their prior response to the investigational drug in an open phase have limitations. In particular, exposure to an experimental treatment during an open qualification phase may invalidate drug-placebo comparisons made during a later randomized, blinded,

