Original Article
A first step to assess harm and benefit in clinical trials in one scale

https://doi.org/10.1016/j.jclinepi.2009.07.002

Abstract

Objective

To develop a simple system to assess benefit and harm of treatment on a single scale. Harm and benefit signals from trials need to be placed in the proper perspective to decide on the value of a treatment. Several systems have been developed for assessment, but few attempt to incorporate both benefit and risk in the same metric while retaining enough simplicity to aid patients and clinicians in their decision making.

Study Design and Setting

We designed a very simple 3 × 3 table (Outcome Measures in Rheumatology [OMERACT] 3 × 3) that comprises three ranks for both beneficial and harm outcomes: for benefit, these are “none,” “substantial,” and “(near) remission”; for harm, these are “none,” “severe,” and “(near) death.” Patients are ranked both for benefit and harm and subsequently counted in a 3 × 3 table.

Results

The system was feasible when applied to one trial dataset (patient-level information) and a meta-analysis. To become applicable as a tool, several issues need to be resolved in further development, especially the definitions and cutoffs for the ranks.

Conclusion

A simple 3 × 3 table to rank both benefit and harm outcomes is feasible. For rheumatology this will be further developed in the context of the OMERACT initiative.

Introduction

Data on harm (here used interchangeably with “risk”) and benefit of a medical (often drug) treatment, whether originating from trials or observational studies, need to be placed in the proper perspective to decide on the utility of a treatment. In many settings, standardized measures are available to assess benefit, but in comparison, the assessment of harm is still fairly primitive. Moreover, benefit and harm have not yet been usefully combined into one scale.

In 1998, the Council for International Organizations of Medical Sciences (CIOMS) stated, “It is a frustrating aspect of benefit-risk evaluation that there is no defined and tested algorithm or summary metric that combines benefit and risk data and that might permit straightforward quantitative comparisons of different treatment options, which in turn might aid in decision making” [1]. This statement was repeated by a committee of the Committee for Medicinal Products for Human Use of the European Medicines Agency in 2007 [2] and expanded by Mussen et al. [3]. One of the problems is that benefit and risk assessment is mostly driven by the need to make decisions, whereas most research is “truth driven.” More specifically, benefit–risk assessment involves placing value judgments on scientific facts. These values will vary depending on the perspective of the assessor. In addition, comparing benefit and risk resembles comparing apples and pears, and it involves trading off (and discounting) long-term against short-term effects. Finally, in real life multiple benefits and risks need to be assessed simultaneously.

A brief literature review of what is usually called “benefit–risk assessment” (PubMed and Google Scholar; terms: benefit, risk, harm, drug therapy, assessment, and review in various combinations) revealed a relatively sparse body of literature, some of it less accessible (book chapters, etc.). Broadly speaking, two types of methodology have been proposed: simple qualitative ranking systems and more complicated methods. Honig distinguished a third category, that of models for specific classes of drugs [4]. Within the context of a special meeting held at the ninth Outcome Measures in Rheumatology Conference (OMERACT, see http://www.omeract.org) [5], we have identified the need for simple tools that incorporate both benefit and risk into one scale, to aid patients and clinicians in their decision making. The following methods were considered simple enough for patients to understand, for clinicians to explain, and for decision makers without technical expertise to follow:

  1. Number needed to treat vs. number needed to harm [6], [7]

  2. Unqualified success and unmitigated failure [8]

  3. Principle of Three and Transparent Uniform Risk/Benefit Overview (TURBO) [1]

  4. Grading of Recommendations Assessment, Development and Evaluation (GRADE) [9]

Within a single trial, the number needed to treat/number needed to harm metrics are frequently used and also applied in meta-analyses to give the best synthesized estimates [10]. In such reviews, separate numbers are shown for the main outcomes of benefit and harm. Mancini and Schulzer developed an extension to incorporate both numbers in one metric [8]: these authors introduced the concept of unqualified success (benefit success without any harmful event) and unmitigated failure (a harmful event without any benefit). Calculation of these entities is straightforward if the occurrence of benefit and harm is deemed independent and slightly more complicated in case of a known dependency. However, both the number needed to treat/harm and its extension are limited to one benefit and one risk (unless these are pooled), expressed as binary endpoints.
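The arithmetic behind these metrics can be sketched in a few lines of Python. This is a minimal illustration, not the published method of [8]: it assumes the usual definition of number needed to treat or harm as the reciprocal of the absolute risk difference, and it assumes that benefit and harm occur independently. The function names and all event rates are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def number_needed(rate_treated: float, rate_control: float) -> float:
    """Reciprocal of the absolute risk difference between two arms.

    Used with benefit rates this gives a number needed to treat;
    used with harm rates it gives a number needed to harm.
    """
    difference = abs(rate_treated - rate_control)
    if difference == 0:
        return float("inf")  # no difference between arms
    return 1.0 / difference


def unqualified_success(p_benefit: float, p_harm: float) -> float:
    """Probability of benefit without any harmful event.

    Assumes benefit and harm are independent (an assumption of this sketch).
    """
    return p_benefit * (1.0 - p_harm)


def unmitigated_failure(p_benefit: float, p_harm: float) -> float:
    """Probability of a harmful event without any benefit.

    Assumes benefit and harm are independent (an assumption of this sketch).
    """
    return p_harm * (1.0 - p_benefit)


# Hypothetical event rates, invented for illustration only.
nnt = number_needed(rate_treated=0.5, rate_control=0.25)       # benefit rates
nnh = number_needed(rate_treated=0.125, rate_control=0.0625)   # harm rates
success = unqualified_success(p_benefit=0.5, p_harm=0.125)
failure = unmitigated_failure(p_benefit=0.5, p_harm=0.125)
print(nnt, nnh, success, failure)
```

If benefit and harm are known to be dependent, the two products above no longer apply and the joint probabilities would have to be taken from patient-level data instead.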

The “Principle of Three” and “TURBO” models were described in the 1998 CIOMS report [1]. These can be used to summarize all the available evidence (trials and observational studies) on a qualitative scale. Although the “Principle of Three” originally pertained to selecting the three most serious and the three most frequent side effects for analysis, it also applies to the method as a whole, comprising three separate 3 × 3 tables. The first describes the disease, the second the benefit(s), and the third the harm(s) of the treatment. In each, the dimensions “seriousness,” “duration,” and “incidence” are scored on a four-point scale: 0 = absent or no effect; 1 = low; 2 = medium; and 3 = high. Treatment “profiles” as expressed in the tables can now be compared.

The “TURBO” model is an expansion of the Principle of Three. In the first step, the most important harms receive a semiquantitative score on a scale of 1 to 7 based on frequency and severity. Likewise, the most important benefits receive a score based on likelihood and degree of effect. Finally, both scores are plotted on a two-dimensional diagram with seven points on each axis.

The “Grading of Recommendations Assessment, Development and Evaluation” (GRADE) system takes a different approach by emphasizing quality of evidence [9]. Explicit judgments are requested for each important setting. Trade-offs between benefit and harm are categorized as:

  • Net benefits: the intervention clearly does more good than harm.

  • Trade-offs: there are important trade-offs between benefit and harm.

  • Uncertain trade-offs: it is not clear whether the intervention does more good than harm.

  • No net benefits: the intervention clearly does not do more good than harm.

GRADE thus provides nomenclature for a decision based on implicit or explicit weighting of the evidence for the trade-offs. The implementation of GRADE is still evolving.

In our search, we also identified more complex methods, judged to be substantive but too complex for easy explanation to patients and nontechnical stakeholders. These included minimal clinical efficacy [11], various models based on quality of life measures, for example, Q-TWIST [12], multi-criteria decision analysis [13], and incremental risk–benefit ratio and risk–benefit acceptability curve (Bayesian probabilistic modeling) [14]. These methods fall outside the scope of this study.

From this review, and inspired by the patient decision aids developed in Ottawa [15], [16], we propose a very simple ranking system (OMERACT 3 × 3) that builds on the Principle of Three [1] to be piloted within the context of a trial.
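The counting step of such a ranking system can be sketched as a simple tally. The example below assumes each patient has already been assigned one benefit rank and one harm rank, using the rank labels given in the abstract. How a patient is assigned to a rank (definitions and cutoffs) is not specified here and is left open. The function name and the patient data are hypothetical.

```python
from collections import Counter

# Rank labels as given in the abstract; the cutoffs behind them are not defined here.
BENEFIT_RANKS = ("none", "substantial", "(near) remission")
HARM_RANKS = ("none", "severe", "(near) death")


def tabulate_3x3(patients):
    """Count patients per (benefit, harm) cell.

    `patients` is an iterable of (benefit_rank, harm_rank) pairs.
    Returns a list of rows: one row per benefit rank, one column per harm rank.
    """
    counts = Counter()
    for benefit, harm in patients:
        if benefit not in BENEFIT_RANKS or harm not in HARM_RANKS:
            raise ValueError(f"unknown rank pair: {(benefit, harm)!r}")
        counts[(benefit, harm)] += 1
    return [[counts[(b, h)] for h in HARM_RANKS] for b in BENEFIT_RANKS]


# Hypothetical patients, invented for illustration only.
example_patients = [
    ("none", "none"),
    ("none", "severe"),
    ("substantial", "none"),
    ("substantial", "none"),
    ("(near) remission", "none"),
    ("(near) remission", "(near) death"),
]

table = tabulate_3x3(example_patients)
for benefit_rank, row in zip(BENEFIT_RANKS, table):
    print(f"{benefit_rank:>18}: {row}")
```

Keeping benefit as rows and harm as columns is an arbitrary layout choice of this sketch; the cell counts are the same either way.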

Section snippets

OMERACT 3 × 3

We propose to initially devise a very simple system that can categorize the outcome of a patient according to three categories of benefit and three categories of harm, creating a 3 × 3 table (Table 1). The first two categories are “none/minimal” and “major” for benefit and harm. The third category is “(near) remission” for benefit and “(near) death” for harm. If necessary, more detail can be assessed later or on a separate “page.” Note that we use the term “harm” because adverse events occur

Discussion

Several issues need further discussion.

Conclusion

A clear need exists for validated and applicable tools to assess benefit and risk simultaneously. Patients are increasingly demanding that clinicians provide them with the evidence for harms and benefits; one important avenue for this is to develop the methodology to assess benefit and risk on a single scale. More active dialogue and engagement are needed between the key methodologic disciplines (clinical epidemiologists, clinical pharmacologists, pharmacoepidemiologists, and psychometricians),

References (22)

  • A. Laupacis et al.

    An assessment of clinically useful measures of the consequences of treatment

    N Engl J Med

    (1988)