Abstract
Objective. Pain is a patient-important outcome, but current reporting in randomized controlled trials and systematic reviews is often suboptimal, impeding clinical interpretation and decision making.
Methods. A working group at the 2014 Outcome Measures in Rheumatology (OMERACT 12) was convened to provide guidance for reporting treatment effects regarding pain for individual studies and systematic reviews.
Results. For individual trials, authors should report, in addition to mean change, the proportion of patients achieving 1 or more thresholds of improvement from baseline pain (e.g., ≥ 20%, ≥ 30%, ≥ 50%), achievement of a desirable pain state (e.g., no worse than mild pain), and/or a combination of change and state. Effects on pain should be accompanied by other patient-important outcomes to facilitate interpretation. When pooling data for metaanalysis, authors should consider converting all continuous measures for pain to a 100 mm visual analog scale (VAS) for pain and use the established minimally important difference (MID) of 10 mm, and the conventionally used appreciably important differences of 20 mm, 30 mm, and 50 mm, to facilitate interpretation. Effects ≤ 0.5 MID units (≤ 5 mm) suggest a small or very small effect. To further increase interpretability, the pooled estimate on the VAS should also be transformed to a binary outcome and expressed as a relative risk and risk difference. This transformation can be achieved by calculating the probability of experiencing a treatment effect greater than the MID and the thresholds for appreciably important differences in pain reduction in the control and intervention groups.
Conclusion. Presentation of relative effects regarding pain will facilitate interpretation of treatment effects.
Randomized controlled trials (RCT) and systematic reviews can provide important direction for clinical decision making, but their usefulness may be compromised by failing to report results that provide interpretable estimates of the magnitude of effect. Pain is a common outcome reported among clinical trials. There are, however, many ways to measure this domain — including use of a number of instruments that are unfamiliar to many clinicians and patients. Pain is typically reported as a continuous measure, which further complicates interpretation of treatment effect results. Here, we offer suggestions regarding how best to report treatment effects on pain in both individual studies and systematic reviews (Table 1). Our suggestions are informed by a workshop convened at the 2014 Outcome Measures in Rheumatology (OMERACT) 12 conference.
Table 1. Summary of recommendations.
What Effect Measures Do Clinicians Find Most Useful?
Clinicians generally find dichotomous presentation of continuous outcomes more useful1. A number of studies have documented clinicians’ reactions to presentations of binary outcomes as a relative risk, absolute risk, and number needed to treat (NNT). Physicians presented with the relative change in outcome rate are likely to perceive a therapy as more effective than if the same data are presented as the absolute change (risk difference) or NNT2,3,4. The NNT has been advanced by some as the most helpful measure of association5,6; however, some patient and physician surveys have found that lay people7 and medical doctors8 have difficulty grasping the concept of NNT. Some evidence suggests that presenting binary outcomes as natural frequencies (e.g., a reduction of adverse events presented as 3 in 100 rather than 3% or the associated NNT, 33) may be the best way to achieve understanding in a variety of audiences9,10,11. However, other studies suggest that when event rates are sufficiently high (> 1% chance of occurring), the percent change may be more easily grasped than natural units12. Our subsequent recommendations are informed by these results.
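To make the relationship between these formats concrete, the following minimal Python sketch derives each presentation from a single pair of event rates. The function name `effect_measures` and the example rates (10% with control, 7% with treatment) are illustrative assumptions, chosen only so that the absolute reduction matches the 3-in-100 example above.

```python
def effect_measures(control_risk, treated_risk):
    """Return common presentations of one binary treatment effect.

    Both risks are proportions between 0 and 1 (hypothetical inputs).
    """
    risk_difference = control_risk - treated_risk  # absolute risk reduction
    relative_risk = treated_risk / control_risk
    return {
        "relative_risk": relative_risk,
        "relative_risk_reduction": 1 - relative_risk,
        "risk_difference": risk_difference,
        # NNT is the reciprocal of the absolute risk reduction
        "nnt": 1 / risk_difference if risk_difference != 0 else float("inf"),
        # natural frequency: fewer events per 100 patients treated
        "fewer_events_per_100": risk_difference * 100,
    }


result = effect_measures(control_risk=0.10, treated_risk=0.07)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these assumed rates, the same data can be read as a 30% relative reduction, 3 fewer events per 100 patients, or an NNT of about 33, which illustrates why the relative format tends to look more impressive.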
Clinical Trials
Pain should be reported directly by patients
Pain is a common complaint among patients seeking care. It is a patient-important outcome when it is reported directly by patients, without interpretation by physicians or other proxies. A review of RCT that explored the effect of opioids for chronic non-cancer pain (n = 161) found that while almost all trials (98.8%) reported pain as an outcome measure, 1 trial reported pain data only as observed by clinicians, 6 reported pain data from both patients and clinicians, and the source of pain data was not clear in 2613. Although there are rare exceptions involving patients with limited ability to communicate, pain measures should be acquired directly from patients, and trialists should make this explicit when reporting pain data.
Capturing a global assessment of pain is preferable to multiple pain items
Pain may have many different features (e.g., burning, stabbing, aching) and may be associated with both a specific condition under study (e.g., osteoarthritis of the hip) and comorbidity. Trialists are faced with a choice of whether to try to collect all facets of pain among enrolled patients, or to capture a global assessment of pain (e.g., is your overall pain a lot worse, a little worse, the same, a little better, a lot better?). To the extent that patients will be most interested in how interventions will reduce the overall effect of pain, and that exploration of pain characteristics may place a burden on patients that provides little insight into the effect of treatment — both of which are likely to be the case — global assessments will be preferable in most circumstances.
Trialists should facilitate interpretability of pain outcome data
There are outcome domains that patients with painful complaints are interested in, beyond pain relief, particularly for chronic pain. The Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) has recommended that trialists exploring strategies for managing pain-related complaints consider 8 outcome domains, in addition to pain relief14,15,16,17: (1) physical functioning; (2) emotional functioning; (3) participant rating of improvement and satisfaction with treatment; (4) adverse events; (5) participant disposition (for example, adherence to the treatment regime and reasons for premature withdrawal from the trial); (6) role functioning; (7) interpersonal functioning; and (8) sleep and fatigue. From a practical perspective, trialists will have to balance the competing demands of exhaustive outcome data collection with study feasibility.
The effect on pain should be accompanied by presentation of treatment effects on other patient-important outcomes, such as adverse events, function, and sleep, because similar effects cannot be assumed. For example, among trials of opioids versus placebo for chronic non-cancer pain reviewed by Furlan, et al, the effect size was twice as large for pain relief [standardized mean difference (SMD) = −0.60; 95% confidence interval (CI) = −0.69 to −0.50] as for improvement in function (SMD = −0.31; 95% CI = −0.41 to −0.22)18.
Pain is typically recorded as a continuous outcome measure, and trialists can present the effect of a given intervention on pain in multiple ways. Some may simply indicate whether the effect on pain was statistically significant. It is commonly assumed that a p value ≤ 0.05 is indicative of an important finding; however, the p value does not take into account the size of the observed effect. The clinical implications of a particular study depend on the magnitude of effect and the associated measure of precision (typically 95% CI) and these estimates can have large or small p values, depending on the sample size and number of events.
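The dependence of the p value on sample size, for a fixed magnitude of effect, can be illustrated with a short sketch. It assumes a large-sample normal approximation, equal group sizes, and a common SD; the function name and the values used (a 10 mm difference with an SD of 25 mm) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_summary(diff, sd, n_per_group):
    """95% CI and two-sided p value for a difference in means.

    Assumes a normal approximation and the same SD in both groups.
    """
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(0.975) * se
    return diff - half_width, diff + half_width, p_value


# Same 10 mm effect, three hypothetical sample sizes
for n in (10, 50, 200):
    low, high, p = mean_difference_summary(diff=10, sd=25, n_per_group=n)
    print(f"n = {n:>3} per group: 10 mm (95% CI {low:.1f} to {high:.1f}), p = {p:.4f}")
```

The point estimate is identical in every row; only the precision, and therefore the p value, changes with the number of patients.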
Many trials report the effect on pain as a mean change with an associated measure of precision, such as an improvement of 10 mm on a 100-mm visual analog scale (VAS) for pain. Such an effect may be statistically significant, but is it important to patients? Even when the mean value represents an effect that is important to patients, many clinicians will extrapolate this effect to all patients19; however, treatment response will differ among patients: some will experience benefit greater than the mean difference; some less. Rather than focusing exclusively on the mean difference, examining the difference in the proportion of patients who report an important reduction in their pain, or who have achieved a threshold of acceptable pain, provides complementary information. The difference in these proportions yields a risk difference that one can convert to an NNT, the number of patients who need to receive treatment to achieve an important benefit in 1 patient.
The minimal important difference
One way to dichotomize continuous data is to use the smallest change in an instrument score that patients perceive is important — the minimal important difference (MID). The term minimum clinically important difference (MCID) is also used; however, this terminology focuses on clinicians’ perceptions versus those of patients20. Establishing the MID requires comparison with an independent standard or anchor that is itself interpretable, and to which the instrument measuring pain is at least moderately correlated. An anchor should focus on measures of improvement informed by a patient’s own experiences (e.g., an appreciable improvement in symptoms, return to function, or global response to treatment).
Although it is tempting to conclude that mean differences less than the MID are not worthwhile, and mean differences exceeding the MID suggest that most or all patients will benefit from treatment, this conclusion is misguided21. Consider an example where the MID is 0.50 and patients’ mean improvement versus control is 0.25. This could mean that 75% had no improvement and 25% experienced a mean change of 1.0, which would result in an NNT of 4, a clearly important benefit.
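The arithmetic behind this example can be checked with a few lines of Python. The cohort of 100 patients is hypothetical, and the sketch assumes that no control patient reaches the MID, so that the proportion benefiting in the treated group equals the risk difference.

```python
from statistics import mean

MID = 0.50  # minimal important difference from the example above

# Hypothetical cohort of 100 treated patients: improvement relative to control
improvements = [0.0] * 75 + [1.0] * 25

mean_improvement = mean(improvements)
proportion_benefiting = sum(x >= MID for x in improvements) / len(improvements)

# Assumption of this sketch: no control patient reaches the MID, so the
# risk difference equals the proportion benefiting in the treated group.
nnt = 1 / proportion_benefiting

print(f"mean improvement: {mean_improvement}")
print(f"proportion with improvement >= MID: {proportion_benefiting}")
print(f"NNT: {nnt}")
```

A mean improvement of half the MID is therefore compatible with 1 in 4 patients gaining twice the MID.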
What to do if the MID is not known
An anchor-based MID has not been established for many continuous outcome measures used to assess pain; however, investigators can still provide estimates of the proportion benefiting and the corresponding NNT. One option is to assume that one-half the baseline SD of the instrument score represents the MID22. However, although this represents a moderate effect size, there is evidence that anchor-based and distribution-based MID may differ23,24; further, an anchor-based MID directly captures the patients’ values and preferences25,26. A more satisfactory approach is to convert pain measures to a single instrument for which an anchor-based MID has been established (see below).
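As a minimal sketch of the distribution-based option, the half-SD rule reduces to a single calculation on baseline scores. The function name and the baseline values below are hypothetical, and the result is only a stand-in for an anchor-based MID, for the reasons given above.

```python
from statistics import stdev


def distribution_based_mid(baseline_scores):
    """Half the baseline sample SD, used as a stand-in for the MID."""
    return 0.5 * stdev(baseline_scores)


# Hypothetical baseline pain scores on a 100 mm VAS
baseline_vas_mm = [62, 48, 71, 55, 80, 43, 66, 59, 74, 52]
print(f"Distribution-based MID: {distribution_based_mid(baseline_vas_mm):.1f} mm")
```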
Choosing a threshold that is important to patients
The MID may seem like an obvious choice to establish a threshold for meaningful change in pain when measured as a continuous outcome; however, both clinicians and patients may be interested in the ability of a given intervention to provide more than a minimally important difference27 — to produce improvement that allows patients to feel appreciably, not just minimally, better28. Minimal improvements in pain may not be associated with discernable improvements in function, and some evidence suggests that for patients with chronic non-cancer pain, treatment effects on function are only half as large as treatment effects for pain18. For this reason, a number of authoritative groups, including many Cochrane groups who are focused on pain, have suggested that trialists and review authors consider not the MID for pain (≥ 10% reduction from baseline29,30), but ≥ 20%, ≥ 30%, or ≥ 50% reduction from baseline as improvement that is likely to be appreciably important to patients31.
In the absence of consensus on what constitutes a patient-important threshold in pain relief, it is reasonable to provide a range of options. To provide guidance in this regard, participants of the 2014 OMERACT Workshop advocated for reporting either an appreciable reduction from baseline pain (e.g., 20%, 30%, or 50%), achievement of a desirable pain state (e.g., no worse than mild pain32, a patient acceptable state28), or a combination of change and state: “My pain has improved and I feel good”. Choosing different thresholds for treatment effect may influence the level of statistical significance, and trialists should therefore choose and justify their threshold in advance of their analyses. However, at least in some circumstances33, and perhaps in most34, the choice of threshold does not affect the magnitude of relative effect.
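A responder analysis across several thresholds can be sketched as follows. The function name, the patient data, and in particular the 30 mm cutoff for "no worse than mild pain" are assumptions made for illustration; trialists would specify and justify their own thresholds in advance.

```python
def responder_proportions(baseline, followup, thresholds=(20, 30, 50),
                          mild_pain_cutoff=30):
    """Proportion of patients meeting each change threshold and a pain state.

    baseline, followup: paired pain scores on a 0-100 mm VAS.
    thresholds: percent reductions from baseline to report.
    mild_pain_cutoff: hypothetical cutoff for "no worse than mild pain".
    Patients with a baseline score of 0 are excluded, because a percent
    reduction from baseline is undefined for them.
    """
    pairs = [(b, f) for b, f in zip(baseline, followup) if b > 0]
    if not pairs:
        raise ValueError("no patients with baseline pain above zero")
    n = len(pairs)
    results = {}
    for pct in thresholds:
        # 100 * (b - f) >= pct * b is (b - f) / b >= pct / 100 without division
        met = sum(100 * (b - f) >= pct * b for b, f in pairs)
        results[f"reduction >= {pct}%"] = met / n
    results["no worse than mild pain"] = (
        sum(f <= mild_pain_cutoff for _, f in pairs) / n
    )
    return results


# Hypothetical data for 5 patients in one trial arm
baseline = [60, 70, 50, 80, 40]
followup = [45, 30, 48, 40, 35]
for label, proportion in responder_proportions(baseline, followup).items():
    print(f"{label}: {proportion:.0%}")
```

Running the same function on each trial arm gives the proportions needed to compute a risk difference or relative risk at every prespecified threshold.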
Duration of followup
The duration of followup for measuring pain relief is another source of variability among trials. It stands to reason that studies of acute pain should consider shorter time frames and trials of chronic pain should implement longer followup assessments. This should be done in part to inform tolerability of therapy in the longer term, and because outcomes such as quality of life and functional gains require sufficient time to manifest among treatment responders. However, RCT of opioids for chronic non-cancer pain have not measured outcomes for longer than 16 weeks13, and many chronic pain trials have reported effect estimates at timepoints that most patients would consider unimportant. For example, we are aware of 5 RCT that explored management of chronic post-stroke pain that captured outcome data at ≤ 1 hour after treatment35,36,37,38,39. As a general rule, we would suggest that trials enrolling chronic pain patients should capture outcomes at least up to 6 months, and ideally up to 1 year, after treatment. Further, systematic reviews of chronic pain RCT should exclude trials that have followed patients for less than 2 weeks after treatment.
Systematic Reviews
Trials using the same outcome measure
When a common outcome measure for pain is reported among trials, reviewers can preserve the natural units of measure when pooling across trials by calculating the weighted mean difference. Unless the instrument is very familiar, the effect may not be easy to interpret without copresentation of meaningful thresholds such as the MID and appreciable important differences, and even with such context readers may mistakenly attribute the mean effect on pain to all patients19.
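One common way to compute a weighted mean difference is inverse-variance weighting. The sketch below uses a fixed-effect model, which is an assumption of this example rather than a recommendation of the working group (a random-effects model may be more appropriate when trials are heterogeneous); the trial estimates are hypothetical.

```python
from math import sqrt


def pool_mean_differences(estimates):
    """Fixed-effect inverse-variance pooling of (mean_difference, se) pairs.

    Returns the pooled estimate and its 95% confidence limits.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * md for w, (md, _) in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1 / total_weight)
    return pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se


# Hypothetical trials: (mean difference in mm on a 100 mm VAS, standard error)
trials = [(-12.0, 4.0), (-8.0, 3.0), (-15.0, 6.0)]
pooled, low, high = pool_mean_differences(trials)
print(f"Pooled mean difference: {pooled:.1f} mm (95% CI {low:.1f} to {high:.1f})")
```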
Trials using different outcome measures
This is the more common situation that systematic review authors will face40. A recent review of RCT that explored management of fibromyalgia (FM) found that eligible studies that identified pain (n = 241) reported 75 different measures of this outcome (Appendix 1)41.
There are 5 strategies available to pool different measures that address a common outcome domain:
Convert to SMD: As a first approach, different measures of pain can be pooled by converting to SMD, which is the approach recommended by the Cochrane Collaboration42; however, this measure of effect is difficult to interpret1,43 and is affected by differences in baseline heterogeneity among study populations. Greater heterogeneity among pain scores at baseline will result in a smaller SMD versus studies that enroll patients that provide more homogeneous scores, even when the true underlying effect on pain is the same (Figure 1).
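The sensitivity of the SMD to baseline variability can be shown numerically. In this sketch the same 10 mm difference in mean change is standardized against a homogeneous and a heterogeneous population; the function computes Cohen's d with a pooled SD and no small-sample correction, and all values are hypothetical.

```python
from math import sqrt


def standardized_mean_difference(mean_treated, mean_control,
                                 sd_treated, sd_control,
                                 n_treated, n_control):
    """Cohen's d using the pooled SD (no small-sample correction)."""
    pooled_variance = (
        (n_treated - 1) * sd_treated ** 2 + (n_control - 1) * sd_control ** 2
    ) / (n_treated + n_control - 2)
    return (mean_treated - mean_control) / sqrt(pooled_variance)


# Identical 10 mm difference in mean change, different population variability
homogeneous = standardized_mean_difference(-20, -10, 15, 15, 50, 50)
heterogeneous = standardized_mean_difference(-20, -10, 30, 30, 50, 50)
print(f"SMD with SD of 15 mm: {homogeneous:.2f}")
print(f"SMD with SD of 30 mm: {heterogeneous:.2f}")
```

Doubling the SD halves the SMD even though the effect in millimetres is unchanged.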
Convert to a single instrument: A second approach is to convert different instruments that measure pain into a single, most familiar instrument and the associated estimate of precision44. For example, our review of FM found that, among the 75 different instruments used for reporting pain, the 10 cm/100 mm VAS was the most commonly reported41. There are 2 statistical approaches to convert multiple instruments to a common measure: (A) Multiply SD units × SD of the most familiar instrument. Limitations of this approach include challenges in deciding which SD to use, and the process remains vulnerable to differences in variability of patients’ pain scores across studies (Figure 1); or (B) Rescale to units of the most familiar instrument. Both these approaches remain vulnerable to misinterpretation because the mean effect may be mistakenly applied to all patients. For pain, the most familiar instrument is the widely used 10 cm/100 mm VAS. For this instrument the MID has been established as about 1.0 cm/10 mm29, regardless of pain severity30. Although providing readers with the MID facilitates interpretation, review authors should caution readers against dismissing effects less than 1 MID unit, and should provide guidance for interpreting magnitude of effect. If the MID is 1.0 cm/10 mm and the mean difference between treatments is 0.9 cm, clinicians may infer that nobody benefits from the intervention; if the mean difference is 1.1 cm, they may conclude that everyone benefits. Both conclusions are problematic because they ignore the distribution of benefit among individuals. We suggest the following guide for interpretation given a 1.0 cm MID: if the pooled estimate is ≥ 2.0 cm, and one accepts that the estimate of effect is accurate, this suggests a large effect. If the pooled estimate is between 1.0 and 1.9 cm, many patients may gain important benefits from treatment. If the estimate of effect lies between 0.5 and 1.0 cm, the treatment may benefit an appreciable number of patients. Effects ≤ 0.5 cm suggest a small or very small effect.
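The two conversions and the interpretation guide can be expressed as a small sketch. All function names are hypothetical, the rescaling applies to mean differences (not raw scores), and the handling of values falling exactly on a boundary (e.g., exactly 1 MID) is an assumption of this example, because the guide does not specify it.

```python
MID_MM = 10  # MID on the 100 mm VAS, as cited in the text


def sd_units_to_vas_mm(smd, vas_sd_mm):
    """Approach A: multiply an effect in SD units by a chosen VAS SD (mm)."""
    return smd * vas_sd_mm


def rescale_difference_to_vas_mm(difference, scale_min, scale_max):
    """Approach B: linearly rescale a mean difference to the 0-100 mm VAS."""
    return difference * 100 / (scale_max - scale_min)


def interpret(effect_mm, mid_mm=MID_MM):
    """Map a pooled effect on the VAS to the suggested interpretation."""
    mid_units = abs(effect_mm) / mid_mm
    if mid_units >= 2.0:
        return "large effect"
    if mid_units >= 1.0:  # boundary handling is an assumption of this sketch
        return "many patients may gain important benefits"
    if mid_units > 0.5:
        return "may benefit an appreciable number of patients"
    return "small or very small effect"


# Hypothetical example: a difference of -1.5 points on a 0-10 numeric scale
effect_mm = rescale_difference_to_vas_mm(-1.5, scale_min=0, scale_max=10)
print(f"{effect_mm:.0f} mm on the VAS: {interpret(effect_mm)}")
```

Any such label should be presented alongside the estimate and its CI rather than in place of them.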
Calculate a ratio of means: A third approach is to calculate a ratio of means, which has the advantage of facilitating pooling of continuous outcomes expressed in different units without relying on SD units45. This effect estimate is also reasonably straightforward in its interpretation. A ratio of means of 0.75 conveys a 25% relative reduction in mean pain among those treated compared with those in the control group. This effect estimate requires a natural 0, which means this method cannot be used when the control group changes for the worse and the treatment group for the better.
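A ratio of means and its confidence interval can be sketched as below. The standard error on the log scale uses the usual delta-method approximation, which is an assumption of this example and is not spelled out in the text; the function name and the input values are hypothetical.

```python
from math import exp, log, sqrt


def ratio_of_means(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Ratio of means with a 95% CI computed on the log scale.

    Both means must be positive (the natural-zero requirement), so the
    function refuses inputs where either mean is zero or negative.
    """
    if mean_t <= 0 or mean_c <= 0:
        raise ValueError("ratio of means needs positive means in both groups")
    log_rom = log(mean_t / mean_c)
    # Delta-method approximation to the variance of the log ratio
    se = sqrt(sd_t ** 2 / (n_t * mean_t ** 2) + sd_c ** 2 / (n_c * mean_c ** 2))
    return exp(log_rom), exp(log_rom - 1.96 * se), exp(log_rom + 1.96 * se)


# Hypothetical follow-up pain scores (mm): treated 30 (SD 20), control 40 (SD 22)
rom, low, high = ratio_of_means(30, 20, 60, 40, 22, 60)
print(f"Ratio of means: {rom:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"Relative reduction in mean pain: {1 - rom:.0%}")
```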
Present results in MID units: A fourth approach is, rather than present results in SD units, to present them in MID units. This allows a more direct inference than presenting in natural units and informing readers about the MID. As above, an effect of < 0.5 MID units suggests small or very small effect.
Apply statistical methods to estimate odds: A fifth approach is to use statistical methods to provide an estimate of the odds, or probability, of achieving a desirable outcome in the intervention versus the control group. There are 2 fundamental statistical approaches to making this calculation: (A) Convert the SMD into a proportion that confers benefit. Limitations of this approach include the underlying vulnerability of the SMD to population heterogeneity, challenges with interpreting to what the proportion refers (e.g., large or moderate reduction in pain vs minor or no reduction in pain; any reduction in pain vs no reduction in pain), and the requirement for an approximate normal distribution of data and equal variance in intervention and control groups46. A final limitation is that the methods demand specification of the success (or failure) rate in the control proportion, and this may not be clear. This is a serious limitation only if the control success or failure rate is likely to be extreme, because the effect estimates differ appreciably only at extremes (< 0.2 or > 0.8; Table 2); or (B) Create a binary outcome, and thus an OR or risk difference (an approach we advocate), avoiding the challenges associated with reliance on SD units. This method uses mean differences and the associated variances in each study to estimate the proportion of patients who achieved an improvement of the MID or greater in that study44. To provide insight regarding the proportion of patients who achieve appreciable versus minimally important pain relief, review authors should also present pooled relative and absolute effect estimates using thresholds of 2 cm/20 mm, 3 cm/30 mm, and 5 cm/50 mm31.
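The within-study step of approach (B) can be sketched under the assumption that improvements in each group are approximately normally distributed. The function names and input values are hypothetical, and only a single study is shown; pooling the resulting estimates across studies is not covered here.

```python
from statistics import NormalDist


def proportion_improved(mean_improvement, sd, threshold):
    """P(improvement >= threshold) if improvements are normally distributed."""
    return 1 - NormalDist(mu=mean_improvement, sigma=sd).cdf(threshold)


def binary_effect(mean_t, sd_t, mean_c, sd_c, threshold):
    """Relative and absolute effects for reaching one improvement threshold."""
    p_t = proportion_improved(mean_t, sd_t, threshold)
    p_c = proportion_improved(mean_c, sd_c, threshold)
    rd = p_t - p_c
    return {
        "p_treated": p_t,
        "p_control": p_c,
        "relative_risk": p_t / p_c,
        "risk_difference": rd,
        "nnt": 1 / rd if rd != 0 else float("inf"),
    }


# Hypothetical study: mean improvement 20 mm (treated) vs 10 mm (control), SD 25 mm
for threshold_mm in (10, 20, 30, 50):  # MID and appreciable thresholds
    r = binary_effect(20, 25, 10, 25, threshold_mm)
    print(f">= {threshold_mm} mm: RR {r['relative_risk']:.2f}, "
          f"RD {r['risk_difference']:.2f}, NNT {r['nnt']:.1f}")
```

Repeating the calculation at each threshold shows how the absolute effect changes as the definition of response becomes more demanding.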
Figure 1. Effect of patient heterogeneity on the standardized mean difference. MID: minimally important difference.
Table 2. The relation between effect size or standardized mean difference and the number needed to treat (NNT) under normality and equal variance assumptions.
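The relation between the SMD and the NNT can be reproduced under the same normality and equal-variance assumptions. If a proportion p of control patients achieve success and the treatment shifts the distribution by the SMD, the treated success rate is Φ(SMD + Φ⁻¹(p)). The function name and the choice of an SMD of 0.5 are illustrative; a positive SMD is taken here to mean benefit.

```python
from statistics import NormalDist


def smd_to_nnt(smd, control_success_rate):
    """NNT implied by an SMD for a given control success rate.

    Assumes normal distributions with equal variance in both groups and
    a positive SMD favouring treatment.
    """
    std_normal = NormalDist()
    treated_success_rate = std_normal.cdf(
        smd + std_normal.inv_cdf(control_success_rate)
    )
    risk_difference = treated_success_rate - control_success_rate
    if risk_difference == 0:
        return float("inf")
    return 1 / risk_difference


# How the NNT for an SMD of 0.5 varies with the assumed control success rate
for rate in (0.1, 0.2, 0.5, 0.8, 0.9):
    print(f"control success rate {rate:.1f}: NNT {smd_to_nnt(0.5, rate):.1f}")
```

The NNT changes most when the assumed control success rate is extreme, consistent with the limitation described for approach (A).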
Choosing a strategy to present treatment effect on pain
Systematic review authors can opt to present the effect of therapy on pain in multiple ways, or select a single measure of effect. Consider a metaanalysis of prophylactic dexamethasone for laparoscopic cholecystectomy that explored the effect on postoperative pain47. The effect was informed by 5 RCT that enrolled 539 participants, and certainty in effect estimates was considered “low” according to the Grading of Recommendations Assessment, Development and Evaluation (GRADE) criteria because of inconsistency of results across studies and imprecision associated with pooled estimates of effect. Table 3 presents the effect of dexamethasone on postoperative pain using the alternative strategies discussed above, which yield a wide range of effect sizes, from large (SMD of 0.8) to small (0.4 MID units). One reason for this is the likely enrollment of homogeneous patients, resulting in an artificially large SMD (Figure 1).
Table 3. Alternate strategies for presenting the effect on pain.
This example suggests that the presentation of effect estimates for pain reduction using multiple formats has the potential to confuse readers. Accordingly, we believe that if there is strong evidence to inform the anchor-based MID (and appreciable and/or substantial thresholds for improvement with a given pain measurement instrument), systematic review authors should restrict their presentation of effect estimates to approaches relying on these thresholds. Ideally, review authors will convert all continuous measures for pain to a 10 cm/100 mm VAS for pain and inform readers of the established MID of 1 cm/10 mm, and/or the conventionally used appreciably important differences of 2 cm/20 mm, 3 cm/30 mm, and 5 cm/50 mm (see Approach 2 above). Authors should also use these thresholds to convert the continuous variable to a binary outcome and present the pooled relative and absolute effects (Approach 5 above).
Reporting pain in a GRADE summary of findings table
The GRADE system is an explicit approach to evaluate the certainty of treatment effect estimates48. Part of the GRADE process involves presenting the results of systematic reviews in a summary of findings (SoF) table — a succinct presentation of evidence quality and magnitude of effects49. GRADE has been adopted by over 70 organizations worldwide, including the World Health Organization, the Cochrane Collaboration, and the American College of Physicians, and now provides detailed guidance on application of GRADE criteria for preparing SoF tables for continuous outcomes. Table 3 demonstrates the presentation of results in SoF tables.
A rule-of-thumb for generating an SoF table is to restrict the number of outcomes presented to a maximum of 7 per table. Attendees of the 2014 OMERACT conference voted (30 to 8) that, for conditions in which pain is the defining feature, 2 SoF rows should be considered for pain-related outcomes. IMMPACT has recommended that 9 outcome measures, including pain, should be reported when assessing treatment effects for clinical conditions defined by pain (Table 4)14,15,16,17. This suggests that systematic review authors using the GRADE approach will have to use their judgment to provide no more than 7 outcomes in an SoF table that they believe are of greatest importance to patients.
Research Agenda
Many of the approaches available to convert pain to a binary outcome rely on the continuous data being normally distributed. Future research should explore the distribution of pain outcomes among different clinical conditions to confirm or refute this assumption. We have proposed a number of thresholds to dichotomize continuous pain data, which reflects the considerable debate in this area, and future research should explore the validity of these thresholds to promote further standardization and consensus. This process should involve input from patients and field clinicians. Other areas for exploration include standardizing the timing of pain data collection (e.g., pain at present, pain in the last 24 h, pain in the last week), and further establishing the relationship between pain reduction and improvement of other patient-important outcomes, such as function and sleep.
APPENDIX 1.
Outcome measures for reporting pain among trials of therapy for fibromyalgia (n = 241)
Footnotes
JAS is supported by grants from the US Agency for Health Quality and Research Center for Education and Research on Therapeutics U19 HS021110, National Institute of Arthritis, Musculoskeletal and Skin Diseases (NIAMS) P50 AR060772 and U34 AR062891, National Institute of Aging U01 AG018947, National Cancer Institute U10 CA149950, the resources and the use of facilities at the VA Medical Center at Birmingham, Alabama, USA, and research contract CE-1304-6631 from the Patient-Centered Outcomes Research Institute. MSA is the recipient of a K24 award from NIAMS (K24AR053593).