Abstract
Objective. Although protocol registration for systematic reviews is still not mandatory, reviewers should be strongly encouraged to register the protocol, specifying the methodological approach and all outcomes of interest. This should minimize the likelihood of biased decisions in reviews, such as selective outcome reporting. A group of international experts convened to address the need to develop hierarchical lists of outcome measurement instruments, for a given outcome, to be used in metaanalyses.
Methods. Multiple outcome measurement instruments exist to measure the same outcome. Metaanalysis of knee osteoarthritis (OA) trials, and the assessment of pain as an outcome, was used as an exemplar to assess how Outcome Measures in Rheumatology (OMERACT), the Cochrane Collaboration, and other international initiatives might contribute in this area. The meeting began with formal presentations of background topics, empirical evidence from the literature, and a brief introduction to 2 existing hierarchical lists of pain outcome measurement instruments recommended for metaanalyses of knee OA trials.
Results. After discussions, most participants agreed that there is a need to develop a methodology for generation of hierarchical lists of outcome measurement instruments to guide metaanalyses. Tools that could be used to steer development of such a prioritized list are the COSMIN checklist (COnsensus-based Standards for the selection of health status Measurement Instruments) and the OMERACT Filter 2.0.
Conclusion. We list meta-epidemiological research agenda items that address the frequency with which outcomes are reported in trials, as well as methodologies to identify the instruments with the best measurement properties (i.e., truth, discrimination, and feasibility).
The Outcome Measures in Rheumatology (OMERACT) international consensus initiative has successfully developed core outcome sets for trials of many rheumatologic conditions, but its expanding scope called for clarification and updating of its underlying conceptual framework and working process1. The selection of appropriate outcomes, and subsequently instruments to measure these, is crucial when designing and interpreting clinical trials, to compare the effects of different interventions directly in ways that minimize bias.
Among patients with osteoarthritis (OA), the knee joint is of particular concern given its role in ambulation and thereby in the function of the individual. Loss of function is closely related to pain and may matter more to the patient than the actual radiographic signs of OA2,3. At the third OMERACT conference (1996), a core domain set was developed for future phase III trials of knee, hip, and hand OA, using a combination of discussion and polling procedures. There was consensus that the following 4 domains should be measured and reported in all clinical trials: pain, physical function, patient global assessment, and, for studies of 1 year or longer, joint imaging4. However, no consensus was reached on which measurement instruments to apply. Consequently, clinical trials (for example, in patients with knee problems) assess pain using many different instruments5.
Systematic reviews of randomized trials in knee OA are crucial for making evidence-based decisions regarding available therapies6,7,8. Preparing a protocol for a systematic review starts with a clear formulation of the research question in the PICO search strategy framework9, i.e., a clinically relevant or policy-relevant question that takes into account Population, Intervention, Comparison, and Outcomes, including both the benefits and harms of the intervention being studied10. The choice of which outcome instrument to include in metaanalyses for any given construct is generally based on clinical judgment. Because an outcome within a specific domain (e.g., pain) can be measured using a range of different instruments, it is often necessary to standardize results to a common scale using the standardized mean difference3 before outcomes across trials can be combined in a metaanalysis.
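As an illustration of the standardization step, the sketch below computes a standardized mean difference (SMD) for two-arm trials from reported means, standard deviations, and sample sizes. It assumes Hedges' g, a widely used SMD estimator with a small-sample correction; the function names and all trial numbers are hypothetical, and a real metaanalysis would also compute variances and pool the estimates.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled within-trial standard deviation of two independent arms."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def hedges_g(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> float:
    """SMD (treatment minus control) with Hedges' small-sample correction.

    When higher scores mean worse pain, a negative value favors treatment.
    """
    d = (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)  # small-sample bias correction
    return d * correction


# Hypothetical trials reporting pain on different scales.
# Trial A: 0-100 mm VAS; trial B: 0-20 WOMAC pain subscale score.
g_a = hedges_g(35, 20, 50, 45, 20, 50)
g_b = hedges_g(7, 4, 60, 9, 4, 60)
print(round(g_a, 3), round(g_b, 3))  # prints -0.496 -0.497
```

Both hypothetical trials show a pain reduction of half a standard deviation, so their SMDs nearly coincide even though the raw differences (10 mm vs 2 points) are on incompatible scales.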
To prevent bias and ensure transparency in reporting, leading journals now consider it mandatory that clinical trials be registered and that key protocol elements be made publicly available before trial completion11. Despite this, the outcomes reported in published trial articles are frequently incomplete and, where a protocol is available, inconsistent with it12, which can lead to bias13, perhaps especially for serious adverse events14. Systematic reviewers who use such data may further introduce bias into their metaanalyses, because any posthoc evaluation can shift the findings of a metaanalysis in either an optimistic or a pessimistic direction. The validity of a systematic review is therefore vulnerable if a trial has measured several outcomes but reports only those giving significant results, and if the review protocol has not specified the outcomes and their hierarchy in advance15. Although protocol registration for systematic reviews is still not mandatory16, except for Cochrane and Campbell systematic reviews10, reviewers should be strongly encouraged to prespecify and register all critical issues in their protocol17. A published, or registered and publicly available, protocol18 may thus help reduce the likelihood of biased posthoc decisions in review methods, such as selective outcome reporting19.
In addition to prespecification of (a hierarchy of) outcomes in clinical trials and systematic reviews, consensus is also needed on instruments that have been used to measure these outcomes. The OMERACT Filter 2.0 explicitly describes a comprehensive conceptual framework and a recommended process to select outcomes (domains) and develop core outcome measurement sets for rheumatology trials20. To build on this work, a group of international experts convened to address issues regarding the need to develop hierarchical lists of outcome measurement instruments for reporting these core outcomes in metaanalyses of clinical trials, and to assess how OMERACT might contribute.
MATERIALS AND METHODS
The OMERACT Executive group identified the need to develop a deliberate dialogue between groups interested in achieving consensus on outcomes and outcome measurement instruments and to provide optimal evidence for benefit and harms in systematic reviews of studies on chronic painful musculoskeletal conditions. This article describes results of a workshop evaluating the hierarchy of pain-related continuous outcome measurement instruments recommended for metaanalyses and systematic reviews of knee OA trials; knee OA was selected as a prototype that can be used to develop a template for hierarchies in other musculoskeletal conditions.
The program included topics underpinning the need for consensus on hierarchical lists of outcome measurement instruments for pain in metaanalyses of knee OA trials. The formal introductory presentations focused on specific issues, including relevant empirical evidence on the need for a hierarchical list of patient-reported pain outcome measurement instruments.
RESULTS
The workshop participants were clinical epidemiologists, clinicians, patients, biostatisticians, and the editor-in-chief of the Cochrane Collaboration. One of the first tasks of the workshop was to review the objectives to clarify any outstanding issues among the 9 participants. The issues identified for discussion included reporting bias and multiplicity, as well as existing hierarchical lists of pain-related outcome measurement instruments for metaanalyses.
Reporting Bias and Multiplicity in Metaanalyses
A synthesis of recent data on this topic was presented. The group agreed that publication bias and outcome reporting bias are potential threats to the validity of metaanalysis. In addition, metaanalyses of pain studies may include trials that measure pain using more than 1 instrument, introducing potential bias related to multiplicity21. Tendal, et al showed that the possible effect on metaanalysis results of multiple data points in trial reports (i.e., different measurement scales, timepoints, or intervention groups) varies greatly across metaanalyses21. Consequently, the availability of multiple data points in trial reports might lead to biased decisions about which data to include in metaanalyses. Readers should be aware that these biases may make the readily available evidence unreliable for decision making22.
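Why selecting among several instruments can bias a metaanalysis can be illustrated with a small simulation. This is a hypothetical sketch, not an analysis from Tendal, et al: it assumes each trial measures pain with 3 instruments that give independent, unbiased estimates of the same true SMD, and it compares extracting a prespecified instrument with extracting whichever result is most favorable to the intervention. All parameter values are assumptions.

```python
import random
import statistics


def simulate_selection(n_trials: int = 5000, n_instruments: int = 3,
                       true_smd: float = -0.3, se: float = 0.15, seed: int = 1):
    """Average SMD across simulated trials: prespecified vs. most favorable choice.

    Simplifying assumption: each instrument yields an independent, unbiased
    estimate of the same true SMD. Real instruments in one trial are
    correlated, which would shrink the bias shown here.
    """
    rng = random.Random(seed)
    prespecified, most_favorable = [], []
    for _ in range(n_trials):
        estimates = [rng.gauss(true_smd, se) for _ in range(n_instruments)]
        prespecified.append(estimates[0])      # instrument fixed in advance
        most_favorable.append(min(estimates))  # most negative = largest pain reduction
    return statistics.mean(prespecified), statistics.mean(most_favorable)


pre, fav = simulate_selection()
print(f"prespecified: {pre:.3f}  most favorable: {fav:.3f}")
```

Under these assumptions the prespecified extraction averages close to the true SMD of -0.3, whereas picking the most favorable result is systematically more negative, i.e., it exaggerates the benefit of the intervention.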
After the presentations, participants emphasized that it would be valuable if a systematic review protocol specified major outcomes in terms of core domains (e.g., pain and physical function). A protocol should probably also address potential reporting bias and multiplicity by prespecifying a hierarchy of the measurement instruments likely to be available. Explicit consideration of the timepoint(s) for measurement would also be preferable. One issue raised was whether it was “a given” that metaanalyses are always valuable. Most participants argued that clinical heterogeneity (e.g., in populations, interventions, and comparators) is important; metaanalysis is therefore not always feasible and should always be considered in the context of the contributing studies23. However, because a good systematic review should build on the premise that the selected trials fulfill a specific PICO criterion (i.e., ask the same research question)10, it should be possible to combine them, provided there is some degree of clinical homogeneity. Heterogeneity caused by different instruments measuring the same outcome, on the other hand, is not an acceptable reason for forgoing evidence synthesis.
Another comment was that any proposed hierarchy of measurement instruments needs to be supported by the instruments’ documented psychometric properties24, rather than developed simply by consensus based on data available in the literature. All participants supported this notion and indicated that the psychometric properties of the proposed individual instruments have to be taken into consideration before a hierarchy is developed. This led the group to the conceptual question of whether the “simple visual analog scale (VAS)” is valid25, and whether it needs or deserves some psychometric reconsideration.
Existing Hierarchical Lists of Pain-related Outcome Measurement Instruments for Metaanalyses
Jüni, et al26 presented a hierarchical list of pain-related continuous outcome measurement instruments recommended for OA trials. It was suggested that the outcome reported with the instrument ranked highest on this list be extracted, to minimize outcome reporting bias (Table 1)26. The list was generated based on input from a small group of OA experts, without any input from patients and without consideration of available empirical data. This proposed hierarchy is currently part of the preliminary default template suggested, but not yet endorsed, by the Cochrane Musculoskeletal Group (CMSG) editorial team for selecting the pain measurement instruments to include in summary of findings tables for OA interventions.
Jüni, et al’s hierarchical list26 has since been followed by another, proposed by Juhl, et al27, who concluded that choosing from individual trials the instrument with the most favorable results for the intervention could lead to biased estimates in metaanalyses. Juhl, et al therefore recommended a pragmatic hierarchy based on a prioritized list (Table 1). The creation of this hierarchy drew on some empirical evidence on the responsiveness of pain instruments, derived from a systematic search of the 20 highest-impact-factor general and rheumatology journals27.
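Both lists are meant to be applied in the same way: for each trial, the reviewer extracts the result from the highest-ranked instrument that the trial reports. A minimal sketch of that extraction rule follows; the function name, the instrument names, and their ordering are hypothetical placeholders, not the contents of Table 1.

```python
from typing import Mapping, Optional, Sequence, Tuple


def select_by_hierarchy(hierarchy: Sequence[str],
                        reported: Mapping[str, float]) -> Optional[Tuple[str, float]]:
    """Return the highest-ranked instrument a trial reports, with its effect estimate.

    The choice depends only on the order fixed in the protocol, never on
    which instrument shows the most favorable effect.
    """
    for instrument in hierarchy:
        if instrument in reported:
            return instrument, reported[instrument]
    return None  # none of the listed instruments was reported


# Hypothetical, illustrative ordering (NOT the content of Table 1).
HIERARCHY = ["instrument_A", "instrument_B", "instrument_C"]

trial = {"instrument_C": -0.6, "instrument_B": -0.4}
print(select_by_hierarchy(HIERARCHY, trial))  # ('instrument_B', -0.4)
```

In the example, the trial's most favorable result (instrument_C) is deliberately ignored because instrument_B ranks higher on the prespecified list.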
After the presentation, most participants agreed that a hierarchy would be desirable: making it “mandatory” for systematic reviewers to follow a prespecified list would avoid potential selection bias at the metaanalysis stage. There was agreement among the participants that neither of the existing papers provided compelling data to support its proposed hierarchy. A critical point raised in the discussions was that empirical evidence for the frequent use of a particular outcome instrument may not necessarily imply good psychometric properties24. Thus we were not able to achieve consensus on the “best approach” for measuring pain outcomes in metaanalyses of knee OA trials. The group was, however, supportive of using both lists (Table 1) as inspiration and a starting point for further research. Two points of view emerged from these discussions: (1) continued use of the Jüni, et al hierarchy26 was favored because of its present CMSG support10, whereas (2) the hierarchy recommended by Juhl, et al has the advantage of favoring the validated Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC)28 (and thus indirectly also the Knee injury and Osteoarthritis Outcome Score29,30) over VAS assessments with unclear psychometric properties25.
Voting Results (Plenary)
After the individual workshops, a plenary session was held and all meeting participants were asked to vote on 2 questions related to a hierarchical list of pain instruments applicable for metaanalyses:
Do we need a hierarchical list of instruments for a particular outcome within a particular domain for systematic reviews to prevent selective data extraction?
Is determining the methodology to develop a hierarchical list of instruments an important research question?
Of 39 responders, 25 (64%) agreed that there is a need for a hierarchical list of instruments to prevent selective data extraction in metaanalyses and systematic reviews. The voting also suggested that this topic needs further discussion, as the number of participants who voted “Don’t know” equaled the number voting “No” [7 (18%) in each group]. The attendees were less hesitant when asked whether the methodology to develop such hierarchical lists was an important research question: 90% (35/39) voted “Yes” and the remaining 10% voted “No.”
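The reported percentages follow directly from the vote counts; the snippet below recomputes them as a check. The 4 “No” votes on the second question are not stated explicitly in the text and are inferred here as 39 minus 35.

```python
def vote_percentages(votes: dict) -> dict:
    """Percentage of responders choosing each option, rounded to whole numbers."""
    total = sum(votes.values())
    return {option: round(100 * n / total) for option, n in votes.items()}


q1 = {"Yes": 25, "No": 7, "Don't know": 7}
q2 = {"Yes": 35, "No": 4}  # 4 "No" votes inferred as 39 - 35

print(vote_percentages(q1))  # {'Yes': 64, 'No': 18, "Don't know": 18}
print(vote_percentages(q2))  # {'Yes': 90, 'No': 10}
```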
By the end of the OMERACT premeeting, there was consensus among the participants on the need to develop a hierarchical list of instruments for outcomes within the pain domain for metaanalyses and systematic reviews. Because choosing the most favorable result among the available outcome measurement instruments in individual trials can overestimate or underestimate the effect compared with a systematic approach21,22,31, developing a protocol for the metaanalysis or systematic review and using a prioritized list will probably help reviewers prespecify endpoints.
Other Aspects That Need to Be Addressed
The need for prespecified decisions to avoid bias, as with a hierarchical list of outcome instruments, apparently also applies to other critical features of a systematic review, such as: (1) “How should the optimal trial duration be determined?”; (2) “Is there a minimum sample size requirement?”; and (3) “Is there a minimum baseline pain intensity, frequency, etc.?”
Judging from the intensive discussions, the need for hierarchies of instruments within available core outcome measurement sets is likely to be relevant for any outcome in any health condition for which systematic reviewers might perform evidence synthesis, not just pain21. This underlines the potential value of the existing OMERACT core domain set as guidance on possible endpoints4. It was also clear from the discussions and the voting results that determining the methodology to develop hierarchical lists of instruments is an important research question.
Research Agenda
Following these valuable discussions, we suggest the following for working groups who want to propose hierarchical lists for outcome measurement instruments in metaanalyses:
Use meta-epidemiology32, i.e., evaluate the frequency of use of different instruments in the published trial literature33;
Conduct systematic reviews of studies on the measurement properties of the different instruments24; and
Consider the suggestion that, among the instruments frequently reported in trials (as identified by meta-epidemiological studies), those with the best measurement properties, as assessed by frameworks such as the “OMERACT Filter 2.0”20 or the COSMIN (COnsensus-based Standards for the selection of health status Measurement Instruments) checklist34, should be ranked high on a hierarchical list for the selection of outcome measurement instruments in metaanalyses.
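One way the last suggestion could be operationalized is sketched below: keep only instruments reported in at least a minimum number of trials, then rank them by the quality of their measurement-property evidence. This is a hypothetical illustration; the frequency threshold, the single numeric quality score (collapsing ratings from frameworks such as COSMIN into one number is itself an assumption), and all input values are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instrument:
    name: str
    trials_reporting: int  # how often trials report it (from meta-epidemiology)
    quality_score: float   # hypothetical 0-1 summary of measurement-property evidence


def rank_instruments(candidates: List[Instrument], min_trials: int) -> List[str]:
    """Keep frequently reported instruments, then rank by measurement quality.

    Ties in quality are broken by reporting frequency (more trials first).
    """
    frequent = [c for c in candidates if c.trials_reporting >= min_trials]
    ranked = sorted(frequent, key=lambda c: (-c.quality_score, -c.trials_reporting))
    return [c.name for c in ranked]


# Hypothetical inputs for illustration only.
pool = [
    Instrument("A", 40, 0.6),
    Instrument("B", 25, 0.9),
    Instrument("C", 3, 1.0),  # excellent properties but rarely reported
    Instrument("D", 30, 0.6),
]
print(rank_instruments(pool, min_trials=10))  # ['B', 'A', 'D']
```

Instrument C illustrates the trade-off: its measurement properties are the best in this hypothetical pool, but it is excluded because too few trials report it to contribute much to evidence synthesis.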
Footnotes
RC is supported by unrestricted grants from The Oak Foundation; RB is supported in part by an Australian National Health and Medical Research Council Practitioner Fellowship; PRW is supported by grants for the COMET Initiative from the Medical Research Council and the European Commission; JAS is supported by grants from the US Agency for Healthcare Research and Quality Center for Education and Research on Therapeutics U19 HS021110, US National Institute of Arthritis and Musculoskeletal and Skin Diseases P50 AR060772 and U34 AR062891, US National Institute on Aging U01 AG018947, US National Cancer Institute U10 CA149950, the resources and the use of facilities at the VA Medical Center at Birmingham, Alabama, USA, and research contract CE-1304-6631 from the Patient-Centered Outcomes Research Institute.