Abstract
Objective. Developing international consensus on outcome measures for clinical trials is challenging. The following paper will review consensus building in Outcome Measures in Rheumatology (OMERACT), with a focus on the Delphi.
Methods. Based on the literature and feedback from delegates at OMERACT 2018, a set of recommendations is provided in the form of the OMERACT Delphi Consensus Checklist.
Results. The OMERACT delegates generally supported the use of the checklist as a guide. The checklist provides guidance for clearly outlining the multiple aspects of the Delphi process.
Conclusion. OMERACT is deeply committed to consensus building and these recommendations should be considered a work in progress.
Outcome Measures in Rheumatology (OMERACT) is an international collaboration devoted to developing consensus on outcome measures for trials involving rheumatic diseases1. Consensus building is a crucial component of the process but may be challenging. As a result, the OMERACT 2018 meeting put “consensus” front and center to bring attention to our current practices and provide an opportunity to reflect and improve on them. This was done through 2 plenaries, formal voting, informal discussions, and training sessions for nominal group technique facilitators. The following paper will review consensus building in OMERACT and reflect on feedback received during the formal sessions, as well as informal discussions throughout the meeting. We also outline plans for future activities to improve the process. To limit the scope of this paper, we will focus primarily on the Delphi, a formal consensus method used in many OMERACT initiatives.
Consensus is at the core of OMERACT
Building consensus on core outcome sets (COS) for clinical trials in rheumatic diseases has numerous benefits, such as less biased reporting, a more comprehensive assessment of efficacy, and better opportunities for comparison and metaanalysis. The key principles of consensus building in OMERACT can be summarized as follows: consensus is not simply “majority wins”; consensus must be evidence-based; all relevant groups must be represented; a face-to-face interaction must occur at some point in the process; and there is a formal iterative process to work toward consensus2.
A majority vote does not guarantee consensus. It simply reflects that a majority is in favor, but there could be groups who consider the outcome unacceptable. For OMERACT, consensus indicates that although the result of the process may not be everyone’s preferred choice, the aim is to reach an agreement that all participants can accept as a “working arrangement.” It is also noteworthy that OMERACT does not seek to force consensus because this ultimately leads to poor acceptance in the long term. With OMERACT, the evidence must be considered sufficient to support decisions, and when not sufficient, questions are added to the research agenda2.
In OMERACT, the process has evolved over time to include a wide variety of participants from multiple continents with strong representation from patients, carers, clinicians, researchers, regulators, payers, and industry. OMERACT firmly believes that these groups provide a wider range of knowledge and experience and that the interaction between participants stimulates consideration of a broader range of options. Although a recognized strength of the process, this can also present a challenge. For example, during the OMERACT 2016 meeting, some patients perceived themselves to be underrepresented in numbers (10%) in the final voting on core outcome domains. If patients play a major role in the phase of generating and prioritizing core outcome domains, they should be adequately represented in the final stages of decision making. Finally, the OMERACT meetings occur biennially and are an integral part of the process, bringing participants face to face for several days so that healthy discussion and debate can occur.
A formal consensus method: The Delphi
Among different strategies used to work toward consensus, the Delphi is frequently chosen3. The Delphi is one part of the entire consensus process and is defined as a systematic means to measure and facilitate consensus4. It is used when empiric evidence is limited or contradictory and is based on the premise that accurate and reliable decisions can best be achieved by consulting a panel of experts and accepting group consensus5. At OMERACT, the Delphi method is used to prioritize critically important domains from an initial list of candidate outcomes that should be included in a COS6,7. The OMERACT Rheumatoid Arthritis Flare Group, as an example, described in detail how the Delphi process was used to gain consensus among several hundred international patients, clinicians, and others on a COS for measuring rheumatoid arthritis flares8. The Delphi can also be used to obtain consensus on a list of candidate instruments that should subsequently be studied for their psychometric properties.
The Delphi method involves sending out surveys over several rounds. Participants, who are anonymous, rate potential items, or in the OMERACT context, candidate domains for a COS. In the first round they may also generate new items/domains. Then, in the next round, participants receive feedback comparing their own scores to the distribution of scores from other groups. Each participant is provided with an opportunity to re-rate domains. Although not part of the formal Delphi process, the final “round” may involve ranking items to ensure that a reasonable number of core outcome domains is obtained.
The Delphi has several advantages, including the ability to reach many geographically dispersed participants, and it provides anonymity, thus reducing the potential for dominant individuals to sway the group. Disadvantages include the inability to discuss areas of disagreement directly with other participants and the labor required to collate scores and distribute feedback between rounds. Delphi software may facilitate the collation of scores, and OMERACT currently mandates a face-to-face meeting to ensure that discussion can occur.
Despite extensive use of the Delphi in many contexts, several concerns have been raised in the literature. Studies using the Delphi for selecting performance indicators for healthcare, for medical and nursing education, or for determining outcomes to measure in clinical trials often fail to report sufficient methodological detail. Examples include poor reporting of background information provided to participants, response rates for all rounds, level of anonymity, formal feedback between rounds, and the definition of consensus6,9,10,11,12,13.
MATERIALS AND METHODS
To improve the use and reporting of the Delphi within OMERACT, a preliminary OMERACT Delphi checklist (Figure 1) was developed based on previous recommendations and expert input5,6,9,10,11,12,14. The experts were the authors of this article (n = 10), who comprised a patient, rheumatologists, and researchers with extensive experience in consensus methods. The checklist was presented to the delegates together with specific recommendations for consideration. Feedback was solicited using voting keypads during the 2 plenary sessions, and through discussions throughout the meeting. The research team reviewed both the formal and informal feedback and adapted the recommendations after extensive discussion.
Figure 1. OMERACT Delphi consensus checklist. *The word “Item” refers to a domain or an instrument (outcome measure).
RESULTS
The total number of delegates attending OMERACT 2018 was 170 and included 106 clinicians/researchers (62%), 17 patients (10%), 11 pharmaceutical representatives (7%), 33 fellows (19%), and 3 regulation authorities (2%). Feedback on specific aspects of the Delphi process was sought. Regarding the number of items sent in Delphi surveys, based on previous experience, the consensus team (all authors) recommended including a maximum of 50 items (potential domains) in round 1. The delegates selected 70 as a more realistic number. There is no literature to support either recommendation. In the study by Boulkedid, et al9 that reported on 80 studies using the Delphi, the initial number of items ranged from 11 to 767, with a median of 59.
When considering what type of feedback should be provided to participants between rounds, a small majority (24/41, 58%) agreed that feedback between rounds should include individuals’ scores for each item and the distribution of votes by participant group. Some, however, preferred to view aggregated feedback (11/41, 27%). The few studies that have formally assessed this have provided mixed results15,16,17.
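The feedback format favored by the small majority (each participant's own score plus the distribution of votes by participant group) could be assembled along the following lines. This is a minimal sketch, not an OMERACT tool: the function name `round_feedback`, the data layout, and the example ratings are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def round_feedback(ratings, groups, participant, item):
    """Build between-round feedback for one participant and one item.

    ratings: {participant_id: {item: score on the 1-9 scale}}  (hypothetical layout)
    groups:  {participant_id: group name, e.g. "patient"}
    """
    by_group = {}
    for pid, scores in ratings.items():
        if item in scores:
            # Count how many members of each group gave each score.
            by_group.setdefault(groups[pid], Counter())[scores[item]] += 1
    return {
        "item": item,
        "own_score": ratings[participant].get(item),
        # Distribution of votes per group: {group: {score: count}}
        "distribution_by_group": {
            g: dict(sorted(c.items())) for g, c in by_group.items()
        },
    }


# Hypothetical round 1 ratings for a single candidate domain.
ratings = {
    "p1": {"pain": 9},
    "p2": {"pain": 7},
    "c1": {"pain": 5},
    "c2": {"pain": 7},
}
groups = {"p1": "patient", "p2": "patient", "c1": "clinician", "c2": "clinician"}

fb = round_feedback(ratings, groups, "p1", "pain")
print(fb)
```

Aggregated feedback, the option preferred by some delegates, would correspond to pooling the counts across groups instead of reporting them separately.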
To provide a feasible minimum number of participant groups and to facilitate the incorporation of patient-relevant outcomes, the consensus team suggested a minimum of 2, including patients and clinicians. In fact, most delegates suggested that more than 3 be selected. After discussions it was felt that apart from patients and clinicians, trialists/researchers should always be involved. Involvement of others would depend on the context of use of the COS. Clearly, OMERACT participants value the involvement of many groups and consider that the selection should depend on the context7.
There was no consensus on how nonrespondents should be handled. It could be argued they should be excluded from voting in future rounds because they may not be well informed. However, to ensure sufficient numbers of participants for the Delphi, informal discussions led to the conclusion that nonrespondents should be allowed to participate in future rounds at the discretion of the researcher.
Regarding the number of participants in each group, for logistical reasons the consensus team suggested a minimum of 50 participants for each of the 3 predominant groups (patients, clinicians, researchers) at the end of the final Delphi round. When delegates were surveyed, there was a wide distribution of opinions, demonstrating that participants preferred “as many as humanly possible.” Informal discussions revealed delegates were concerned that for some groups, engaging 50 participants may not be a realistic goal, especially for rare diseases and may reduce anonymity. Delegates were surveyed regarding the maximum number of rounds in the Delphi. Votes were divided; 3 rounds were selected by 25/58 (43%), 4 rounds by 16/58 (28%), 5 rounds by 15/58 (26%), and 2/58 (3%) selected 2 or fewer. The recommendation put forth suggests a minimum of 3 rounds. Greater attrition with an increasing number of rounds is a concern, but a recent publication demonstrated impressive retention (92%) after 5 rounds as the result of a strategy of tailored reminders by e-mail and telephone18.
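As a check on the arithmetic, the vote on the maximum number of rounds can be tallied as follows. The counts are those reported above; the helper name `vote_shares` is hypothetical, and rounding to the nearest whole percent is an assumption about how the percentages were derived.

```python
def vote_shares(counts):
    """Convert raw vote counts into whole-number percentages of all votes cast."""
    total = sum(counts.values())
    return {option: round(100 * n / total) for option, n in counts.items()}


# Votes on the maximum number of Delphi rounds (58 delegates voted).
votes = {"2 or fewer": 2, "3": 25, "4": 16, "5": 15}

shares = vote_shares(votes)
print(shares)  # {'2 or fewer': 3, '3': 43, '4': 28, '5': 26}
```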
Through the Delphi surveys presented at OMERACT 2018, it became apparent that working groups used different ways to prioritize items. Participants were asked to (1) select how important the item was on a rating scale of 1–9; (2) select the top 10 from a long list; or (3) indicate, for instance in the final round, where the item should be [i.e., inner circle (mandatory), middle circle (important but not critical), outer circle (research agenda), or removed (not important)]. This supported the need for further discussion surrounding the definition of consensus. Two-thirds of the delegates agreed that consensus in a COS Delphi should be defined as ≥ 70% of participants in each group voting for the domain as critically important (rating 7–9 on a scale from 1–9). In this case, the domain will be included in the draft core domain set. The advantage is that the voice of various groups (e.g., patients) with fewer numbers can be adequately represented. One third of voting delegates were in favor of a combined definition of consensus, meaning ≥ 70% of all participants should vote the domain as very important, independent of the single groups’ opinion. There is no “correct” definition of consensus, but determining the definition a priori in a manner acceptable to the key groups (i.e., OMERACT) is essential to prevent data mining13.
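The difference between the two definitions of consensus can be made concrete with a short sketch. The function names and the example scores below are hypothetical; the 70% threshold and the 7–9 "critically important" range are those described above, and each group is assumed to have at least one respondent.

```python
def share_critical(scores, low=7, high=9):
    """Fraction of scores falling in the 'critically important' range (7-9)."""
    return sum(low <= s <= high for s in scores) / len(scores)


def per_group_consensus(scores_by_group, threshold=0.70):
    """Consensus only if every group separately reaches the threshold."""
    return all(share_critical(s) >= threshold for s in scores_by_group.values())


def combined_consensus(scores_by_group, threshold=0.70):
    """Consensus if all participants pooled together reach the threshold."""
    pooled = [s for scores in scores_by_group.values() for s in scores]
    return share_critical(pooled) >= threshold


# Hypothetical ratings for one domain: a small patient group and a large clinician group.
scores_by_group = {
    "patients": [8] * 5 + [4] * 5,     # 5/10 = 50% rate the domain 7-9
    "clinicians": [8] * 36 + [4] * 4,  # 36/40 = 90% rate the domain 7-9
}

print(per_group_consensus(scores_by_group))  # False: patients are below 70%
print(combined_consensus(scores_by_group))   # True: 41/50 = 82% overall
```

In this made-up example the pooled definition admits the domain even though only half of the patients rated it as critical, which illustrates why the per-group definition better protects the voice of groups with fewer participants.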
DISCUSSION
The workshop has increased awareness surrounding the Delphi method and delegates agreed that more standardization is desirable. However, experienced delegates shared concerns that proposing standards that are too prescriptive may be problematic. We therefore suggest that the OMERACT Delphi consensus checklist be used as guidance to working group members. (More detailed recommendations regarding the use of the Delphi Consensus Checklist can be found in the supplementary material, available with the online version of this article.) Because of the lack of literature and empiric evidence regarding the methods themselves, more stringent guidelines are not justified13,15,19. During the meeting, a Delphi software package was identified as a potential tool that may improve the use and reporting of the Delphi. Table 1 lists considerations when selecting a software package6,18.
Table 1. Considerations when selecting a software package to administer the Delphi.
Our conclusions:
Awareness regarding the Delphi has been increased;
There is agreement that more standardization is desirable;
OMERACT provides guidance, not absolute standards, because there is insufficient evidence to support decision making;
Standardization may be improved by using software that provides structure and prompts decision making at each stage; and
More research regarding the method itself is needed.
Suggestions moving forward
The updated recommendations accompanying the OMERACT Delphi Consensus Checklist (available with the online version of this article) will be available for OMERACT 2020. Most delegates (74/108, 69%) at the OMERACT 2018 meeting agreed to use this checklist, although some (27/108, 25%) were unsure and some (7/108, 6%) refused. A suggestion to improve uniformity is to use a software program that provides structure and helps with reporting all relevant outcomes (e.g., DelphiManager, http://comet-initiative.org/delphimanager/). A majority [69/110 (63%)] of the delegates were willing to use it, 38/110 (34%) were unsure, and 3/110 (3%) refused. To further inform Delphi best practices, we will conduct an internal review of all Delphi surveys done in OMERACT since 2012 and compare them to the Delphi surveys done after the guiding document was made available. Finally, more research is required on the appropriate use of the Delphi method itself.
Limitations
Not all delegates voted at each session, but the response rates represent about 65% of delegates. Our paper is based on the opinions of the authors, who have extensive experience with consensus methods, and on extensive consultation with the delegates at OMERACT 2018.
Our paper describes the ongoing strategies to improve processes and procedures surrounding consensus. This work should not be considered the final word, but a step forward as OMERACT continually strives to better itself.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Acknowledgment
The authors thank Dr. Paula Williamson for her thoughtful review of the manuscript.
- Accepted for publication December 6, 2018.