Abstract
Objective. The usefulness of randomized controlled trials to advance clinical care depends upon the outcomes reported, but disagreement on the choice of outcome measures has resulted in inconsistency and the potential for reporting bias. One solution to this problem is the development of a core outcome set: a minimum set of outcome measures deemed critical for clinical decision making. Within rheumatology the Outcome Measures in Rheumatology (OMERACT) initiative has pioneered the development of core outcome sets since 1992. As the number of diseases addressed by OMERACT has increased and its experience in formulating core sets has grown, clarification and update of the conceptual framework and formulation of a more explicit process of area/domain core set development have become necessary. As part of the update process of the OMERACT Filter criteria to version 2, a literature review was undertaken to compare and contrast the OMERACT conceptual framework with others within and outside rheumatology.
Methods. A scoping search was undertaken to examine the extent, range, and nature of conceptual frameworks for core set outcome selection in health. We searched the following resources: Cochrane Library Methods Group Register; Medline; Embase; PsycInfo; Environmental Studies and Policy Collection; and ABI/INFORM Global. We also conducted a targeted Google search.
Results. Five conceptual frameworks were identified: the WHO tripartite definition of health; the 5 Ds (discomfort, disability, drug toxicity, dollar cost, and death); the International Classification of Functioning, Disability and Health (ICF); PROMIS (Patient-Reported Outcomes Measurement Information System); and the Outcomes Hierarchy. Of these, only the 5 Ds and ICF frameworks have been systematically applied in core set development. Outside the area of rheumatology, several core sets were identified; these had been developed through a limited range of consensus-based methods with varying degrees of methodological rigor. None applied a framework to ensure content validity of the end product.
Conclusion. This scoping review reinforced the need for clear methods and standards for core set development. Based on these findings, OMERACT will make its own conceptual framework and working process more explicit. Proposals for how to achieve this were discussed at the OMERACT 11 conference.
The usefulness of randomized controlled trials (RCT) to advance clinical care and improve health depends upon the outcomes reported, but disagreement on the choice of measurement domains and instruments has resulted in inconsistency and the potential for reporting bias1. Development of a “core outcome set,” a minimum set of outcome measures that are deemed critical for clinical decision making and that must be reported in all RCT in a given health condition or class of intervention, could solve this problem.
Within the field of rheumatology, the Outcome Measures in Rheumatology (OMERACT) initiative has pioneered the development of core outcome sets since 1992. The key characteristic of OMERACT is a commitment to a data-driven, iterative development of feasible health outcome measures in patients with musculoskeletal conditions. This process involves multiple groups including patients, clinicians, researchers, approval agencies, and industry.
To date, 8 OMERACT core sets have been established for the following conditions: ankylosing spondylitis, fibromyalgia, gout (acute and chronic), osteoarthritis, osteoporosis, psoriatic arthritis, rheumatoid arthritis (RA), and systemic lupus erythematosus (domains only)2. The OMERACT consensus process has focused on testing the applicability of selected instruments by applying the “OMERACT Filter” of Truth, Discrimination, and Feasibility to each candidate instrument (Table 1)3. As the number of diseases addressed by OMERACT has increased and its experience in formulating core sets has grown, clarification and update of the conceptual framework and formulation of a more explicit process of area/domain core set development have become necessary. Several frameworks to study health, diseases, and their consequences have been suggested4, but their applicability for core set development is unclear. Scoping searches are a useful way of mapping fields of study where it is difficult to visualize the range of material that might be available. This literature review therefore sought, through a scoping search, to answer 2 questions: What conceptual frameworks have been used to develop core sets in health? What were the development processes for core sets outside rheumatology, and how do they compare to the OMERACT approach?
MATERIALS AND METHODS
A scoping literature search was undertaken to examine the extent, range, and nature of conceptual frameworks for core set outcome selection in health. This search included peer-reviewed studies and conference proceedings from 1980 to December 2011 and was conducted in 4 steps: a systematic search of electronic literature databases, a purposeful Web-based search using Google, direct enquiry of experts in the field, and scrutiny of reference lists of key studies to increase the retrieval of relevant material5.
For the first step, a sensitive search strategy was designed to retrieve all articles combining the concepts of “domain selection” and “outcomes research” from electronic bibliographic databases. A study design filter was not applied so as to retrieve qualitative as well as quantitative papers. The search was not limited by language. The search strategy was devised on OVID Medline and then adapted for the other databases (Appendix 1). We searched the following electronic databases: Cochrane Library Methods Group Register, Medline, Embase, PsycInfo, Environmental Studies and Policy Collection, and ABI/INFORM Global.
For the second step, a Google search using variations of the search terms (Appendix 1) was conducted and the first 50 results from each search were reviewed for more detailed information. The cutoff of 50 results was chosen because the relevance of the results dropped off beyond this number. Although Google searches cannot be replicated exactly, we wish to be explicit about how we searched this resource. We also asked experts in the field to identify important articles. Reference lists of key studies were also scanned to increase the retrieval of relevant material5.
Studies were screened by 2 authors (TR and LI). All studies were included that described methodology on how core domains for outcome measures could be established or that contained recommendations for core outcome domain selection. Descriptive data and the conceptual methodology used in each study were extracted by 1 author (LI) and subsequently reviewed by all authors. A summary and illustrative diagram of each conceptual framework was prepared, and the data on its development, use, and validation were collated. The conceptual frameworks are presented in the order they were developed.
RESULTS
After duplicates were removed, 2438 articles were screened. We identified 5 conceptual frameworks. Core sets for standardized measurement have been developed in a number of disease areas, and we found 60 of them: 34 International Classification of Functioning, Disability and Health (ICF) core sets, 8 OMERACT core sets, and 18 other medical core sets (see below). Most core sets do not make a clear distinction between domains of measurement and specific measurement instruments to be used within those domains, and can thus be better described as “core domain sets,” with suggestions for appropriate instruments in some cases. Although originating in different medical disciplines, in the majority a very similar approach was used: a combination of literature searches and expert group consensus. The 5 conceptual frameworks and the development of core sets within OMERACT, the ICF, and other medical conditions are described below by order of first appearance.
Development of Frameworks
Framework 1: World Health Organization (WHO) definition of health
The WHO definition of health can be considered as a conceptual framework for health and asserts that health is, “A state of complete physical, mental and social well-being and not merely the absence of disease or infirmity”6. This conceptual framework is visualized in Figure 1. The WHO definition is extremely useful as a starting point for defining health. However, although this framework is widely quoted and referred to, we did not find an explicit description of the process by which these domains were formulated. Further, while the 3 domains are clearly outlined, they are very broad, and this framework does not provide guidance on what should be measured within each of the well-being domains.
Framework 2: The 5 Ds
The conceptual framework known as the 5 Ds, referring to discomfort, disability, drug toxicity, dollar cost, and death7 (Figure 2), was suggested by Fries for selecting domains of patient welfare. Although this conceptual framework is often cited, we did not find an explicit methodology describing how these 5 areas were established. The framework reflects the priority views of investigators at 17 centers comprising the Arthritis, Rheumatism and Aging Medical Information System (ARAMIS) and represents what they considered to be a patient-oriented system that would provide multidimensional data on key aspects of health (J Fries, personal communication). The 5-D framework seems to have guided much thinking and work related to outcome measures, particularly in relation to RA. To date, it forms the implicit basis of domain selection for OMERACT, and has been used to validate questionnaires7,8. However, it has not been explicitly discussed or endorsed at OMERACT.
Framework 3: International Classification of Functioning, Disability and Health
The ICF is focused on the consequences of loss of health for the individual, and is grounded in the biopsychosocial model, focusing on the burden or effect of disease and on impairment (in terms of functioning and disability) rather than the causes of disease. One objective of the ICF project is the development of internationally agreed on core sets that include aspects of both functioning and disability. As Figure 3 indicates, there are 3 levels of human functioning classified by ICF: functioning at the level of body or body part, the whole person, and the whole person in a social context9.
The ICF was established in 2001 and was founded on the earlier International Classification of Impairments, Disabilities and Handicaps (ICIDH; WHO 1980; WHO 2011). The ICIDH was developed between 1972 and 1976 by a small group of experts and was printed in 198010. In 1993 a process of revision was started and resulted in ICIDH-2. The revision began with French, Dutch, and North American collaborating centers each taking responsibility for a section. Following a meeting in 1996 where each center presented their work, a draft of the ICIDH-2 was circulated to all the collaborating centers, with the WHO headquarters taking responsibility for the collation of comments. In 1997, field trials were conducted to reach consensus on definitions. The field trials were conducted with the widest possible participation from WHO member states and across a variety of sectors, including health insurance, social security, labor, and education. Following multiple revisions, the final version was published with the title “International Classification of Functioning, Disability and Health” in May 200110. Thus selection of core areas within the ICF was based on expert consensus followed by multiple international consultations and refinements.
The methods to develop ICF core sets involve a formal decision-making and consensus process integrating evidence gathered from preliminary studies and expert opinion. The goal is the development of valid and globally agreed measurement instruments to be used in clinical practice, research, and health statistics. Although this framework is increasingly being used to establish core domains within diseases, we were not able to find an explicit description of the methodology for identifying and validating the domains, beyond general statements that such a formal process was followed.
Framework 4: Patient-Reported Outcomes Measurement Information System (PROMIS)
The PROMIS framework splits self-reported health into 3 separate areas based on the WHO definition of health: physical health, mental health, and social health11. As illustrated in Figure 4, each of these areas is split into domains. Physical health comprises symptoms and functions; mental health comprises affect, behavior, and cognition; and social health comprises relationships and function. These domains are arranged in a hierarchy, which is a branching diagram, with self-reported health at the top and ever finer subdivisions of health on each lower tier. The levels are upward compatible, because more specific domains from lower levels can always be folded up into more conceptual levels above. This compatibility goes only upward: for example, walking can be a subset of mobility, which can be a subset of physical function, which can be a subset of physical health, which can be a subset of health. The reverse does not hold, hence the term “hierarchy.” Domains making up a particular level should be as mutually exclusive and collectively exhaustive as possible (J Fries, personal communication, October 2012).
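The upward compatibility described above can be illustrated with a minimal sketch. It encodes only the walking example given in the text as child-to-parent links; the function names `roll_up` and `is_subset_of` are hypothetical and are not part of PROMIS.

```python
# Child -> parent links, taken only from the walking example in the text.
# This is an illustrative fragment, not the full PROMIS domain framework.
PARENT = {
    "walking": "mobility",
    "mobility": "physical function",
    "physical function": "physical health",
    "physical health": "health",
}


def roll_up(domain):
    """Return the chain from a domain up to the top of the hierarchy."""
    chain = [domain]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_subset_of(domain, ancestor):
    """True if `domain` folds up into `ancestor`; never true in reverse."""
    return ancestor in roll_up(domain)[1:]


print(roll_up("walking"))
print(is_subset_of("walking", "physical health"))
print(is_subset_of("physical health", "walking"))
```

Because each domain stores a single parent, a specific domain can always be folded into a broader one, while a broad domain cannot be resolved downward into one specific child, which mirrors the one-way compatibility the framework describes.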
The PROMIS initiative began in 2004 when a group of outcomes researchers from 7 institutions formed a cooperative funded by the US National Institutes of Health. This initiative aimed to transform the way patient-reported outcome tools are selected and employed in clinical research and practice evaluation12. The first task of the PROMIS network was to create a protocol for developing a domain framework. Conceptual frameworks such as the WHO health framework and ICF framework were considered as the basis for this domain hierarchy. After discussion and careful consideration the steering committee decided to retain the WHO tripartite framework. After achieving consensus on the broad WHO framework, the preliminary PROMIS framework was developed through independent literature reviews followed by a consensus-building Delphi process and statistical analysis of available data. PROMIS network investigators used a modified Delphi approach combined with quantitative analysis of existing relevant data, to inform multiple rounds of framework review and revision until consensus was reached on the core domains beneath the broad physical, mental, and social headings11,12.
The instruments based on the item banks that make up the PROMIS system are being extensively validated in a wide range of populations and conditions. So far, PROMIS has not been used explicitly for core set development, although its production of tailored questionnaires in effect applies the framework to that purpose11.
Framework 5: Porter’s Outcome Hierarchy
Porter13 presented a conceptual framework based on a 3-tiered hierarchy. As can be seen in Figure 5, each tier of the hierarchy contains 2 dimensions that represent specific aspects of patient health. Tier 1, considered to be the most important to patients, is termed “Health status achieved or retained.” The 2 dimensions measured are survival and degree of health/recovery. Tier 2 refers to the “process of recovery.” The 2 dimensions measured are time to recovery (or time to return to normal activities) and disutility (= negative value) of care or a treatment process (including medical errors and adverse effects). Tier 3 refers to the “sustainability of health.” The 2 dimensions measured are the degree of health maintained as well as new health problems created as a result of treatment. Success within each dimension is measured with specific instruments. Each instrument has a timing and frequency with which it needs to be measured.
According to this framework, each medical condition should select its own unique set of outcome measures depending, among other things, on the variety of treatment options, the range of complications of the disease and its treatment, and the duration of care.
The outcome hierarchy framework was developed by Porter with input from a number of colleagues and clinicians (Szela, personal communication, 2012). It relies on existing measurement tools, but uses a specific structure to emphasize a comprehensive outcome hierarchy and identify missing measures. The framework has been applied to a number of medical conditions across many areas of medicine by experts in many specialty organizations. Having asked specialty organizations, registries, and thought leaders to apply the outcome hierarchy to their fields and provide feedback, the authors feel they have in effect established that the framework can be validly applied to almost all areas of medicine (Szela, personal communication, 2012).
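One way to picture how such a structure can expose missing measures is the sketch below. The tiers and dimensions follow the description above; the function `missing_dimensions` and the placeholder measure names are hypothetical and are not taken from Porter's work.

```python
# Tiers and dimensions as described in the text for the outcome hierarchy.
OUTCOME_HIERARCHY = {
    "Tier 1: health status achieved or retained": [
        "survival",
        "degree of health/recovery",
    ],
    "Tier 2: process of recovery": [
        "time to recovery",
        "disutility of care",
    ],
    "Tier 3: sustainability of health": [
        "degree of health maintained",
        "new health problems created by treatment",
    ],
}


def missing_dimensions(measures_by_dimension):
    """List (tier, dimension) pairs that have no measure assigned."""
    covered = {dim for dim, measures in measures_by_dimension.items() if measures}
    return [
        (tier, dim)
        for tier, dims in OUTCOME_HIERARCHY.items()
        for dim in dims
        if dim not in covered
    ]


# Placeholder measures for an unnamed condition (assumed, for illustration).
example = {
    "survival": ["placeholder measure A"],
    "time to recovery": ["placeholder measure B"],
}
for tier, dim in missing_dimensions(example):
    print(f"{tier}: no measure for '{dim}'")
```

Laying the dimensions out explicitly means a condition-specific measure set can be checked tier by tier, so gaps become visible rather than implicit.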
Development of Core Sets
Core set development in OMERACT
As stated, to date the 5-D framework has been used to develop core sets within OMERACT. Eight OMERACT core sets have been established over the last 20 years. The detailed methodology for each of these core sets is found in Table 1. The working groups within OMERACT have followed variations of the following basic methodology: literature search for all outcomes used; preliminary list of core domains developed; nominal group technique (bringing together all those involved); weighting of selected domains; responsiveness of endpoints evaluated; and consensus vote for inclusion in core set. A consistent element in all core sets is the final vote for inclusion. All core domains and instruments must “pass” (that is, satisfy the requirements of) the OMERACT Filter. Proposed core sets are presented and voted on at the final plenary session of each OMERACT meeting, and if 70% or more of participants vote in favor, a domain is included in the core set. In effect, what is endorsed in this way is a set of outcome measures that have passed the OMERACT Filter, because the 3 components of Truth, Discrimination, and Feasibility are validity tests of measurement instruments in their intended settings3. Truth includes the issues of face, content, construct, and criterion validity. Discrimination includes the issues of reliability and sensitivity to change. Feasibility addresses the pragmatic reality of the use of a measure, which may be decisive in determining a measure’s success. Thus the OMERACT Filter was formulated to summarize in memorable phrases the important aspects of instrument validation adopted from psychometrics by Tugwell and Bombardier in 198214. It was designed to be used to assess the applicability of instruments within domains.
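The plenary voting rule can be written as a simple threshold check. The sketch below assumes that the denominator is the number of participants who voted, which the text does not specify; the function name `is_endorsed` is hypothetical.

```python
from fractions import Fraction

# Threshold from the text: 70% or more of participants voting in favor.
THRESHOLD = Fraction(70, 100)


def is_endorsed(votes_for, votes_cast):
    """Return True if the share of votes in favor meets the 70% threshold.

    Assumption: `votes_cast` counts the participants who voted; the source
    does not state how abstentions or absences are handled.
    """
    if votes_cast <= 0 or not 0 <= votes_for <= votes_cast:
        raise ValueError("votes_for must lie between 0 and votes_cast > 0")
    # Exact rational arithmetic avoids floating-point error at the boundary.
    return Fraction(votes_for, votes_cast) >= THRESHOLD


print(is_endorsed(70, 100))
print(is_endorsed(69, 100))
```

Exact fractions are used so that a vote landing precisely on 70% is counted as passing, as the phrase "70% or more" requires.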
OMERACT conferences take place every 2 years, and discussions center around measurement topics and diseases of interest prepared by groups of experts. Most topics are discussed in workshop format, where the aim is to make explicit the areas of agreement and disagreement, and to prioritize the research agenda. Groups of patients with a rheumatic condition selected and trained in measurement have participated in OMERACT conferences and working groups since 2002 to identify what is important to them, and to ensure that these issues are addressed within the chosen measurement instruments.
Core set development in ICF
Currently, there are 34 ICF core sets in various health conditions and settings including neurology, cardiopulmonary, cancer, mental health, musculoskeletal conditions, sleep, inflammatory bowel disease, hand conditions, and vocational rehabilitation15. These core sets have been developed according to a generic procedure16,17, although some have been further modified. Briefly, 4 preparatory studies are undertaken to identify relevant ICF-based domains (in the form of ICF categories) in a specific health condition or setting using various methodologies: review of the literature, expert survey (e.g., Delphi), patient interview (e.g., individual or focus group), and multicenter cross-sectional study. These candidate ICF categories are then presented to invited international experts in a consensus conference using a multistage iterative consensus process and nominal group technique, which ultimately generate 2 versions of the core set: brief and comprehensive. Brief ICF core set refers to the minimal list of categories that need to be assessed in unidisciplinary settings or in studies or trials, while the comprehensive ICF core set is intended for use in a comprehensive or multi-disciplinary setting. These core sets are more appropriately termed core domain sets, as they do not specify instruments to measure each of the selected domains.
Core set development in other medical disciplines
A total of 18 other core sets were identified. Of those, 11 used only expert consensus. A detailed breakdown for each of those studies is provided in Table 2. The expert consensus studies were varied and ranged from professionals discussing the acceptability of endpoints to a 5-round Delphi process with expert groups. Within this subsection of studies only 4 conducted a literature review seeking any core areas that were currently in use. The remaining 7 core sets used a variety of methods as shown in Table 3. Five studies conducted a literature search in combination with other methods and one study conducted only a literature search. Six studies used some form of nominal group technique with various people involved to define the core sets.
DISCUSSION
This scoping review identified 5 conceptual frameworks relevant for core set development: the WHO tripartite definition of health, the 5 Ds, ICF, PROMIS, and the Outcomes Hierarchy. Of these, only the 5 Ds and ICF frameworks have been systematically applied in core set development. We were unable to identify any explicit report of the development process of these frameworks. A strength of the review is that the search had a wide scope, including many sources of potentially useful publications. As a first of its kind, its most important limitation is the lack of good examples. Specifically, the lack of good definitions for many of the terms and concepts and the fluidity of nomenclature is problematic, and it is possible that despite our extensive search strategy we have overlooked 1 or more conceptual frameworks. We invite readers who are aware of other frameworks to contact us about them.
In practice, OMERACT has selected domains for core sets with an implicit understanding of the 5-D framework. However, the 5-D framework does not include measurement of pathophysiology, arguably an important area of measurement, which is required to understand why an intervention is (or is not) working as intended. For this, an extension along the lines suggested by Kirwan is necessary18. None of the other frameworks appears immediately applicable to the aims of OMERACT: ICF is focused on functioning and does not intend to fully cover all areas of outcome; PROMIS focuses much more on instrument selection than on identification of core domains; and Porter’s hierarchy framework is strongly focused on time, and is therefore more appropriate for acute and reversible conditions than for chronic disease. In other areas of medicine, existing core sets were developed through a limited range of consensus-based methods, with varying degrees of methodological rigor. None applied a conceptual framework to ensure content validity of the end product.
Although it was our intent to find conceptual frameworks, we identified several core sets outside of OMERACT that were developed using a variety of methods. After this work was completed, we came into contact with researchers from the COMET group (Core Outcome Measures in Effectiveness Trials, see www.comet-initiative.org) who are performing a thorough review of all core sets published to date.
This review did not find frameworks that appeared immediately applicable to core set development, nor process descriptions that would improve the development process of core sets within OMERACT. Based in part on these findings, OMERACT decided to make its own conceptual framework and working process more explicit; proposals for how to achieve this were discussed at the OMERACT 11 conference2.