Abstract
Objective. The Outcome Measures in Rheumatology (OMERACT) international consensus initiative has successfully developed core sets of outcome measures for trials of many rheumatologic conditions, but its expanding scope called for clarification and updating of its underlying conceptual framework and working process. To develop a core set of what we propose to call outcome measurement instruments, consensus must be reached both on what to measure and how to measure. This article deals with the first part: a framework necessary to ensure comprehensiveness of the domains chosen for measurement. We formulated a conceptual framework of core measurement areas in clinical trials, for discussion at the OMERACT 11 conference.
Methods. We formulated a framework and definitions of key concepts adapted from the literature, and followed an iterative consensus process (small group processes and an Internet-based survey) of those involved including patients, health professionals, and methodologists within and outside rheumatology.
Results. The draft framework comprises 4 core “areas”: death, life impact (all aspects of how a patient feels or functions), resource use (monetary and other costs of the health condition and interventions), and pathophysiologic manifestations (disease-specific clinical and psychological signs, biomarkers, and potential surrogate outcome measures necessary to assess specific effects). The survey responses (262 of 2293, response rate 11%) indicated broad agreement with the draft framework and the proposed definitions of key concepts, including understandability and feasibility. A total of 283 comments were processed.
Conclusion. In an iterative process, we have developed a generic framework for outcome measurement and working definitions of key concepts ready for discussion at the OMERACT 11 conference.
The Outcome Measures in Rheumatology (OMERACT) initiative formulates, for individual health conditions (for example, rheumatoid arthritis, RA), internationally agreed core sets of outcome measures for randomized controlled trials and longterm observational studies1. These encompass the outcomes that must always be measured to properly assess the harms and benefits of the condition and its treatment. A core set does not replace or define the primary study question, and does not limit the choice of the primary outcome measure. Rather, reporting the results of a core set of outcomes in every trial ensures that a consistent dataset will be available for comparison with other studies, independent of the primary study question and associated outcome measures.
The key to this consensus-based approach has been to apply the “OMERACT Filter” of “Truth, Discrimination, and Feasibility” to each candidate instrument within each domain of interest (Table 1)2. This pragmatic approach was successful and the definition of truth, discrimination, and feasibility added much clarity, but (perhaps because they were a relatively close-knit set of committed researchers in 1 medical subspecialty) the participants shared many unvoiced assumptions about what to include in a core set definition. For example, the notion of “comprehensiveness” (content validity, part of Truth) in RA was based on common clinical experience, not on questioning enquiry. OMERACT was implicitly using a framework for content validity based on the work of Fries, et al3 and expanded by Kirwan4, but there was no clear process to determine the comprehensiveness (or other Filter requirements) of the core set as a whole. This common background, while initially beneficial, became problematic as the areas of work expanded. For example, when patient participants were introduced to OMERACT, the comprehensiveness of the RA core set was questioned5, thus highlighting the need for a broader and more transparent conceptual framework and clarification of the protocols used to select outcome measures.
The first step was to search the literature to find existing frameworks that could be used to define outcome within the OMERACT process. A systematic review6 identified several existing conceptual models for health (conditions), the most influential being the World Health Organization (WHO) International Classification of Diseases (ICD)7 and the WHO International Classification of Functioning, Disability, and Health (ICF)8, formerly International Classification of Impairments, Disabilities, and Handicaps9. The ICD lists all known diagnoses and conditions and is grounded in the biomedical model. The ICF is grounded in the integrative and biopsychosocial model and, as the name implies, provides a taxonomy of functioning, disability, and participation. Models that unify these 2 perspectives include the quality of life model developed by Wilson and Cleary10, and those of Bruce and Fries11 and Porter12. None of the above models are fully applicable to OMERACT: they are mostly aimed at describing or classifying health and function, rather than at measuring outcome as a consequence of an intervention. In addition, none were derived from a documented broad consensus over their underlying philosophy (although the ICF was ratified by WHO member nations), each had been promulgated by an individual or a small group, and any subsequent critique was unstructured or undocumented6.
Recently the “COMET” (Core Outcome Measures in Effectiveness Trials) initiative emerged, aiming to bring together researchers of all disease areas interested in the development and application of agreed standardized sets of “core outcomes” to be measured in all clinical trials of a specific condition13. These aims clearly overlap with those of OMERACT within rheumatology, and there has been a wide, consensus-based cross-fertilization between both groups in producing this position paper for consideration and possible further modification at OMERACT 11.
This article describes the development of a conceptual framework of Core Areas defining what to measure to properly and comprehensively describe the effects of intervention on health conditions. This framework was developed for OMERACT but is likely applicable to health conditions outside rheumatology. Key concepts are introduced here (Table 2); they were discussed at the OMERACT 11 conference14; the application of the framework to core set development (i.e., how to measure) is described in other conference articles15,16,17,18. Combined, the framework and its application will be termed the “OMERACT Filter 2.0.”
Method of Development
The first outline of the framework was developed iteratively based on results of an informal literature review and discussions among experts including the OMERACT and COMET executive groups after the first COMET conference in 2010. A more formalized literature review confirmed the lack of an immediately applicable framework and the need to develop one6. Given the lack of a clear alternative, we decided to proceed with the Bruce/Fries/Kirwan work. The authors then developed the next (and subsequent) drafts. Experts in trial and systematic review methodology were identified (n = 53) and invited to comment on the draft by way of targeted survey questions supplemented with open comments. Eighteen of these experts met at the second COMET conference (July 2011) for a structured discussion of the feedback received by then. The draft version of the framework was formally presented in a plenary session at the conference. Participants were invited to submit comments and suggestions. The framework, now written in the form of a draft paper with definitions of key concepts (Table 2), was further refined. This document was sent along with a reviewer survey to a total of 2293 persons: all participants from the second COMET conference (n = 131), all current and former OMERACT conference participants (n = 678), and participants of the Evidence Based Health listserv (n = 1484)19. A total of 262 surveys were returned (161 of these were from OMERACT participants; overall response: 11%). The survey responses indicated broad agreement with the draft framework and the proposed definitions of key concepts, including its understandability and feasibility (Supplementary Table 1 available online at jrheum.org). A total of 283 comments were processed.
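The survey figures reported above can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts given in the text; the variable names are illustrative and not part of the original survey materials.

```python
# Recipient groups for the reviewer survey, as reported in the text.
recipients = {
    "second COMET conference participants": 131,
    "current and former OMERACT participants": 678,
    "Evidence Based Health listserv": 1484,
}

total_sent = sum(recipients.values())
returned = 262  # surveys returned, as reported

response_rate = returned / total_sent

print(total_sent)              # 2293
print(f"{response_rate:.1%}")  # 11.4%
```

The three groups sum to the 2293 persons reported, and the computed rate rounds to the 11% given in the text.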
The definition of the key concepts and the framework, both supplemented by the comments and explanations as set out in this position paper, were presented at OMERACT 11. The subsequent discussions and eventual conclusions from OMERACT 11 are reported elsewhere14 except where the definition of key concepts has changed, and the final concepts are reported here for clarity (Table 2).
Proposed Framework and Elaboration (see also Figure 1)
The framework guides core set development in the setting of all trials aimed at assessing benefit or harm in humans, i.e., all trials from phase 2 onward. A large majority of respondents supported use of the framework in systematic reviews and observational studies (each, 84%); support was weaker for use in audits (51%) and clinical care (61%). Systematic reviews follow naturally from trials, but application to observational studies is outside the scope of the current development. The framework encompasses the impact and the pathophysiological manifestations of health conditions.
Impact of health conditions
Impact of health conditions includes all aspects that describe how a patient feels, functions, or survives, covering 3 areas: Death; Life Impact of Health Conditions; and Resource Use. Life Impact can also be described as “the lived experience of health”20,21. Resource use is of paramount importance to society: it can be regarded as a reflection of the societal impact of a health condition, and it can also relate to the personal resources of all kinds invested by patients and caregivers in their health.
Pathophysiological manifestations of health conditions
This grouping includes reversible and irreversible (damage) manifestations. Reversible manifestations can be either modifiable risk factors for a health condition (in the setting of prevention trials, e.g., hypertension), or actual manifestations of the health condition (e.g., RA disease activity, glycosylated hemoglobin A levels). These are a Core Area because clinical trials are done not only to assess effects (benefit and harm) of an intervention, but also to document whether the effect of the intervention specifically targets the pathophysiology of the health condition. In the original Bruce and Fries framework11 as adapted by Kirwan4 this was termed “Process” and we adopted this term in initial drafts. However, Donabedian used the word “process” in the context of measurement of quality of care, where the word denoted process of care, and feedback from surveys confirmed the potential for confusion, so we adopted the current term22.
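To illustrate how the 4 Core Areas might be used to check the comprehensiveness of a draft core domain set, the sketch below represents the areas as an enumeration and reports any area that has no domain assigned. This is an illustration only, not a procedure prescribed by the framework: the function name `missing_areas` and the example domains are hypothetical, and the assignment of each domain to an area is an assumption made for the example.

```python
from enum import Enum


class CoreArea(Enum):
    """The 4 Core Areas of the proposed framework."""
    DEATH = "Death"
    LIFE_IMPACT = "Life Impact"
    RESOURCE_USE = "Resource Use"
    PATHOPHYSIOLOGICAL_MANIFESTATIONS = "Pathophysiological Manifestations"


def missing_areas(domain_to_area):
    """Return the Core Areas not covered by any domain, in definition order."""
    covered = set(domain_to_area.values())
    return [area for area in CoreArea if area not in covered]


# Hypothetical draft core domain set (domain names are illustrative only).
draft_core_set = {
    "mortality": CoreArea.DEATH,
    "pain": CoreArea.LIFE_IMPACT,
    "physical function": CoreArea.LIFE_IMPACT,
    "disease activity": CoreArea.PATHOPHYSIOLOGICAL_MANIFESTATIONS,
    "joint damage": CoreArea.PATHOPHYSIOLOGICAL_MANIFESTATIONS,
}

gaps = missing_areas(draft_core_set)
print([area.value for area in gaps])  # ['Resource Use']
```

In this hypothetical draft, no domain has been assigned to Resource Use, so that area is flagged for further consideration by the core set developers.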
Elaboration
Death
Whereas the importance of death in studies of life-threatening conditions is obvious, its inclusion in studies of other conditions resulted in much discussion. All participants agreed death should always be reported even where it is a rare occurrence, making it a core area by default. In life-threatening conditions, death would probably be specified in a core domain set and may even be specified by study developers as the primary outcome measure.
Life Impact of Health Conditions
The term “Life Impact of Health Conditions” represents what previously was called “burden of disease.” This concept is largely covered by health-related quality of life, but also includes life activities, participation, etc. It mostly links with 2 parts of ICF: activities, and participation23. Many respondents felt this area was (too) broad and needed further specification in the framework. However, these suggestions seemed to be related to the professional background of the respondent, so while providing some examples we prefer to leave such specification to core set developers who can consider Life Impact in their chosen context.
The time-specific nature of domain specification may be very important in the Area of Impact, but also in the other proposed areas. In Porter’s “outcome hierarchy to assess value of health care”12, time considerations are essential. This is also gaining in importance in chronic diseases, where in some areas (such as rheumatology), agents have been developed that have a much more rapid onset of action than traditional treatments. Core set developers will need to explicitly consider how the health condition interacts with the intervention to decide how the effect is best expressed. In chronic health conditions, measures that capture the experience over time in more detail (e.g., “area under the curve” of effect) may be preferable to single end-of-trial assessments. To measure effect, it is important to define the concepts of “(minimum) clinically important change” (to detect a useful response) and “patient acceptable state” for the outcome measures of choice24,25.
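The difference between an “area under the curve” summary and a single end-of-trial assessment can be shown with a small worked example. The sketch below uses the trapezoidal rule on hypothetical symptom scores (lower is better) for 2 treatment arms; the data, visit schedule, and function name are assumptions made for illustration.

```python
def area_under_curve(times, scores):
    """Trapezoidal area under a score-versus-time curve."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (scores[i - 1] + scores[i]) / 2
    return area


# Hypothetical symptom scores at weeks 0, 4, 8, and 12 (lower is better).
weeks = [0, 4, 8, 12]
rapid_onset = [6, 3, 2, 2]  # improves early
slow_onset = [6, 5, 4, 2]   # reaches the same score only at week 12

print(rapid_onset[-1] == slow_onset[-1])     # True: same end-of-trial score
print(area_under_curve(weeks, rapid_onset))  # 36.0
print(area_under_curve(weeks, slow_onset))   # 52.0
```

Both hypothetical arms end at the same score, so an end-of-trial comparison shows no difference, whereas the area under the curve reflects the lower cumulative symptom burden in the arm with rapid onset of action.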
Although not included explicitly, the concept of impact includes not only negative but also positive effects, which are more appropriately termed effects on the “amount of health.” In this way, measurement of interventions that increase health above the norm can also be placed in the framework; for example, the tradeoffs faced by athletes training for the Olympics, such as the social isolation during training. Currently, the Medical Outcome Study Short Form-36 survey questionnaire is one of the few generic scales to include items on positive health26.
Resource Use
In economics, resources are defined as inputs required to produce goods and services. These include both tangible factors such as monetary capital and labor, and intangible factors such as opportunity. Both the presence of a health condition and its treatment incur resource use. Most of these are expressed as monetary costs, including costs of the intervention itself, associated costs involved in its application (including costs of treating side effects), and indirect costs associated with productivity loss. The consideration of resource use at the earliest stage of the development of a health intervention has become of paramount importance in recent years, as even the budgets of the richest nations are being threatened by burgeoning healthcare costs. In the developing world, apart from societal and geographical contextual factors discussed below, the applicability of an intervention is strongly driven by its associated resource use. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system for developing recommendations includes costs as a key component in formulating clinical recommendations27. For health economic evaluations, the valuation of health states (utility) is essential. Such valuation is simply another way to measure impact, with death usually valued at zero. While a strong majority favored Resource Use as an important area to assess, many comments were made as to whether this area should always be reported; hence it would be an important discussion point at the OMERACT conference.
Pathophysiological manifestations of health conditions
Despite the recent drive toward measurement of “patient-relevant” outcomes, we argue that in the context of trials, measures of pathophysiological manifestations (PM) should constitute a Core Area alongside Impact. PM is a broad term that includes mostly disease-specific clinical signs, biomarkers28, and potential surrogate outcome measures29 necessary to assess specific effects. It can also include psychological manifestations and mostly overlaps with the ICF concepts of body structures and body function. The alignment is not perfect: for instance, we place symptoms such as pain, stiffness, and fatigue under Life Impact, but these are categorized as body functions under ICF. PM are needed to assess whether the effect of the intervention specifically targets the pathophysiology of the health condition, or alternatively, is nonspecific and unable to change the course of the disease.
PM includes most measures currently being used in trials as primary outcome measures (e.g., forced expiratory volume, tumor response assessed on imaging, most measures of RA disease activity, damage). Many adverse events can be classified as PM (mostly those detected only by biomarkers such as laboratory tests, e.g., liver function abnormalities). PM can sometimes serve as first indication for Impact when this is difficult to measure in the context of a trial (for example, RA joint damage for longterm work disability). OMERACT has developed a framework to help distinguish biomarkers (any PM deemed useful) from surrogate outcome measures (modifiable biomarkers that predict later Impact)29.
In prevention trials, we perceive risk factors as being surrogates for (usually bad) outcomes that need to be prevented. Note that even in the preventive setting, people at risk of a health condition may experience a real change in the impact of ill health when they become aware of the risk and change their behavior accordingly. We expect core set developers to be most familiar with the PM area. Most PM will be specific to the health condition being studied, and the formulation of what the most important manifestations are will be relatively straightforward. Most discussions will then focus on the choice of instruments.
Adverse events
In most trials the (intended) beneficial effects are studied in much greater detail than the (unintended) harmful effects. The latter are usually only listed and summarized. Integration of benefit and harm into 1 scale would help to determine the total effect of an intervention compared to its alternatives. The development of such a scale needs to overcome several conceptual and practical hurdles30. One possibility is the use of utility measurement or quality-adjusted life years: it is assumed that all positive and negative effects are covered under the valuation of a health state. The Cochrane Collaboration Systematic Reviews now require that up to a maximum of 7 of the most important benefits and harms be included in their main summary of findings tables31, and the GRADE approach to clinical recommendations requires explicit assessment of the tradeoff between benefit and harm27. Although adverse events will be measured in one of the core areas and were initially regarded as a domain within these, many respondents felt adverse events should have a recognizable and separate place in the framework, leaving room for further discussion.
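As a simple illustration of how quality-adjusted life years combine benefit and harm into 1 number, the sketch below sums utility multiplied by time spent in each health state, with death valued at zero. The utilities, durations, and function name are hypothetical, and the calculation is deliberately simplified (for example, no discounting of future years is applied).

```python
def qalys(health_states):
    """Sum of utility x years over a sequence of (utility, years) pairs."""
    return sum(utility * years for utility, years in health_states)


# Hypothetical 3-year horizon; utility 1.0 = full health, 0.0 = death.
intervention = [(0.75, 2.0), (0.5, 1.0)]           # includes a year with side effects
comparator = [(0.75, 1.0), (0.5, 1.0), (0.0, 1.0)]  # death after 2 years

print(qalys(intervention))                      # 2.0
print(qalys(comparator))                        # 1.25
print(qalys(intervention) - qalys(comparator))  # 0.75
```

Here the harm of the intervention (a year at reduced utility) and its benefit (survival) are both absorbed into the valuation of the health states, which is the assumption described above.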
Setting and contextual factors
The setting or scope (health condition, target population for the intervention, type of intervention, etc.) for which the core set is being developed will drive most of the deliberations. Contextual factors can be defined as those that are not the primary object of research but that may influence the results or the interpretation of the results. In addition, there are potential confounders and effect modifiers (most of which should be eliminated by randomization), as well as factors that define the generalizability of the study findings. Broadly speaking, contextual factors can be classified as personal (e.g., sociodemographics, comorbidity), societal (e.g., ethnicity, cultural attitudes, traditions) and environmental (e.g., geography). The latter 2 categories are increasingly important as research has become global. Core set developers need to specify the setting, the minimum list of contextual factors to be documented, and also whether such factors should be measured at baseline or repeatedly. The specification of contextual factors emerged as an area of contention among both survey respondents and OMERACT 11 participants.
Overlap in classifications
The precise selection of domains within the Core Areas will always depend on the setting of the trial as defined by those developing core outcome measurement sets. In some settings a particular variable will be a primary outcome, in others part of the core outcome measurement set, and in others the same variable may be a contextual factor. For example, adherence to treatment can strongly influence the result of a trial, e.g., whether an intervention reduces disease activity. In some cases adherence can become the target of intervention, and disease activity (or its change) can become the contextual factor.
Discussion
Building on past work, the presented framework for Core Domain Sets unifies the biomedical and quality of life models and remains consistent with other models, particularly the ICD and ICF. The framework is being developed in the setting of trials of health interventions. It closely follows the seminal work by Bruce and Fries in the development of the Health Assessment Questionnaire11, in turn inspired by Donabedian22 and others. Its initial application is for randomized trials, but an obvious extension is to observational studies where many if not all the principles underpinning the framework apply.
In the absence of any gold standard, the most important aspect of the framework is its face validity; i.e., it must be acceptable to everyone involved. Therefore, we believe that it is appropriate to follow a consensus process to develop and present this preliminary framework, engaging a wide range of viewpoints including those of patients, caregivers, healthcare providers, researchers, healthcare managers, payers, industry, regulators, and the government. Further, we seek to be explicit and document our stepwise consensus process. Finally, identifying the areas of disagreement can inform what needs to be further developed.
Our survey intended to reach out to a wide audience but many of the people approached (especially from the Evidence Based Health listserv19) chose not to respond. It is likely most were not directly engaged in the topic, but the low overall response rate precludes firm conclusions on the acceptability of the framework in the wider scientific community at this stage. Therefore, the proposals in this position paper, although building on existing models and already subject to extensive discussion, were placed before OMERACT 11 for intense scrutiny and consideration for acceptance as part of the development of OMERACT Filter 2.0. [Note that a standalone article intended for a general (non-rheumatology) audience will appear in the Journal of Clinical Epidemiology; it summarizes development of Filter 2.0, described in detail in this part of the OMERACT proceedings32.]
Acknowledgment
We thank Paula Williamson and Jane Blazeby of the COMET executive, and all members of the OMERACT executive, for valuable advice; and survey participants for their input.
REFERENCES
ONLINE SUPPLEMENT
Supplementary data for this article are available online at jrheum.org