Abstract
This workshop reviewed progress in a number of areas related to patient perspective outcomes that were not specifically included within other areas of the program. A substantial review of the work of the valuing health outcomes group (the “QALY” working group) with participation and feedback from the plenary audience resulted in guidance to refocus on the use of patient preferences in the elaboration of more robust outcome measures for patient-reported outcomes and life impact measures. Progress and developments in the areas of fatigue and sleep in rheumatoid arthritis, outcome measures in hip and knee arthroplasty clinical trials, and scleroderma were outlined, and the challenge of truly understanding the nature of clinically important improvement was reviewed.
THE VIRTUAL CAMPUS
The “Patient Perspective Virtual Campus” workshop, held at the Outcome Measures in Rheumatology (OMERACT) 10 meeting, offered the opportunity to review progress in a number of areas related to patient perspective outcomes that were not specifically included within other areas of the program. First, a substantial review of the work of the Valuing Health Outcomes group (the “QALY” working group) was presented, with participation and feedback from the plenary audience. This has resulted in guidance on future directions for this group’s work. Second, brief reports from several other working groups were presented, allowing OMERACT participants to keep up with progress and developments that have built upon and taken forward work at previous OMERACT meetings. Finally, an invited presentation discussed the nature of “clinically important change” and how it might be defined.
SESSION 1
Valuing Health for Clinical and Societal Decisions: Directions Relevant for Rheumatologists (A. Boonen, M.J. Harrison)
Introduction to preference and quality-adjusted life-years (QALY)
The OMERACT QALY working group was formed in 2008 and aimed to achieve consensus on a standard approach to elicitation of QALY for health economic evaluations in rheumatology. A series of research issues were formulated and are commented on in the accompanying report1. An important unresolved issue was raised: Which perspective should OMERACT take when aiming for consensus and standardization on the QALY: the patient’s or society’s? To help OMERACT participants provide an informed discussion and appropriate advice, the workshop started with a practical exercise to demonstrate the power of preference elicitation in contrast to providing simple scores rating outcomes (Table 1). The results provided an immediate illustration of the paradigm: for the purpose of decision-making, preferences are superior to scores to assess the value of goods or services.
Table 1. Illustration of the superiority of preference over score to assess the value of goods and services.
A poll of attendees on who would take the decision (you, your partner, your children, an adviser, the family by consensus, etc.) provided a wide range of results and illustrated that preferences can be elicited from different points of view.
Issues around estimation of QALY. Perspectives and methods of elicitation
In budget allocation decisions in healthcare, the societal perspective is generally preferred on the assumption that the best decisions are made by those who do not stand to gain or lose from the decision2. The decisions also aim to maximize the health of society; therefore, the use of a societal perspective appears consistent with this aim. Healthcare budget resources are scarce, and use in one setting precludes use in another; therefore, a societal rather than patient perspective is appropriate to value the opportunity cost/benefit of the resource at the broader level. Alternatively, it might be argued that patients know more than anyone about the quality of life they experience, and therefore are best able to value their particular health state. However, this may be at the expense of objectivity; also, the patient may not have the same perception of an optimal state as a person without a given condition2. Patients may adapt to their condition, and expectations of health may change (particularly in patients with a chronic disease) and therefore changes in expectation of “optimal health” can lead to under- or overestimations of the severity of health3. Several studies have shown that patient and physician perspectives can diverge, but few empirical studies have tried to understand such differences4.
Preferences for health may be measured directly or indirectly4. Direct measures such as time tradeoff or standard gamble simulate choice and uncertainty and are completed by a respondent. Indirect measures are based on generic health status measures and have standard weights representative of population preferences that can be attributed to all health profiles described for a measure. Examples of indirect measures are the EuroQol-5D5 and Medical Outcome Study Short-Form 6D (SF-6D)6. Currently, only indirect measures with values from the general population are available, representing the societal perspective, but theoretically, values representative of groups of patients could be developed. A recent review of the measures used in rheumatoid arthritis (RA) showed that direct and indirect measures have been extensively applied in this setting. Indirect measures are generally used, as they can be routinely collected in clinical studies using questionnaires4. The review also found that the most commonly used indirect measure in RA was the EQ-5D, followed by Health Utility Index III (HUI3), SF-6D, Quality of Well-being Scale, and HUI27 (Figure 1).
Figure 1. Number of studies among patients with rheumatoid arthritis (RA) that used each of the utility instruments7. From Harrison, et al. J Rheumatol 2008;35:592–602.
The challenges to reaching consensus on the preferred approach to preference measurement are considerable; not only can different perspectives (patients versus society) lead to different results, but also different approaches within each perspective8. For example, for indirect methods, although there is reasonably good evidence of validity of several of these indirect health status measures (most notably for the EQ-5D, SF-6D, and HUI17), they differ in their absolute values, responsiveness to change, and uncertainty surrounding the scores.
The majority of studies comparing the responsiveness of indirect measures head-to-head have shown that when measuring improvement the SF-6D is more responsive than alternative measures (e.g., compared with the EQ-5D in 3 studies9 or the EQ-5D and HUI310), whereas when measuring deterioration, the EQ-5D is more responsive than the SF-6D9, HUI2, or HUI310. However, in economic evaluation the effectiveness of an intervention is assessed by the mean change in these measures, not their responsiveness. The mean change for these measures can vary markedly; for example, in a study of patients with early polyarthritis receiving glucocorticoid treatment, the mean improvement when measured using the SF-6D was 0.13, compared with 0.20 for the EQ-5D11. These estimates of effectiveness form the denominator in the calculation of the cost per QALY (cost per QALY = ΔCosts/ΔEffectiveness), and therefore differences in the measure used can have considerable influence on conclusions, as elegantly demonstrated by Marra and colleagues12.
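As a rough arithmetic sketch of why the choice of measure matters, the snippet below applies the ratio ΔCosts/ΔEffectiveness to the two mean improvements quoted above (0.13 for the SF-6D and 0.20 for the EQ-5D). The incremental cost of US$10,000 and the assumption that each improvement is sustained for exactly 1 year are hypothetical values chosen only for illustration; they are not taken from the cited study.

```python
def cost_per_qaly(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("no QALY gain; the ratio is not meaningful")
    return delta_cost / delta_qaly


# Hypothetical incremental cost of the intervention (illustrative only).
DELTA_COST = 10_000.0
# Assumption: the mean utility improvement is held for one year,
# so QALY gain = utility improvement x 1 year.
YEARS = 1.0

# Mean improvements quoted in the text for the same group of patients.
mean_improvement = {"SF-6D": 0.13, "EQ-5D": 0.20}

ratios = {
    measure: cost_per_qaly(DELTA_COST, gain * YEARS)
    for measure, gain in mean_improvement.items()
}

for measure, ratio in ratios.items():
    print(f"{measure}: about US${ratio:,.0f} per QALY")
```

Under these assumptions the same intervention would cost roughly US$77,000 per QALY when effectiveness is measured with the SF-6D and US$50,000 per QALY with the EQ-5D, although nothing about the patients or the treatment has changed.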
In a model of 10-year EQ-5D, SF-6D, HUI2, and HUI3 outcomes12, estimated using Health Assessment Questionnaire (HAQ) response to infliximab treatment in the ATTRACT trial13, wide variations in QALY gains and subsequent cost per QALY ratios were found according to which measure was used. The SF-6D provided the most pessimistic QALY gains (0.89 QALY) and the HUI3 the most optimistic QALY gains (1.95 QALY) for infliximab treatment. When used to calculate the mean incremental cost-effectiveness ratio, the cost per QALY for infliximab treatment was US$32K per QALY using the HUI3, US$46K per QALY using the EQ-5D, and US$70K per QALY using SF-6D (more than twice the cost per QALY versus the HUI3 estimate). Variation around the estimates was tested using probabilistic sensitivity analysis, in which key parameters in the model are allowed to vary and the impact on results is assessed by rerunning the model thousands of times. Each dot in Figure 2 represents a run of the model, and the tone of the dot denotes the measure of effectiveness used. The diagonal line denotes the line of willingness to pay (WTP) set at US$50K per QALY. Points above this line would not be considered cost-effective; points below the line are considered cost-effective. The treatment would be considered cost-effective if effectiveness were measured by the HUI3 (91% of runs being cost-effective) or the EQ-5D (63% of runs cost-effective), but not using the HUI2 (only 45% of runs cost-effective) or the SF-6D (only 12% of runs cost-effective). This study robustly demonstrates that incremental cost-effectiveness ratios based on different utility measures can differ markedly and should not be used interchangeably. How to choose which measure to use has not been fully explored. The UK National Institute for Health and Clinical Excellence, which evaluates the cost-effectiveness of therapies, currently accepts quality of life evaluations for RA based on mapping of improvements in the HAQ onto the EQ-5D.
Figure 2. Cost-utility plane of an incremental cost-utility analysis in rheumatoid arthritis (RA) using 4 different utility instruments: the Health Utility Index III (HUI3), EuroQol-5D (EQ-5D), HUI2, and Medical Outcome Study Short-Form 6D (SF-6D), illustrating the large differences in point estimates as well as uncertainty12. From Marra, et al. J Clin Epidemiol 2007;60:616–24; with permission.
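The logic of a probabilistic sensitivity analysis can be sketched in a few lines. This is not a reproduction of the published model: only the mean QALY gains (0.89 for the SF-6D, 1.95 for the HUI3) come from the text, while the mean incremental cost, both standard deviations, the use of normal distributions, and a willingness-to-pay threshold of US$50,000 per QALY are assumptions made purely for illustration. Each simulated run draws an incremental cost and a QALY gain, and is counted as cost-effective when it lies on or below the willingness-to-pay line.

```python
import random


def fraction_cost_effective(mean_qaly, sd_qaly, mean_cost, sd_cost,
                            wtp_per_qaly, n_runs=10_000, seed=0):
    """Share of simulated runs that fall on or below the WTP line."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_runs):
        d_qaly = rng.gauss(mean_qaly, sd_qaly)   # incremental QALY gain
        d_cost = rng.gauss(mean_cost, sd_cost)   # incremental cost
        # On or below the line: cost does not exceed WTP x QALY gain.
        if d_cost <= wtp_per_qaly * d_qaly:
            hits += 1
    return hits / n_runs


# Mean QALY gains quoted in the text; everything else is hypothetical.
MEAN_QALY_GAIN = {"SF-6D": 0.89, "HUI3": 1.95}
ASSUMED_SD_QALY = 0.30
ASSUMED_MEAN_COST = 62_000.0
ASSUMED_SD_COST = 15_000.0
ASSUMED_WTP = 50_000.0

results = {
    measure: fraction_cost_effective(
        gain, ASSUMED_SD_QALY, ASSUMED_MEAN_COST, ASSUMED_SD_COST, ASSUMED_WTP
    )
    for measure, gain in MEAN_QALY_GAIN.items()
}

for measure, share in results.items():
    print(f"{measure}: {share:.0%} of runs cost-effective")
```

The exact percentages depend entirely on the assumed distributions, but the pattern mirrors the one described above: with identical costs, the measure that yields the larger QALY gain places far more runs below the threshold.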
OMERACT QALY Group: perspective of QALY
In continuing discussions with health economics colleagues within the OMERACT QALY group it has become clear that while patient preferences are related to the needs of outcome research, their role at the level of societal decision remains controversial. Economically oriented researchers felt strongly that only the societal perspective is relevant for decision-making and that the role of disease-specific research in the development of consensus on a QALY approach was of limited value. In contrast, rheumatology researchers had strong feelings that patient QALY should and could have an increasing role in decision-making, not only at the clinical level but also at the societal level, and that increasing theoretical arguments are available to support this view14,15. Further, some decision-making authorities support the notion that the patients’ perspective has an additional role over and above the societal perspective in drug appraisal16,17.
It is also true that patient-reported outcomes (PRO), which are more descriptive of life impact (Life Impact Measures18), require a greater attention to individual patient preferences for adequate evaluation, as the “importance” of a particular health state will differ between patients, and this will directly affect an assessment of the impact of that state. Thus, the elicitation of patient preferences may have direct relevance to PRO and LIM evaluations.
OMERACT 10 guidance on QALY research
Having gone through these exercises and background presentations, OMERACT participants were given the opportunity to discuss the issues in “buzz groups” for a time, then debate in plenary session the different aspects of the problem. Overall, continuation of QALY research was considered important. Greater understanding of the qualitative and quantitative consequences of defining QALY from different perspectives seemed a particular future challenge. The mini-workshop finished with votes on specific questions. Only 38% of participants felt that continuing work on preference from a societal perspective (societal QALY) would be of interest to OMERACT researchers, but 81% felt preference from a patient perspective (patient QALY) would be of direct interest and relevance for OMERACT outcome research.
As a result of the discussion and consensus at OMERACT 10, the Working Group will therefore refocus on the use of patient preferences in the elaboration of more robust outcome measures for PRO/LIM and invites other interested researchers to join them in this enterprise.
SESSION 2. WORKING GROUP REPORTS
Fatigue in Rheumatoid Arthritis (S. Hewlett)
Work was reported from 3 groups. A trio of single-item scales has been developed and validated to measure RA fatigue severity, coping, and effect (Bristol RA Fatigue visual analog scale or numeric rating scale), as well as a multidimensional questionnaire (Bristol RA Fatigue Multidimensional Questionnaire, BRAF MDQ)19,20. Potential dimensions and wording were grounded in interviews with patients; then patient focus groups formulated draft items, which were clarified through cognitive interviewing. The draft 40-item MDQ was explored in a large cohort of patients, and a robust 20-item MDQ was created. For the first time, the short scales allow an assessment of perceived coping, while the BRAF MDQ provides a global impact score and 4 strong dimensions: physical fatigue (severity), emotional fatigue, cognitive fatigue, and living with fatigue (impact). Reliability and sensitivity studies are continuing. In terms of understanding interventions for fatigue, Cochrane reviews are under way (biologic, pharmacological, and nonpharmacological interventions)21,22,23. A randomized clinical trial has shown that cognitive-behavioral therapy changes not only fatigue severity, impact and coping, but also quality of life, anxiety, depression, pain, sleep, and helplessness24.
Two patient core sets on important treatment outcomes have been developed using different methodologies (patient interviews or literature searching), and in strong collaboration with patients using consensus techniques (nominal groups or Delphi surveys). Both the RA Impact of Disease (RAID) and the RA Patient Priorities for Pharmacological Interventions (RAPP-PI) include fatigue25,26,27. In the 7-item RAID, a weighting exercise by patients identified that pain, function, and fatigue have the strongest importance. Evaluation is continuing for both core sets.
To understand the nature of fatigue better, a qualitative study with patients28 produced data that have been taken forward into Q methodology, a consensus technique that merges qualitative and quantitative data to seek different patterns of fatigue and its effects on people’s lives (Nikolaus S, data in preparation). This is forming the basis of further work to identify those patterns in new scale or item development.
Sleep in RA (G. Wells)
A brief report of continuing work was presented, based on further validation of existing sleep scales for use in RA, as recommended at OMERACT 929. These were the scales considered at OMERACT 9 to show good preliminary evidence of validity in RA. Several publications and further work on sensitivity to change in a large randomized clinical trial of a biologic agent suggest that the Medical Outcome Study Sleep Module30 is a sensitive measure that can detect clinically important improvements in sleep31 and is responsive to treatment. The most recent findings are being prepared for publication, and may result in recommendations for the use of particular scales. In this way, it may not be necessary to develop a new RA-specific sleep measure.
Outcome Measures in Hip and Knee Arthroplasty Clinical Trials (J.A. Singh)
Outcome measurement is of critical importance for patients undergoing hip and knee arthroplasty, since most of these are elective procedures (> 80% of hip and > 95% of knee arthroplasties). The primary aim of elective knee and hip arthroplasty is to reduce pain and suffering and improve restricted mobility associated with severe arthritis. These benefits in turn translate into improvement in patients’ daily activities and health-related quality of life (HRQOL)32,33, allowing patients to play their social roles more satisfactorily. Various measures of pain, function, and HRQOL have been developed for assessment of patients undergoing joint replacement and are currently in use. A significant variation has been noted in use of measures of function in clinical trials of patients with hip or knee replacement34. A review of outcome measures for assessment of HRQOL showed that psychometric properties vary between measures35, with some outcome measures having more validation data than others. Several outcome measures were developed in the 1970s and 1980s, when the use of psychometric theory in development of these measures was not common. This led to the development and adoption of measures now in common practice that have limited validation data, such as the American Knee Society Score and the Harris Hip Score. In contrast, measures developed more recently, such as the Oxford knee or hip score, the Hip dysfunction and Osteoarthritis Outcome Score (HOOS), and Knee injury and Osteoarthritis Outcome Score (KOOS), have validity and reliability data, since psychometric principles were employed in their development. Specifically, these scales are valid, reliable, and sensitive to change. The OMERACT filter of Truth, Discrimination, and Feasibility provides an appropriate framework for developing new outcome measures and assessing the validity of existing measures for patients undergoing hip or knee replacements.
One of the proposals put forth by the Hip and Knee Replacement Special Interest Group at OMERACT is to organize meetings of leaders from the outcomes committees of orthopedic societies, clinicians, and measurement science experts to achieve consensus. Such a consensus exercise could help in standardization of use and reporting of these assessment measures in clinical studies, especially in the clinical trials of hip and knee arthroplasty. This proposed meeting should ideally have leaders from orthopedic surgery, physical medicine and rehabilitation, epidemiology and outcomes science, and include a broad international representation. A broad representation across disciplines and regions is needed to avoid the non-uniformity of use of these measures by expertise and by region. An expected outcome of such a consensus meeting would be generation of recommendations regarding which measures to use in patients undergoing knee and hip replacement, in clinical trials, non-trial clinical studies, and for routine clinical care. With recent emphasis on creation of regional and national joint registries in the US and other nations, standardization of outcome measurement is critical. In the absence of standardization, data interpretation will remain difficult. One of the first steps for developing such a consensus is to perform systematic reviews of outcome measures and assessment of their psychometric properties. Our group has laid some foundation in this area by performing a review of common measures used in clinical trials of hip and knee replacement34 and reviewing psychometric properties of common outcome measures for hip and knee replacement35. Similar exercises may need to be undertaken for shoulder, elbow, and other arthroplasties. However, before embarking on systematic reviews in these other areas, we must strive to first achieve consensus for outcome measures for patients undergoing hip and knee replacement.
Several venues for such meetings exist, including annual meetings of the American Academy of Orthopedic Surgeons and the American Academy of Hip and Knee Surgeons. Efforts are under way to organize this meeting.
In summary, standardization in choice of instruments and reporting standards for outcomes following hip and knee replacement is needed. With a broad choice of instruments, it will be a challenging, but much needed exercise to optimize reporting across studies and allow comparisons across treatments and interventions.
Measures of Response in Systemic Sclerosis (D. Furst)
Substantial progress was briefly reported. The OMERACT Systemic Sclerosis Working Group members have been working in close collaboration with gastroenterologists and respiratory physicians in applying OMERACT principles to a composite (several body systems) index of systemic sclerosis. This has resulted in a series of recent publications and a wider appreciation of the OMERACT Filter in other areas of medicine36,37,38,39.
SESSION 3
Clinical Importance and the Significance of Changes in Pain (R. Dworkin)
An essential component of the interpretation of results of randomized clinical trials involves the determination of their clinical importance, which involves 2 distinct processes — interpreting the clinical importance of individual patient improvements and of group differences. Unfortunately, the distinction is frequently misunderstood. The clinical importance of patient improvements can be determined by assessing what patients themselves consider meaningful improvement. Such evaluations are necessary to categorize patients as treatment “responders,” and research shows that patients consider pain intensity reductions of at least 2 points (on a scale of 0–10) or 30% to be moderately important40. However, it is crucial to recognize that such criteria for clinically important changes in individuals cannot be extrapolated to the evaluation of group differences. For example, even though a 2-point decrease can be considered a clinically important improvement for individual patients, it should not be concluded that a 2-point difference in mean pain reduction between a pain treatment and placebo is necessary for the treatment benefit to be considered clinically important. The clinical meaningfulness of group differences must be determined by a multifactorial evaluation of the benefits and risks of the treatment and of other available treatments for the condition in light of the primary goals of therapy for the population of patients to be treated41. Such determinations must be conducted on a case-by-case basis, and are ideally informed by patients and their significant others, clinicians, researchers, statisticians, and representatives of society at large.
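The distinction between individual responder criteria and group mean differences can be made concrete with a small sketch. The responder rule (a reduction of at least 2 points on the 0–10 scale, or of at least 30% from baseline) is taken from the text; the two small sets of baseline and follow-up scores are invented for illustration and do not come from any trial.

```python
def is_responder(baseline, follow_up):
    """True if pain fell by at least 2 points (0-10 scale) or at least 30%."""
    reduction = baseline - follow_up
    # 10 * reduction >= 3 * baseline is the 30% rule without rounding error.
    at_least_30_percent = baseline > 0 and 10 * reduction >= 3 * baseline
    return reduction >= 2 or at_least_30_percent


def summarize(pairs):
    """Return (responder rate, mean pain reduction) for (baseline, follow-up) pairs."""
    responders = sum(is_responder(b, f) for b, f in pairs)
    mean_reduction = sum(b - f for b, f in pairs) / len(pairs)
    return responders / len(pairs), mean_reduction


# Invented (baseline, follow-up) pain scores for two hypothetical arms.
treatment = [(7, 4), (6, 3), (8, 6), (5, 5), (6, 5), (7, 4)]
placebo = [(7, 6), (6, 5), (8, 5), (5, 5), (6, 6), (7, 6)]

treatment_rate, treatment_mean = summarize(treatment)
placebo_rate, placebo_mean = summarize(placebo)
group_difference = treatment_mean - placebo_mean

print(f"Responders: treatment {treatment_rate:.0%}, placebo {placebo_rate:.0%}")
print(f"Difference in mean pain reduction: {group_difference:.1f} points")
```

In this invented example the difference in mean pain reduction between the arms is only 1 point, well short of the 2-point individual criterion, yet 4 of 6 treated patients meet the responder definition compared with 1 of 6 on placebo. Whether such a group difference is clinically important would still depend on the wider evaluation of benefits and risks described above.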
A second issue, perhaps most important from the patient’s perspective, is the safety and tolerability of a treatment and its ease of use, which are essential components in interpreting both the clinical importance of patient improvements and the clinical importance of group differences. As John Kirwan emphasized in 200142, “if there were no side effects and no costs, then any improvement in symptoms would be worth having — no matter how small.” The therapeutic decision will always depend on the balance in an individual case.
CONCLUSIONS FROM THE VIRTUAL CAMPUS
It is a challenge to keep track of activities that follow on from discussions and decisions at OMERACT meetings when they do not directly feed back to further OMERACT sessions. This activity constitutes the notion of a “virtual campus” that persists outside the biennial meetings. In the past, much of this work has been fed back through sessions within subsequent OMERACT meetings, but the extent of OMERACT-related activities has broadened and not all these groups will have formal sessions. The reports above show how active OMERACT participants have been.
While societal values for different health states are clearly important to the wider field of health economics, the OMERACT QALY group has been given guidance on the way forward, focused more directly on measuring patient preferences for outcome reports within rheumatological studies. The work of other groups, reported above, has spread beyond the definition of relevant outcomes to more general influences on clinical research in their areas of interest. However, the invitation to discuss work from the pain measurement area on clinically important change has reignited the controversy about the real meaning of this concept42, and set down a challenge for those OMERACT groups trying to define “minimal clinically important difference.”