Abstract
The workshop Choosing or Developing Instruments held at the Outcome Measures in Rheumatology (OMERACT) 10 meeting was designed to help participants think about the underlying methods of instrument development. Conference pre-reading material and 3 brief introductory presentations elaborated the issues, and participants broke into discussion groups before reconvening to share insights, engage in a more general discussion of the issues, and vote on recommendations. Tradeoffs between using current imperfect measures and the long and complex process of developing new instruments were considered, together with the need for rigor in patient-reported outcome (PRO) instrument development. The main considerations for PRO instrument development were listed and a research agenda for action produced. As part of the agenda for action, it is recommended that researchers and patient partners work together to tackle these issues, and that OMERACT bring forward proposals for acceptable instrument development protocols that would meet an enhanced “Truth” statement in the OMERACT Filter.
AIM
Previous OMERACT meetings have focused on testing outcome measures once they are available. The Choosing or Developing Instruments Workshop at OMERACT 10 was one of 4 linked “Patient Perspective” workshops with the collective aim of turning attention towards guiding the initial development of outcome measures to ensure they include or capture the patient’s perspective. It was designed to help participants think about the underlying methods of instrument development that are common to measuring the patient perspective in different diseases and settings. We sought to derive a set of principles or “quality standards” that would be applicable in general, and then to use these principles to approve future outcome measures.
METHODS
Pre-reading material
Conference pre-reading material1 reviewed development of the patient perspective on outcome assessment at OMERACT over recent years. It drew participants’ attention to the need to consider the patient-reported outcome (PRO) instrument development process, as well as post-development performance testing. Three brief introductory presentations elaborated these issues, and then participants broke into discussion groups before reconvening to share insights, engage in a more general discussion of the issues, and vote on some recommendations. The main conclusions and recommendations from the workshop were taken to the final OMERACT plenary session for consideration in conjunction with those from the other patient perspective workshops, and to seek endorsement for the next step forward2.
Introductory presentations
In the introductory presentations, J. Fries discussed the innovative approach to instrument development taken by the Patient-Reported Outcomes Measurement Information System (PROMIS) organization3, and the potential for this large item bank to be used efficiently following evaluation by item response theory (IRT) methodology, particularly in combination with computer adaptive testing4. Using the example of physical function, it was shown that floor and ceiling effects were substantially reduced compared to instruments currently in wide use in rheumatology, such as the Health Assessment Questionnaire. Fries suggested that the time had now come to use the PROMIS system directly in clinical trials and compare its performance to that of current PRO instruments.
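The efficiency Fries described rests on the core step of computer adaptive testing: at each turn, the item administered is the one most informative at the respondent’s current trait estimate. A minimal sketch of that selection step under a two-parameter logistic (2PL) IRT model is shown below; the 5-item bank parameters are invented for illustration and are not drawn from PROMIS.

```python
import numpy as np

def p_endorse(theta, a, b):
    """2PL IRT model: probability of endorsing an item with
    discrimination a and difficulty b at trait level theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of each 2PL item at theta."""
    p = p_endorse(theta, a, b)
    return a**2 * p * (1.0 - p)

def next_item(theta_hat, a, b, administered):
    """The core CAT step: pick the not-yet-administered item that is
    most informative at the current trait estimate."""
    info = item_information(theta_hat, a, b)
    info[list(administered)] = -np.inf  # exclude items already given
    return int(np.argmax(info))

# Hypothetical item bank: discriminations (a) and difficulties (b)
a = np.array([1.2, 0.8, 1.5, 1.0, 2.0])
b = np.array([-1.0, 0.0, 0.5, 1.5, 0.0])

first = next_item(0.0, a, b, administered=set())
```

Because information at theta = 0 peaks for highly discriminating items located near 0, the sketch begins with the most discriminating on-target item, then works outward as the trait estimate is refined.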
S. Hewlett reviewed the development of the Bristol Rheumatoid Arthritis Fatigue (BRAF) scales, a suite of 3 visual analog scales (VAS) and a 20-item questionnaire that contains 4 robust independent dimensions. The intense iterative process involving patients as both participants/subjects and research partners was described, and the specific benefits of this interaction and changes incorporated into the questionnaire as a result were enumerated5. In particular, the development of both detailed question wording and the overall structure of the questionnaire was influenced by a combination of different types of patient engagement, such as participants thinking aloud as they completed the questions, and a patient research partner gauging the overall flow of the questions and how that might affect patients’ readiness to complete the whole questionnaire6.
R.H. Osborne presented a digest of classical and modern approaches to instrument development. He emphasized the importance and challenge of the first step in questionnaire development: ensuring that both the questionnaire items and the construct being measured reflect, as far as possible, the patients’ and clinicians’ experiences. Structured approaches to item and construct development were discussed, including evidence-based item-writing rules and grounded nominal group consultation techniques. Reflective clinicians and patients from the “coal face of care” should be consulted to ensure content and face validity7. Osborne recommended the use of Rasch analysis and confirmatory factor analysis in a 2-step process for item selection and scale construction. Consultation with other stakeholders, such as commissioners and managers of healthcare, was also highlighted as an important element in establishing what needs to be measured and how. Examples of the processes were drawn from the development of the multidimensional patient-centered questionnaire that measures the impact osteoarthritis (OA) has on individuals (called the OA Quest8) and the Multi-attribute Prioritisation Tool for hip and knee joint replacement surgery9.
Breakout group presentations
Breakout groups, prior to their own discussions, first listened to one of 4 mini-presentations illustrating some of the methods mentioned in the main presentations. In one such presentation, S. Newman reported an approach to the development of a robust and accurate patient-reported “uncertainty” outcome measure to be validated and applied in systemic lupus erythematosus (SLE). Feelings of uncertainty have been reported to hinder emotional adjustment in general10 and in SLE patients in particular11,12,13,14. Nevertheless, uncertainty has never been explored empirically in rheumatology because of the lack of a conceptualization of uncertainty and, consequently, of an adequate and valid tool to measure it15. Newman and colleagues aim to address both issues by developing a PRO measure that will reflect the patients’ perspective of the uncertainty experience.
To broaden the investigation of uncertainty and provide a contrast to SLE, patients with rheumatoid arthritis (RA) were also recruited, as they too have reported experiencing uncertainty issues16,17. The systematic approach to development of the instrument involves 3 phases. First, a qualitative investigation has been conducted, including a review of the literature across chronic conditions, consultation with rheumatology healthcare professionals (HCP; n = 8) to explore their understanding of uncertainty issues in SLE, and in-depth, semi-structured qualitative interviews with SLE patients (n = 17) and RA patients (n = 15). Transcripts of audiotaped interviews are being coded for uncertainty themes, and these will be combined with information from the literature review and HCP opinion to develop a conceptual model. It is intended that item generation for the uncertainty PRO will be conducted on the basis of the conceptual model and, wherever possible, the words of the patients will be used in the PRO items. Further development will include a postal survey of 400 patients for statistical validation, covering thresholds for item response options, item fit statistics, item locations, and the person separation index, together with traditional psychometric analyses including standard tests for reliability and validity.
Finally, a cross-sectional cohort study will be conducted using an SLE sample (n = 200) to further explore the construct validity of the new uncertainty PRO in comparison with existing related measures and measures of quality of life and general well-being.
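Among the “standard tests for reliability” in such a validation survey, internal consistency is commonly summarized by Cronbach’s alpha. A minimal sketch of that computation is shown below; the person-by-item score matrix shape is an assumption for illustration only.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_persons, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)        # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Illustrative check: three perfectly parallel items give alpha = 1.0
x = np.array([1.0, 2.0, 3.0, 4.0])
alpha = cronbach_alpha(np.column_stack([x, x, x]))
```

Values near 1 indicate that the items covary strongly, consistent with a single underlying scale; in practice, very high values can also signal item redundancy.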
S. Ciciriello reported on the use of concept mapping workshops18 with patients that incorporate nominal group techniques19 to inform development of new questionnaires. The critical advantage of this approach is that the perspective of patients is collected in a manner that is not influenced or biased by the researcher. A carefully crafted seeding statement is presented to patients, who work alone to generate ideas in response to the statement. These ideas are presented one by one in an egalitarian manner and recorded, with their meaning clarified by the group if necessary. All the ideas are placed on cards and sorted (grouped) individually by patients in any way they see fit. The sort data are entered into a computer, and multidimensional scaling and cluster analyses then produce a “cluster map,” which is a visual representation of the patients’ ideas and how these ideas are interrelated. The wording that patients use in the workshop is preserved and feeds straight into questionnaire items within a comprehensive set of well defined constructs.
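The sort-data pipeline described above (co-occurrence across patients’ card sorts, multidimensional scaling, then cluster analysis) can be sketched as follows. The sort data are hypothetical, and classical MDS stands in for whatever scaling variant a particular concept-mapping package uses.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical sort data: each row is one patient's grouping of 6 idea
# cards (cards sharing a label were placed in the same pile).
sorts = np.array([
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 2],
])
n_cards = sorts.shape[1]

# Co-occurrence: fraction of patients who put cards i and j together.
co = np.zeros((n_cards, n_cards))
for row in sorts:
    co += (row[:, None] == row[None, :])
co /= len(sorts)
dist = 1.0 - co  # cards rarely sorted together end up "far apart"

# Classical MDS: double-centre the squared distances, then use the top
# two eigenvectors as 2-D coordinates for the "cluster map" of ideas.
d2 = dist**2
J = np.eye(n_cards) - np.ones((n_cards, n_cards)) / n_cards
B = -0.5 * J @ d2 @ J
vals, vecs = np.linalg.eigh(B)                      # ascending order
coords = vecs[:, -2:] * np.sqrt(np.maximum(vals[-2:], 0))

# Hierarchical clustering of the same distances yields the clusters
# that are drawn as regions on the map.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

The resulting coordinates and cluster labels are only the scaffolding; the substantive output is the patients’ own wording attached to each clustered idea.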
The process of concept mapping was illustrated using the development of the Medication Education Impact Questionnaire (MeiQ)20 and the Methotrexate in Rheumatoid Arthritis Knowledge test (MiRAK)21. The MeiQ measures the impact of education about medications in meeting patient needs. It consists of 29 questions across 6 scales: information quality; active communication; coming to terms with diagnosis and treatment; self-management role and responsibility; capacity to self-manage; and self-management support. The MiRAK measures knowledge about methotrexate. The questionnaires have undergone rigorous testing with stakeholders, and have also undergone stringent calibration, validation, and test-retest reliability tests. These have shown that the questionnaires have good content and construct validity, internal consistency, and stability over time. Importantly, the concept mapping resulted in patient-centered questionnaires with well defined short scales, which are highly pertinent to patients.
M.A. van de Laar reported on the Q-sort methodology developed by William Stephenson. The “Q method” helps to study people’s “subjectivity,” that is, their viewpoint. This contrasts with traditional factor analysis, or the “R method,” which involves correlations between variables across a sample of subjects. The name “Q” comes from the form of factor analysis that is used to analyze the data: Q looks for correlations between subjects across a sample of variables22. Q factor analysis reduces the many individual viewpoints of the subjects down to a few “factors,” which represent shared ways of thinking. As an example, a study on fatigue in RA was discussed, which asked whether patients can be categorized according to their fatigue experience. Based on early interview studies as well as on existing measurement instruments for fatigue, a selection of 200 items was assembled and reduced to 57 relevant statements for RA fatigue. Patients evaluated these fatigue items on cards by sorting them into a forced quasi-normal distribution. A 4-factor model categorized 2/3 of the patients. These factors can be described as: little impact of fatigue; good coping and bad sleep; high distress; and search for balance. This information will be used to create an IRT-calibrated item bank for a computer adaptive test instrument that is expected to be finalized at the end of 201123,24.
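The defining move of the Q method (correlating subjects rather than variables, then factoring that correlation matrix) can be sketched as follows, using an invented miniature Q-sort data set; principal-component extraction with a Kaiser cutoff stands in here for the specific factor-extraction and rotation choices a real Q study would make.

```python
import numpy as np

# Hypothetical Q-sort data: rows are statements, columns are patients;
# each column is one patient's forced quasi-normal ranking of statements.
qsorts = np.array([
    [ 2,  2, -2],
    [ 1,  1, -1],
    [ 0,  0,  0],
    [-1, -1,  1],
    [-2, -2,  2],
], dtype=float)

# Q method: correlate PERSONS with each other (not variables).
person_corr = np.corrcoef(qsorts, rowvar=False)

# Factor the person correlation matrix; each retained factor is a
# shared viewpoint, and loadings show which patients express it.
vals, vecs = np.linalg.eigh(person_corr)
order = np.argsort(vals)[::-1]                 # sort factors descending
vals, vecs = vals[order], vecs[:, order]
loadings = vecs * np.sqrt(np.maximum(vals, 0))

# Kaiser criterion (eigenvalue > 1), purely for illustration.
n_factors = int((vals > 1.0).sum())
```

In this toy data set, the first two patients share a viewpoint and the third holds the mirror-image view, so a single factor absorbs all the shared variance.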
E. Dures reported on further studies of fatigue in RA, a major problem for patients that remains a challenge to measure. A systematic review by Hewlett, et al25 concluded that there is currently no valid tool to measure fatigue in UK patients with RA. Subsequently, 3 years of careful and rigorous development led to the Bristol RA Fatigue group of scales26,27. The BRAF Multi-Dimensional Questionnaire (BRAF-MDQ) is a 20-item instrument that contains 4 separate subscales (Living with fatigue, and physical, emotional, and cognitive fatigue). The BRAF short scales are 3 single questions on fatigue severity, coping, and effect, validated in visual analog and numerical rating scale versions (BRAF VAS and BRAF NRS). If these PRO are to reflect fatigue, it is crucial that patients understand them in the way the researchers intended (and those intentions were based on qualitative data from patients). Qualitative data, focus groups, and “Think Aloud” cognitive interviewing were therefore employed to develop wording for the short scales and to review the draft BRAF-MDQ questions. This exposed some key areas of mismatch between researchers’ intent and patient interpretation, and led to rewording of some stem questions and response items and to changes in the layout of some questions, strengthening the confidence users can have that these scales are easily understood by patients. The 2 final stages of evaluating this new PRO are currently under way. Test-retest reliability raises challenges because of the inherently fluctuating nature of RA fatigue. Sensitivity to change, or responsiveness, also raises questions about selecting an appropriate intervention and a timeframe within which fatigue should improve.
OUTCOME OF DISCUSSIONS
Breakout discussions in groups of 8–15 were asked to consider their own experiences of working with instrument development, the practical difficulties in ensuring rigorous processes, and the potential usefulness (or otherwise) of OMERACT specifying in more detail how these aspects may be incorporated into the “Truth” section of the OMERACT Filter28. The verbal reports back to the workshop from the breakout group reporters (together with later written reports), plenary session discussions, and subsequent informal discussions were drawn together, and a summary was presented at the final OMERACT 10 plenary session. The main considerations are shown in Table 1 and an action program in Table 2. These consist of a mixture of existing information and possible new areas of research.
Table 1. Main considerations from the discussion group reports.
Table 2. Proposed agenda for action.
There was a general appreciation that, while we should strive for rigor and quality, there should be recognition that exact measures do not exist. This point emerged during discussion of the tradeoffs between using current imperfect measures where they exist and undertaking the long and complex process of developing new instruments. It was also observed that new measurement approaches, such as those proposed by the PROMIS group, need to demonstrate that their theoretical benefits can indeed be realized in practice29. Existing measures might not meet modern standards for questionnaire design30, particularly in the areas of content and construct validity for rheumatological conditions, yet they may remain useful in the absence of better alternatives, for example, in the area of sleep quality in RA, where good generic measures exist31. A contrary example is the need for a fatigue scale that adequately captures the experience of people with RA26,27. Thus 2 parallel pathways can be followed. First, start using acceptable instruments for the assessment of PRO but take account of their limitations. Second, continue to improve specific as well as generic instruments for the assessment of PRO.
Another issue raised in several discussion groups was the importance of the setting (which disease, which circumstances) and purpose for which a measure was developed and within which it was validated. It is not acceptable to make assumptions that an instrument developed with a specific population can be used generically.
Patients particularly felt that when they complete an instrument they should be informed why they are being asked to do so and how the results will be used. However, providing detailed information may bias data collection and sampling, and may therefore compromise the overall mission of the research. This was an unexpected consideration, and it is recommended that researchers and patient partners work together, on a case by case basis, to develop information that is both respectful and enhances the research outcomes of each study.
Agenda for action and voting
The agenda for action included the potential for OMERACT to have available appropriate references to validated instruments, and contact points for experts in the field of instrument development and testing. There was a feeling that the OMERACT filter might be enhanced if there was advice about the development of instruments incorporated into the “Truth” section of the filter.
At the end of this process participants were asked to vote on 2 primary questions. First, should OMERACT collate a set of principles and procedures to support the OMERACT community to develop high quality questionnaires? Participants voted 93% in favor. Second, would it be helpful to have evidence-based guidelines (quality standards) for the development of instruments to measure patient-reported outcomes? Participants voted 85% in favor. The main points from the discussion group feedback and the proposed agenda for action were carried forward to a final plenary session, which considered the main issues from all 4 Patient Perspective workshops taken together2.