Abstract
Objective. Numerous patient-reported outcome measures (PROM) exist for the measurement of physical function for psoriatic arthritis (PsA), but only a few are validated comprehensively. The objective of this project was to prioritize PROM for measuring physical function for potential incorporation into a standardized outcome measurement set for PsA.
Methods. A working group of 13 members including 2 patient research partners was formed. PROM measuring physical function in PsA were identified through a systematic literature review and recommendations by the working group. The rationale for inclusion and exclusion from the original list of existing PROM was thoroughly discussed and 2 rounds of Delphi exercises were conducted to achieve consensus.
Results. Twelve PROM were reviewed and discussed. Six PROM were prioritized: Health Assessment Questionnaire-Disability Index (HAQ-DI) and 3 modifications (HAQ-Spondyloarthritis, modified HAQ, multidimensional HAQ), Medical Outcomes Study 36-item Short Form survey physical functioning domain, and the Patient-Reported Outcomes Measurement Information System (PROMIS) physical functioning module.
Conclusion. Through discussion and Delphi exercises, we achieved consensus to prioritize 6 physical function PROM for PsA. These 6 PROM will undergo further appraisal using the Outcome Measures in Rheumatology (OMERACT) Filter 2.1.
Psoriatic arthritis (PsA) is a chronic inflammatory disease with manifestations including arthritis, enthesitis, dactylitis, spondylitis, and skin and nail psoriasis1,2. PsA causes damage of articular joints and can profoundly affect physical function and health-related quality of life (HRQOL) in affected individuals. The Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) and Outcome Measures in Rheumatology (OMERACT) are working to combine perspectives of care providers, researchers, and patient research partners (PRP) to update the PsA core outcome set, which identifies the key outcomes to be measured in randomized controlled trials (RCT) and longitudinal observational studies (LOS)3.
Core outcome sets represent the minimum domains that should be measured and reported in all RCT and LOS of a specific condition4. Use of core outcome sets does not imply that outcomes in a particular RCT should be restricted to those endpoints. OMERACT advocates that each trial should measure the core outcome set, which is based on both a core domain set (the What to measure) and a core outcome measurement set (the How to measure)5. In 2016, a core domain set for PsA was updated and endorsed3.
The lack of standardization of outcome measurement instruments in PsA RCT and LOS has been highlighted, resulting in inconsistency of data reporting and heterogeneity in results6. After finalizing the core domain set, the GRAPPA-OMERACT PsA Core Outcome Set working group is currently leading the effort to develop and ratify a standardized core outcome measurement set7. The process follows the OMERACT Filter 2.1 recommendations5,8. The OMERACT Filter 2.1 is a set of standards for evidence-based decision making that addresses core outcome set development. Endorsing a measurement instrument to assess a certain domain using the OMERACT Filter 2.1 involves multiple work streams, including systematic literature reviews (SLR), with appraisal and synthesis of the evidence on instrument properties; wide discussions; and Delphi consensus exercises. The synthesis of evidence follows the pillars of OMERACT Filter 2.1: domain match (i.e., instrument measuring what it is supposed to measure), feasibility (i.e., instrument is practical to use), truth (i.e., degree to which the instrument’s score makes numerical sense), and discrimination (i.e., instrument can distinguish situation of no change vs change, is sensitive to change in RCT, and has a threshold of meaning for interpretation)5.
Physical function is included in the PsA core domain set because it has been identified as one of the core domains reflecting disease effect in patients with PsA9,10,11. Several instruments are available to measure physical function in PsA, including those originally developed for use in other conditions, such as rheumatoid arthritis (RA), as well as newer instruments developed specifically for PsA12. The process of prioritizing which instruments to further appraise using the OMERACT Filter 2.1 is conducted by individual working groups. The PsA Core Outcome Set working group steering committee developed a template to facilitate this process, and this template has been described elsewhere13. It includes the formation of a working group, identification of instruments, preliminary appraisal of existing evidence, and discussions and Delphi exercises to prioritize instruments that have the highest potential to fulfill OMERACT Filter 2.1. This report details the steps taken by the physical function working group to prioritize patient-reported outcome measures (PROM) for the assessment of the physical function domain in PsA that will be candidates for further consideration.
MATERIALS AND METHODS
This report describes application of a template to the physical function domain for PsA to prioritize instruments to undergo the OMERACT Filter 2.1. The discussion and surveys among researchers were deemed exempt from Institutional Review Board approval.
Formation of a working group for the outcome domain. The working group members were identified through GRAPPA and included personnel with expertise in the physical function domain in PsA. Candidates were invited from within the steering committee and on the recommendation of working group members. The working group involved at least 2 PRP who were invited to participate by the GRAPPA PRP chair.
Identification and preliminary appraisal of measurement instruments for the domain. Physical function in PsA was defined as “being able to perform physical activities from daily to recreational activities (includes upper/lower extremity functioning, balance)”14. Examples of the concept of physical function were taken from quotations from a GRAPPA international focus group study9 and summarized in a workbook compiled for working group members (Supplementary Data, available from the authors on request). Based on this definition and the concept of physical function being the perception of physical capability, the working group decided to focus on PROM instead of performance-based assessments.
We identified outcome measurement instruments for measuring physical function based on results from a recent systematic review of measurement properties of PROM in PsA that involved both health professionals and PRP15. In the previous work, published articles with data regarding development or assessment of the measurement properties of PROM were identified15; these measurement properties were evaluated using the approach described by Prinsen, et al16 and the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) checklist17. The full process and results are described elsewhere15. Each PROM was appraised for 3 main categories and 8 subcategories, namely reliability (subcategories: internal consistency, test-retest reliability, measurement error), validity (subcategories: content validity, structural validity, hypothesis testing, cross cultural validity, criterion validity), and responsiveness17.
In addition, new and potential instruments that measure physical function were suggested by working group members.
Discussion and Delphi exercise to achieve consensus regarding instrument prioritization. A teleconference was conducted among working group members to discuss the various PROM and the Delphi formats. The working group decided to have 2 rounds of Delphi exercises, with interim discussions by teleconference or e-mail to facilitate achieving consensus on prioritizing physical function PROM. All Delphi exercises were conducted anonymously on online portals.
A comprehensive workbook on the physical function PROM was developed and presented to working group members (Supplementary Data, available from the authors on request). This included the background, format, and scoring methods for each PROM. Included in the workbook was a Summary of Measurement Properties (SOMP) table that detailed the measurement properties of the PROM appraised in the previous work15. However, information presented in the SOMP table was considered secondary, because the full set of evidence required by OMERACT Filter 2.1 had not been developed. In particular, RCT evidence for discrimination was not included.
In the first Delphi exercise, working group members were asked to vote based on their own understanding of the PROM. Working group members were advised to focus primarily on whether the PROM matched to the domain of physical function in PsA and on the feasibility of the PROM. A question for each PROM was asked, “Do you think this PROM should be taken forward for further evaluation?” A simple yes/no response for each PROM was requested, and additional comments were collected as free text.
The results of the voting of the first Delphi exercise were discussed. The working group then drafted the questions for a second Delphi exercise. All 13 working group members were invited to participate in the second Delphi exercise. Again, working group members were asked whether to take the individual PROM to appraisal by means of OMERACT Filter 2.1, based on their understanding of domain match, feasibility, and measurement properties. It was prespecified that instruments receiving < 70% endorsement in the second Delphi exercise would be excluded from further formal appraisal using OMERACT Filter 2.1.
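The prespecified exclusion rule can be illustrated with a minimal sketch. It assumes a panel of 13 voters and simple yes/no tallies; the function names and the vote counts in the usage note are hypothetical illustrations, not the working group's actual tooling or raw data. Because exclusion applies to instruments receiving < 70% endorsement, an instrument at exactly 70% is treated here as retained.

```python
from fractions import Fraction

# Prespecified threshold: instruments receiving < 70% endorsement are excluded.
THRESHOLD = Fraction(70, 100)


def endorsement(yes_votes: int, total_votes: int) -> Fraction:
    """Return the exact share of 'yes' votes among all votes cast."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= yes_votes <= total_votes:
        raise ValueError("yes_votes must lie between 0 and total_votes")
    return Fraction(yes_votes, total_votes)


def is_retained(yes_votes: int, total_votes: int,
                threshold: Fraction = THRESHOLD) -> bool:
    """True when endorsement is not below the threshold (i.e., >= 70%)."""
    return endorsement(yes_votes, total_votes) >= threshold


def percent(yes_votes: int, total_votes: int) -> float:
    """Endorsement as a percentage rounded to 1 decimal place."""
    return round(float(endorsement(yes_votes, total_votes)) * 100, 1)
```

With 13 voters, 9 yes votes correspond to 69.2% (below the threshold) and 10 yes votes to 76.9% (above it), so in a panel of this size an instrument needs at least 10 endorsements to proceed.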
RESULTS
Formation of the physical function working group. A physical function working group of 13 members was formed in June 2018. The members of the working group consisted of experts (10 rheumatologists and 1 methodologist) with experience in physical function measurement in PsA, and 2 PRP. Working group members had international representation, spanning 4 continents (countries of origin: Australia, Canada, Denmark, Hong Kong SAR of China, Singapore, United Kingdom, and United States). Two teleconference sessions with PRP were conducted to explain the purpose of the study, workflow, instruments for consideration of assessment of physical function domain, and the OMERACT Filter 2.1 methodology.
Identification of PROM for physical function. The evidence derived from the SLR for 10 physical function PROM was extracted from the published article15 and presented to working group members for review and discussion (Supplementary Data, available from the authors on request). These were the PROM: Health Assessment Questionnaire-Disability Index (HAQ-DI)18, HAQ-Spondyloarthritis (HAQ-S)19, modified HAQ (mHAQ)20, physical functioning domain of the Medical Outcomes Study 36-item Short Form survey (SF-36 PF10)21, physical component summary score of the SF-36 (SF-36 PCS)21, PCS of the SF-12 (SF-12 PCS)22, Psoriatic Arthritis Impact of Disease (PsAID) functional capacity item23, Arthritis Impact Measurement Scales (AIMS)24, Bath Ankylosing Spondylitis Functional Index (BASFI)25, and the American College of Rheumatology (ACR) functional class in RA26. Two additional PROM were suggested by working group members: multidimensional HAQ (MDHAQ)27 and the Patient-Reported Outcomes Measurement Information System (PROMIS)-Short Form Physical Function 10a (PROMIS-PF10a)28. The MDHAQ has been incorporated in the Routine Assessment of Patient Index Data 3 (RAPID-3) that was developed for use in clinical care in RA29, and is being incorporated as a routine measurement in clinical care for PsA in some countries. The PROMIS-PF10a was developed based on item banks for physical function.
Relevant information for these 12 physical function PROM was summarized in a comprehensive workbook and circulated to all working group members (Supplementary Data, available from the authors on request) for review in preparation for discussion and the first Delphi exercise.
Working group discussions and Delphi exercises. The first Delphi exercise was conducted in June 2018 and finalized on July 12, 2018, through an anonymized online voting portal. All 13 working group members participated (response rate 100%). The voting results of the first Delphi exercise and comments made regarding various PROM are summarized in Table 1.
The results of the first Delphi exercise were then presented to the working group members, followed by open discussion by e-mail from July 12 to 27, 2018. A 1-hour Web-based discussion was conducted on August 23, 2018, followed by further open discussion by e-mail from August 23 to September 19, 2018. During the teleconferences and subsequent e-mail communications, members of the working group spoke freely on their views of the PROM. Based on the discussion points, a script for a second Delphi exercise was drafted and reviewed by all working group members. Several phrasing revisions were made and finally agreed upon by all members of the working group (Table 2).
For the second Delphi exercise, results of the overall voting of the working group in the first Delphi exercise and discussion points were made available. All 13 working group members participated in the second Delphi exercise and results with reasons for the inclusion or exclusion of all PROM are summarized in Table 2.
The HAQ and modifications. The HAQ-DI was originally developed for RA and adapted for arthritis in general18. It includes 20 items assessing 8 aspects of physical function: dressing and grooming, arising, eating, walking, hygiene, reach, grip, and activities. As the most commonly used instrument to assess physical function in PsA RCT12, it received unanimous endorsement in both Delphi exercises.
The HAQ-S, a modification of the original HAQ-DI with 5 additional items assessing function of the axial spine, received only a 69% vote in the first Delphi. While analyses of data have demonstrated that the HAQ-S does not record additional information compared with the HAQ-DI30, some members thought that this result may have been related to the original PsA cohort in which the HAQ-S was tested and needed further testing in populations enriched for the presence of axial PsA. Both the HAQ-DI and HAQ-S have been collected in the large Corrona registry in the United States; thus comparative data about performance of the 2 instruments in patients whose PsA includes axial involvement may potentially become available from the registry. In the second Delphi exercise, use of the HAQ-S was addressed with 2 questions: the first was whether to include, and the second was to allow use of either the HAQ-DI or HAQ-S depending on the clinical setting. With these considerations, the HAQ-S received 76.9%, and the HAQ-DI and HAQ-S as a family received 84.6% of the votes in favor of inclusion.
The mHAQ is a shortened version of HAQ-DI with only 8 items, 1 from each subdomain of the HAQ-DI20; it received > 70% of the votes for inclusion in both Delphi exercises. The MDHAQ, which includes the 8 items of mHAQ with 2 additional items (patient global assessment of disease activity and pain)27, was presented as part of the RAPID-3 in the first Delphi when it received only 69% of the votes. During the teleconference discussion, the 10-item MDHAQ was clarified as an instrument purely to assess physical function. Consensus was achieved to retain the MDHAQ in the second Delphi exercise, with a vote of 76.9% to be included.
The Medical Outcomes Study surveys. The SF-36 PCS received 61.5% of the vote in the first Delphi. Although the results of SF-36 PCS scores have been reported in many RCT, there were concerns expressed by the working group regarding the concept represented by the summary scores of the SF-36, because they are calculated based on positive and negative weighting of all 8 domains with a population norm of a mean (SD) of 50 (10). The key utility of this norm-based scoring is for easy comparison of the summary scores at a group level with the normal population average scores in epidemiologic studies31. However, the SF-36 PCS represents a broader concept than physical function alone21,31, and therefore does not have domain match. The SF-36 PCS was excluded following the second Delphi exercise. In contrast, the SF-36 PF10, which includes 10 items measuring physical function, did have domain match. The SF-36 PF10 received unanimous endorsement for inclusion from both Delphi exercises. It has been noted, however, that to use the SF-36 PF10, the entire SF-36 questionnaire must be scored21,31.
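The general form of norm-based scoring described above can be written out as follows. This is a sketch of the standard transformation to a reference-population mean of 50 and SD of 10, not a reproduction of the SF-36 scoring manual. The symbols are assumptions introduced for illustration: x_k is a respondent's score on domain k, \mu_k and \sigma_k are the reference-population mean and SD for that domain, and w_k are the published physical component weights (positive for some domains and negative for others), which are not reproduced here.

```latex
z_k = \frac{x_k - \mu_k}{\sigma_k}, \qquad k = 1, \dots, 8
\qquad\qquad
\mathrm{PCS} = 50 + 10 \sum_{k=1}^{8} w_k \, z_k
```

Because every one of the 8 domains contributes to the sum, a change in a non-physical domain can shift the PCS, which is consistent with the working group's concern that the PCS represents a broader concept than physical function alone.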
Based on the same reasoning by which the SF-36 PCS was excluded, the working group felt the SF-12 PCS did not represent the physical function domain (lack of domain match). Also, there is no existing study that has evaluated its exclusive use in PsA. The SF-12 PCS was excluded from the second round of the Delphi exercise and further consideration.
The PsAID functional capacity item. PsAID is a PsA-specific derived multidimensional instrument that measures the life effect of PsA. It is often considered an HRQOL measure23. Physical function is represented by a single item with an 11-point numeric rating scale (0–10) as functional capacity effect attributed to PsA. It received 84.6% of the votes in the first Delphi. Concerns were raised regarding the validity of using a single item from a composite measure of HRQOL, and the domain match of the item itself. Consensus excluded the PsAID functional capacity item from further evaluation in the second Delphi exercise.
PROMIS-PF10a. Despite the lack of validation data, the working group thought that the PROMIS-PF10a could be a promising instrument. The PROMIS-PF10a was developed from a 1728-item bank taken from 165 instruments assessing physical function. There are some data to support construct validity in RA32, but no data exist for PsA. It received 92.3% and 100% of the votes for inclusion in the first and second Delphi exercises, respectively.
Other PROM. The AIMS, BASFI, and ACR functional class received 30.8%, 61.5%, and 30.8% votes in the first Delphi exercise, respectively. Shortcomings for the AIMS include its length, thereby lacking feasibility; and its lack of use in the last decade. The BASFI was considered not to have adequate domain match as well as not providing additional information beyond the HAQ-DI. The ACR functional class was considered to be lacking domain match because it is too crude an instrument for measuring physical function in patients with PsA who currently are less physically impaired or have less disability following the new treatment strategies33. These 3 instruments were considered as a single question in the second Delphi exercise and were excluded from further appraisal using the OMERACT Filter 2.1.
DISCUSSION
In this report we summarize the process leading to a preliminary prioritization of PROM for assessment of physical function in PsA RCT and LOS. Six PROM for assessment of the physical function domain in PsA were successfully prioritized for further appraisal: HAQ-DI, HAQ-S, mHAQ, MDHAQ, SF-36 PF10, and PROMIS-PF10a. These prioritized PROM will undergo formal appraisal of specific measurement properties using the OMERACT Filter 2.1 individually.
Members of GRAPPA are committed to standardizing the core outcome measurement set for PsA RCT and LOS; standardization is essential to minimize heterogeneity and facilitate interpretation of the studies7. With updating of the PsA core outcome set, research processes have been under way to evaluate instruments for each of the specified domains. We tested a consensus-based process for candidate instrument triage and showed its feasibility to prioritize instruments for the physical function domain. This process as illustrated in Figure 1 was drafted following a consensus effort from the steering committee including input from PRP and may be used as a template in guiding subsequent working groups to choose instruments with high potential for fulfilling the OMERACT Filter 2.1 for instrument selection. Its application may be especially useful when assessing domains that have numerous existing measurement instruments developed over the years, often for other indications, particularly for domains such as physical function and HRQOL. This template may be less useful for highly specific PsA domains such as enthesitis, where few instruments are specifically developed and available, so that the working group may not need a method to shortlist instruments.
The work processes in this template (Figure 1) consisted of forming a working group, identification of PROM through SLR, thorough discussions on content and feasibility of the instruments, and achievement of consensus through Delphi exercises. This template provided a platform for the working group to exclude instruments that have inadequate domain match, poor feasibility, or otherwise low potential from further formal appraisal using the OMERACT Filter 2.1. It also allowed new instruments that have less established evidence but high potential to be considered for further evaluation. As an example, the PROMIS-PF10a has not been used in PsA and further appraisal of evidence using the OMERACT Filter 2.1 would be impossible. With its prioritization, the working group is committed to deriving supportive data for it and may consider other versions of PROMIS-PF. Inadvertently, subsequent appraisal of instruments could be biased toward better-established instruments that have been used in PsA. Prioritizing instruments at an earlier stage will therefore prompt the working group to recognize the gaps and derive new data to support or refute the newer instruments. Even for the more established instruments, we also recognized that there may be limited evidence to support some measurement properties. New evidence will need to be further developed, which will be part of the processes of the OMERACT Filter 2.1.
The strengths of this current report include collaborative work from healthcare professionals and PRP to prioritize instruments for further appraisal. The working group members have expertise in the physical function domain in PsA with international representation. There are some limitations in interpretation that require highlighting. The consensus Delphi exercises were conducted among the 13 working group members rather than involving a larger number of people, recognizing that the discussions were deep and thorough. During the Delphi exercises, working group members voted for the PROM based on their overall impression of the PROM. These gaps will be bridged eventually because each of the prioritized PROM will be taken forward to formal appraisal using the OMERACT Filter 2.1. Evidence supporting each PROM in the final standardized outcome measurement set will be presented instrument by instrument, and endorsement from a larger GRAPPA and OMERACT community will be sought.
We report application of a consensus-driven template to prioritize multiple instruments for further appraisal for the physical function domain in PsA, in a project to standardize the core outcome set in PsA. We prioritized 6 PROM for use in RCT and LOS through a concerted effort from experts and PRP. These prioritized physical function PROM will undergo further appraisal using the OMERACT Filter 2.1.
ACKNOWLEDGMENT
We thank Dorcas Beaton and Lara L. Maxwell for technical advice and review of this manuscript.
Footnotes
YYL is funded by the Clinician Scientist award of the National Medical Research Council, Singapore (NMRC/CSA-INV/0022/2017). The views expressed are those of the author(s) and not necessarily those of the NMRC. AMO is funded by the Jerome L. Greene Foundation Scholar Award, the Staurulakis Family Discovery Award, the Rheumatology Research Foundation, and the US National Institutes of Health (NIH) through the Rheumatic Diseases Resource-based Core Center (P30-AR053503 Cores A and D, and P30-AR070254, Cores A and B). All statements in this report, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the NIH, the US National Institute of Arthritis Musculoskeletal and Skin Diseases (NIAMS), or the Rheumatology Research Foundation. PH and RC (The Parker Institute, Bispebjerg and Frederiksberg Hospital) are supported by a core grant from the Oak Foundation (OCAY-18-774-OFIL). LCC is funded by a UK National Institute for Health Research (NIHR) Clinician Scientist award. The research was supported by the NIHR Oxford Biomedical Research Centre (BRC). The views expressed are those of the author(s) and not necessarily those of the UK National Health Service, the NIHR, or the UK Department of Health. AO is funded by the Rheumatology Research Foundation and NIH/NIAMS R01 AR072363.
- Accepted for publication January 17, 2020.