Abstract
Objective. The Outcome Measures in Rheumatology (OMERACT) Worker Productivity Group continues efforts to assess psychometric properties of measures of presenteeism.
Methods. Psychometric properties of single-item and dual answer multiitem scales were assessed, as well as methods to evaluate thresholds of meaning.
Results. Test-retest reliability and construct validity of single-item global measures were moderate to good. The value of measuring both degree of difficulty and amount of time with difficulty in multiitem questionnaires was confirmed. Thresholds of meaning vary depending on methods and external anchors applied.
Conclusion. We have advanced our understanding of the performance of presenteeism measures and have developed approaches to describing thresholds of meaning.
- OMERACT
- PSYCHOMETRIC PROPERTIES
- MINIMUM IMPORTANT DIFFERENCE
- DUAL PRESENTEEISM SCALE
- PATIENT ACCEPTABLE STATE
- PRESENTEEISM
Quantifying restrictions in worker participation, including absenteeism, sick leave, and presenteeism (i.e., reduced productivity because of ill health), is an important outcome from a patient’s perspective and is increasingly seen as a health outcome to target for improvement. People with rheumatic and musculoskeletal diseases (RMD) can experience variable levels of presenteeism and absenteeism depending on their health status, job demands, or other personal or environmental contextual factors1.
During the last 8 years, the Outcome Measures in Rheumatology (OMERACT) Worker Productivity Group has evaluated available measures to assess worker productivity loss, initiated new research to fill in knowledge gaps regarding psychometric properties, and appraised these measures against version 2.1 of the OMERACT Filter1,2,3,4. Based on a review of available instruments in the literature, we had a mandate to move forward with 6 candidate measures (4 single-item global and 2 multiitem measures) to assess presenteeism2: Worker Productivity Scale-Arthritis (WPS-A)5, Work Productivity and Activity Impairment Questionnaire (WPAI)6, Work Ability Index (WAI)7, Quality and Quantity (QQ) questionnaire8, Workplace Activity Limitations Scale (WALS)9, and the modified Work Limitation Questionnaire-25 (WLQ-25PDmod)10. These can be organized into a taxonomy of 4 different types of worker productivity measures, which sit against the background of contextual factors (Figure 1). At OMERACT 12, we received support (> 70% consensus) that WLQ-25PDmod, WALS, and WAI had enough OMERACT Filter evidence available, and we are conducting ongoing research on these measures with a view to future endorsement, while also continuing to monitor QQ. Since OMERACT 12, we have progressed in our research across the following 4 workstreams: (1) collating further evidence about the reliability and content/construct validity of global (i.e., single-item) measures of presenteeism and supplementing information on WPS-rheumatoid arthritis (RA) and WPAI, which were previously endorsed; (2) evaluating the psychometric properties of dual answer scales of 2 validated multiitem measures; (3) determining the patient acceptable state (PAS) and the minimal important difference (MID) of presenteeism measures; and (4) examining contextual factors.
MATERIALS AND METHODS
Special Interest Group (SIG) OMERACT 2018
At OMERACT 14, we presented an update of our work on the first 3 workstreams. Attendees at our SIG included patients (n = 4), clinicians (n = 7), 1 fellow, and others (e.g., methodologists, industry, n = 5). Important questions were discussed with participants during breakout sessions, including:
Global measures: Based on the results presented (reliability, cross-cultural differences, construct validity), what would be your preferred global measure and why?
Multiitem measure: Based on the context of your research, or your experience as a patient, what do you think are the advantages and drawbacks of using answers that assess both the degree of difficulty and the amount of time with difficulty?
PAS/MID: (1) How best to manage MID thresholds and (2) do you agree with the need to report multiple MID/PAS thresholds?
Ethics approval was obtained for individual studies and all patients provided written informed consent [Making It Work trial: University of British Columbia Research Ethics Board (H11-03527); the EULAR-PRO (European League Against Rheumatism — Patient Reported Outcomes) study obtained overall ethical approval from National Research Ethics Service Committee NW–Greater Manchester (12/NW/0172), and from each participating center according to national guidelines].
Global measures
To address the meaning of at-work productivity loss measures from a patient’s perspective in different cultures, we conducted the international EULAR-PRO study to assess presenteeism in patients with inflammatory arthritis (IA) or osteoarthritis. The results of phase I have been published and show fair to excellent test-retest reliability [intraclass correlation coefficients (ICC) ranging from 0.59 for the Health Productivity Questionnaire (HPQ; question C) to 0.78 for the WPS-RA]11. In-depth cognitive debriefing interviews revealed variation in how participants interpreted some of the constructs among the 5 measures, especially regarding “performance” in the HPQ scale, which was a term used in sport and theatre but not related to work for participants from Romania and Sweden12. Most participants (∼70%) considered that a recall period of between 7 days and 1 month would be a good reflection of the effect their health has on work. Phase II is an international observational cohort study (n = 8 countries) to further test psychometric properties. Preliminary results of baseline data on construct validity were presented during the SIG and show moderate to good construct validity (Table 1)13. During the breakout session, SIG attendees agreed that a recall period of 1 day was not representative, although they thought a recall period of a month might be too long. Other discussion points included the wording of anchors (e.g., normal). Further, participants highlighted the difficulty in answering and interpreting disease-specific scales, because of the complexity of many rheumatic diseases, and preferred a generic scale.
Multiitem measures
How to best measure presenteeism using multiitem scales remains challenging. The WLQ and WALS are frequently used, but participants’ feedback expressed concern about the constructs measured by each instrument. The WLQ measures the amount of time people are limited, but not the extent to which they are limited. This was perceived as a drawback by patients who felt it misses an important part of their experience and by researchers interested in evaluating presenteeism as a health state. In contrast, the WALS measures the extent of limitation but not time. This was a drawback to health economists, because of difficulty assigning cost. To evaluate psychometric properties encompassing both concepts, items from each measure (WALS and WLQ) were offered both time and difficulty response keys (dual answer keys).
Baseline and 6-month data were used from a Canadian randomized controlled trial (Making It Work Program) of an employment intervention including patients with IA (n = 364)14,15. The psychometric properties of the measures were first evaluated with the 2 answer keys analyzed separately (i.e., without combining results)16. Answers from the dual answer keys were then combined into a single score, obtained by (1) multiplying or (2) adding the scores of the difficulty and time answer keys at the item level17. No significant differences were observed between the additive and multiplicative models. High correlation (≥ 0.8) between difficulty and time was found in only 2/12 WALS items and 11/25 WLQ items, justifying the need for dual answer keys. High internal consistency (i.e., ≥ 0.7) was found for WALS and all WLQ subscales, for both answer keys analyzed separately and combined (except WLQ-Physical Demands)16. As hypothesized a priori, moderate correlations were observed between the original answer keys, or combined scores, of WLQ subscales and WALS and measures assessing similar concepts [WPAI and the Work Instability Scale (WIS); congruent validity]. During the SIG, all agreed that dual answer keys provided additional value. Patient representatives uniformly said they felt that asking about both degree of difficulty and time with difficulty better reflected their experience, and that asking about time alone would miss an important concept. The main concern raised was the length and complexity of the questionnaire with both answer keys. Other issues raised included concern about the 2-week recall period and the descriptors for time options (considered difficult to answer by patients), and concern about the percentage of time attributed to each descriptor (e.g., some of the time = 50% of the time).
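The 2 item-level combination rules described above (multiplying or adding the difficulty and time answers) can be illustrated with a minimal sketch. The function name, the numeric coding of the answer keys, and the summing of item scores into a total are assumptions made for illustration only; they are not the scoring rules of the WALS or WLQ, nor necessarily the procedure used in the cited analysis.

```python
from typing import Sequence


def combine_dual_keys(
    difficulty: Sequence[int],
    time: Sequence[int],
    method: str = "multiply",
) -> int:
    """Combine item-level difficulty and time answers into one total score.

    Hypothetical coding: each list holds one integer per item, with higher
    values meaning more difficulty or more time with difficulty.
    """
    if len(difficulty) != len(time):
        raise ValueError("Each item needs both a difficulty and a time answer")

    if method == "multiply":
        # Multiplicative model: item score = difficulty x time
        item_scores = [d * t for d, t in zip(difficulty, time)]
    elif method == "add":
        # Additive model: item score = difficulty + time
        item_scores = [d + t for d, t in zip(difficulty, time)]
    else:
        raise ValueError("method must be 'multiply' or 'add'")

    # Summing item scores is an assumption made for this illustration.
    return sum(item_scores)


# Hypothetical answers of one respondent to a 3-item scale
difficulty_answers = [1, 2, 0]
time_answers = [2, 3, 1]

multiplicative_total = combine_dual_keys(difficulty_answers, time_answers, "multiply")
additive_total = combine_dual_keys(difficulty_answers, time_answers, "add")
```

Under this coding, an item with no difficulty contributes nothing to the multiplicative total regardless of the time answer, whereas it still contributes its time value to the additive total.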
Thresholds of meaning for worker productivity measures
Thresholds of meaning are benchmarks for scores (e.g., PAS of pain) or change in scores [e.g., the minimal clinically important difference (MCID), i.e., the minimal threshold for change to be important] that aid in interpretation at an individual patient level. Recently, Copay, et al have demonstrated that there are considerable differences in MCID thresholds depending on the anchor or method18. At OMERACT in 2018, our focus was on dealing with these differences. As a group, we had reviewed the literature on these attributes and decided on the best methods for their determination. In doing so, we emphasized the pivotal role of a meaningful anchor that becomes a gold standard for threshold determination, and of the methods used to determine the actual cutoff. We fielded several anchors and applied several analytic approaches to each, allowing us to see the differences in the values obtained, which also led to differences in the proportion deemed to be “improved” or “in an acceptable state” (Figure 2).
During our SIG, most of the attendees agreed that we will need to work with a range of MID values. There are also new developments and approaches in reporting results for thresholds, such as cumulative distribution function19. The various thresholds for MCID are highlighted with a vertical line on the same graph and demonstrate not only the proportion responding, but whether various MCID values would lead to different interpretations of the relative gains. Another approach discussed was the cumulative proportion responders analysis graph20, which plots proportion responders (as defined by having exceeded the MCID) against magnitude of change with 1 line for each arm in a trial. For clinical trials, this allows more transparent interpretation of the difference between arms. In MCID development work, a plot for each MCID value in a cohort would allow us to see whether different MCID thresholds had a large or small difference in the proportion classified as improved. The breakout groups agreed that these reporting approaches could improve the management of multiple MCID values. They will be forwarded to the Technical Advisory Group of OMERACT for consideration.
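The idea behind a cumulative proportion responders plot, and the way different MCID values change the proportion classified as improved, can be sketched as follows. This is an illustration under stated assumptions: the function names and the change scores are hypothetical, and improvement is assumed to be coded as a positive change that meets or exceeds the threshold.

```python
from typing import List, Sequence, Tuple


def proportion_responders(changes: Sequence[float], threshold: float) -> float:
    """Proportion of patients whose change score meets or exceeds the threshold.

    Assumes improvement is coded as a positive change.
    """
    if not changes:
        raise ValueError("At least one change score is required")
    responders = sum(1 for change in changes if change >= threshold)
    return responders / len(changes)


def responder_curve(
    changes: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Points (threshold, proportion responding) for one trial arm or cohort."""
    return [(t, proportion_responders(changes, t)) for t in thresholds]


# Hypothetical change scores for one arm of a trial
arm_changes = [0, 1, 2, 3, 4]

# Hypothetical candidate MCID values obtained with different anchors or methods
candidate_mcids = [1, 2, 3]

curve = responder_curve(arm_changes, candidate_mcids)
```

Plotting one such curve per trial arm, with vertical lines at the candidate MCID values, shows whether the choice of threshold alters the interpretation of the difference between arms.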
Key points resulting from the SIG:
A dual scale, measuring both time and difficulty, better captures patients’ experience, but the main drawback is the length and complexity of such a scale.
There is no perfect global scale, but a generic scale with a recall > 1 day and < 1 month is preferred.
Development of reporting approaches is key to improve management of multiple MCID values.
DISCUSSION
We have continued to gather the evidence needed to recommend the right worker productivity outcome measures to be included in clinical studies. Moving toward OMERACT 17:
We are updating our literature review against Filter 2.1 and will be finalizing our analysis of global scales for voting at OMERACT 17.
We will further evaluate the value of the dual scale and test it in other trials, with the aim of recommending a better scale that captures both degree of difficulty and time with difficulty.
We will provide recommendations for PAS/MCID to be applied in worker productivity studies and to inform future MID/PAS research in other areas.
We will further our understanding of contextual factors in relation to worker productivity loss and our work will inform the OMERACT contextual factor group.
Acknowledgment
We acknowledge representatives from BMS, AbbVie, UCB, and Pfizer for their collaboration with the OMERACT worker productivity group. We also acknowledge all researchers involved in the EULAR-PRO at-work productivity group for their contribution to the global measure studies.
- Accepted for publication March 22, 2019.