Abstract
Objective. Several global measures to assess at-work productivity loss or presenteeism in patients with rheumatic diseases have been proposed, but the comparative validity is hampered by the lack of data on test-retest reliability and comparative concurrent and construct validity. Our objective was to test-retest 5 global measures of presenteeism and to compare the association between these scales and health-related well-being.
Methods. Sixty-five participants with inflammatory arthritis or osteoarthritis in paid employment were recruited from 7 countries (UK, Canada, Netherlands, France, Sweden, Romania, and Italy). At baseline and 2 weeks later, 5 global measures of presenteeism were evaluated: the Work Productivity Scale–Rheumatoid Arthritis (WPS-RA), Work Productivity and Activity Impairment Questionnaire (WPAI), Work Ability Index (WAI), Quality and Quantity questionnaire (QQ), and the WHO Health and Performance Questionnaire (HPQ). Agreement between the 2 timepoints was assessed using single-measure intraclass correlations (ICC). The measures were correlated with each other and with visual analog scale general well-being scores at followup using Spearman correlation.
Results. Test-retest ICC ranged from fair (HPQ 0.59) to excellent (WPS-RA 0.78). Spearman correlations between measures were moderate (Qquality vs WAI, r = 0.51) to strong (WPS-RA vs WPAI, r = 0.88). Correlations between measures and general well-being were low to moderate (−0.44 ≤ r ≤ 0.66).
Conclusion. Test-retest results of 4 out of 5 global measures were good, and the correlations between these were moderate. The latter probably reflect differences in the concepts, recall periods, and references used in the measures, which implies that some measures are probably not interchangeable.
- AT-WORK PRODUCTIVITY LOSS
- PRESENTEEISM
- GLOBAL MEASURES
- TEST-RETEST
- CORRELATIONS
- RHEUMATIC DISEASES
Inflammatory arthritis (IA) and osteoarthritis (OA) are chronic diseases known to affect many aspects of an individual’s life, with social, psychological, financial, and occupational consequences1,2,3. Many workers with rheumatic disorders such as rheumatoid arthritis (RA), psoriatic arthritis (PsA), ankylosing spondylitis (AS), and OA may experience restrictions participating in the workforce, potentially leading to sick days from work and early work cessation4. Rates of work disability in rheumatic patients are well documented, varying from 16% to 39% in patients with PsA in association with longer disease duration5; 67% after an average disease duration of 15 years in patients with RA6; and somewhat lower rates in patients with AS varying from 3% to 50% after 18 and 45 years’ disease duration, respectively7. Work disability literature for OA is limited, although it suggests much lower rates compared to inflammatory arthritis8. However, with advances in disease management for rheumatic diseases in the last decade, individuals with rheumatic diseases may be able to remain in employment9. This can be related to both earlier diagnosis and earlier and more effective pharmacologic treatment, suggesting that measures of productivity could be included as routine outcome measures in clinical trials and in observational studies10,11,12,13. Nonetheless, for those who remain in employment, sickness absence and at-work productivity loss may be experienced. At-work productivity loss, also known as presenteeism, comprises the level of difficulty experienced at work due to health conditions and the amount of productivity lost to such difficulties. Measuring presenteeism is relevant for patients as it reflects the difficulties and challenges patients experience in order to function at their work.
A change in worker productivity for an individual can jeopardize the outcomes considered important for most workers, such as self-esteem and family esteem, social inclusion and participation, and material standards of living2. The importance of presenteeism as part of work-related outcomes is becoming increasingly recognized as a disease-related outcome, and subsequently, attention is shifting to presenteeism as a potential predictor of work disability, offering the opportunity to identify and support people at risk for becoming work-disabled. Thus, presenteeism measurements have become an integral part of identifying the overall effect of rheumatic disorders on work and also on worker productivity both in trials and in clinical followup.
Different measures are available to evaluate presenteeism, including global measures and multi-item measures14. Global measures record a general perception of presenteeism using 1 or 2 single items, while multi-item measures use a number of questions, with a focus on one or multiple aspects of presenteeism [e.g., time management, physical demands, mental-interpersonal demands, and output demands, as in the Work Limitations Questionnaire (WLQ-25)15]. Global measures are particularly attractive as they are more feasible in large clinical studies with interests in many outcome domains. Despite increasing research on the clinimetric properties of these global measures for at-work productivity, several questions remain14. First, no data on test-retest reliability are available for the measures in an arthritis population, and evidence on comparative construct and concurrent validity is scarce16,17. Second, data on comparative validity are limited but needed when aiming to recommend a global measure for use in a rheumatic working population. Finally, whether the global measures are applicable to differing occupational classes is an important but unanswered question.
Our aim was to investigate test-retest reliability of 5 global measures in patients with IA and OA including the Work Productivity Scale–Rheumatoid Arthritis (WPS-RA), Work Productivity and Activity Impairment Questionnaire (WPAI), Work Ability Index (WAI), Quality and Quantity questionnaire (QQ), and the WHO Health and Performance Questionnaire (HPQ); and to compare the association of these measures with each other and with general well-being to determine whether the data from the scales are interchangeable. The 5 global measures at the center of investigation were chosen after a survey among researchers, patients, and clinicians with expertise in the field of worker productivity.
MATERIALS AND METHODS
Study population
Individuals with a physician diagnosis of IA (including RA, PsA, and AS) or OA were recruited for study via 1 outpatient rheumatology or orthopedic clinic in each of 7 participating countries (United Kingdom, Canada, The Netherlands, France, Sweden, Romania, and Italy). Patients had to be age ≥ 18 years, in paid full or part-time employment, and able to communicate verbally and in writing in the language of the participating country. The reliability data collected in this study were part of a larger study that applied cognitive debriefing to investigate the content validity of the measures, the results of which will be presented in a different report. Recruitment aimed at 10 participants per country, with an equal distribution of age, sex, manual/nonmanual jobs, and disease type. All patients provided written informed consent. Ethical approval was obtained from each participating center according to national ethical guidelines.
Data collection
The following 5 global measures were selected after a survey among researchers, rheumatologists, and patients14: WPAI18, WPS-RA19, WAI20, QQ21, and the HPQ22. All 5 global measures in this study use an 11-point visual analog scale (VAS) for scoring presenteeism. Characteristics of the global measures can be found in Table 1. The WPAI, WPS-RA, WAI, and HPQ presenteeism items were extracted from the larger absenteeism and presenteeism questionnaires with permission from the developers. The HPQ can be scored in 2 ways: as a ratio of presenteeism in comparison to co-workers, or as a standalone single-item measure23. For this study, the single-item measure was used (question C of the 3 HPQ questions presented to study patients).
If available, translated global measures were used in each country, otherwise the measures were translated into the corresponding language of the participating country by a company specializing in translations in medicine (PharmaQuest Ltd.). Translations were conducted using cognitive debriefing interviews with 5 patient volunteers with rheumatic diseases for each language of interest24, and utilized a forward and backward translation methodology.
Participants completed the set of global measures at 2 timepoints: at baseline and 2 weeks thereafter. An interval of 2 weeks has been recommended for use in test-retest reliability analyses25, and we believed 2 weeks would reflect little or no change in work productivity. Participants were sent hard copies of the baseline questionnaires by post 2 weeks before they were scheduled to attend the hospital/research center for the cognitive interview study. Participants completed the baseline questionnaires at home, which were then returned to each participating center in a prepaid envelope. The 2-week questionnaire along with reminders were sent to participants to ensure the timepoints were separated by a 2-week interval. Participants were instructed to complete the 2-week questionnaires at home prior to the interview. The global measures were presented in a booklet to participants, with instructions for each of the individual measures. Before the start of the cognitive interview, demographic (age, sex, highest level of education), clinical (diagnosis, symptom duration, date of diagnosis), and occupational (job description, demands, incentives for working) information was collected. Occupations were categorized into manual and nonmanual occupations according to the UK Standard Occupational Classification26. Finally, patients were asked about their general well-being using a VAS ranging from 0 (very well) to 100 (very poorly).
Statistical analysis
Descriptive statistics were used to summarize subjects’ demographic, clinical, and occupational characteristics. For the total study population, test-retest reliability data for each of the 5 global measures at baseline and 2-week followup were generated using single-measure intraclass correlations (ICC), applying a 2-way random-effects model based on a repeated-measure ANOVA (ICC 2,1)27. Missing values were accounted for automatically in the ICC analysis using listwise deletion. ICC values are in general interpreted as: > 0.75 = excellent for group level analyses, 0.40–0.75 = fair to good, and < 0.40 = poor28. Bland-Altman plots were created to identify error across the range of score scales for each of the 5 measures. Bland-Altman plots were considered appropriate as they allow systematic differences over the range of possible values to be detected that would otherwise be difficult to detect using simple point clouds. In a sensitivity analysis, ICC were computed to determine the reliability of the measures within the groups classified as having manual and nonmanual jobs. Spearman correlations were also used to compare the level of correlation among global measures of presenteeism, and between the measures with VAS general well-being. SPSS 20 was used to generate the ICC and Spearman correlations; STATA 13 was used for descriptive analysis and to create the Bland and Altman plots.
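The single-measure ICC(2,1) described above can be illustrated with a minimal sketch. It assumes the standard mean-squares formulation of the 2-way random-effects model and is not the SPSS procedure used in the study. The function names `icc_2_1` and `interpret_icc`, and all example scores, are hypothetical.

```python
def icc_2_1(scores):
    """Single-measure ICC(2,1): 2-way random effects, absolute agreement.

    `scores` holds one tuple per participant, one value per timepoint.
    A value of None marks a missing response.
    """
    # Listwise deletion: drop any participant with a missing score.
    rows = [r for r in scores if all(v is not None for v in r)]
    n = len(rows)       # participants
    k = len(rows[0])    # timepoints (2 in a test-retest design)
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-participant mean square
    msc = ss_cols / (k - 1)                 # between-timepoint mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def interpret_icc(icc):
    """Apply the thresholds quoted in the text (reference 28)."""
    if icc > 0.75:
        return "excellent"
    if icc >= 0.40:
        return "fair to good"
    return "poor"


# Hypothetical baseline and 2-week scores on a 0-10 scale.
example = [(3, 4), (7, 6), (2, 2), (5, None), (8, 9), (1, 2)]
print(interpret_icc(icc_2_1(example)))
```

Because this formulation penalizes a systematic shift between timepoints through the between-timepoint mean square, it measures absolute agreement rather than consistency alone.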
RESULTS
Although 70 participants (n = 10 per country) were recruited for study, data for 5 participants were excluded from analysis because of likely misinterpretation of the direction of the scales: the WPAI and WPS-RA range from good to worse, whereas the other measures range from worse to good. This misinterpretation became apparent during the cognitive interviews and was confirmed using scatterplots between the different global measures. Further, fewer observations were recorded for the QQ (n = 62) and the HPQ (n = 64). The total study population (useable data) therefore comprised 65 patients with RA, PsA, AS, OA, or other conditions (e.g., connective tissue disorder). The median age was 45 years [interquartile range (IQR) 37–52], 55% were female, and the median disease duration was 13 years (IQR 6–19). The most common diagnosis was RA (32%) (Table 2), and 71% of participants had a nonmanual occupation. In general, participants had mild to moderate impact of disease based on the general well-being scale. The median score at baseline was 3 for the WPAI (IQR 1–6; higher score = worse score), 3 for the WPS-RA (IQR 2–6; higher score = worse score), 7 for the WAI (IQR 5–9; higher score = best score), 72 for the total QQ (IQR 49–100; higher score = best score; QQ questions 1 and 2 multiplied to give a range of 0–100), and 7 for HPQ question C (IQR 6–8; higher score = best score).
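The QQ total score described above is the product of the 2 QQ items. The sketch below assumes each item is scored from 0 to 10, which is inferred from the 11-point scales and the stated 0–100 range. The function name `qq_total` is hypothetical.

```python
def qq_total(question_1, question_2):
    """Multiply the 2 QQ items to obtain a total on a 0-100 scale.

    Assumes each item is scored from 0 to 10 (higher = better).
    """
    for value in (question_1, question_2):
        if not 0 <= value <= 10:
            raise ValueError("QQ items are assumed to range from 0 to 10")
    return question_1 * question_2


# A total of 72, the baseline median, would arise from item scores of 8 and 9.
print(qq_total(8, 9))  # prints 72
```

Because the total is a product, a low score on either item pulls the total down sharply. A score of 0 on one item gives a total of 0 regardless of the other.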
Test-retest reliability
ICC for all 5 global measures at first and second assessment are shown in Table 3. Overall, the ICC in the total sample ranged from 0.59 (95% CI 0.40–0.73; HPQ question C) to 0.78 (95% CI 0.65–0.86; WPS-RA), which indicates fair to excellent agreement between test-retest scores at baseline and 2-week followup at group level (p < 0.001 for all tests). Bland-Altman plots illustrate the differences between the 2 timepoints (baseline and 2 weeks) and the mean scores at baseline and 2 weeks, demonstrating the degree of variation for all 5 global measures for the total sample (See Supplementary Figure 1, available online at jrheum.org). The 95% limits of agreement vary between each of the measures, but in general, the variation of the difference around the mean for most was moderate. For each measure, a few differences outside or close to the limit lines are present, demonstrating minimal levels of disagreement between the 2 timepoints (See panels, Supplementary Figure 1, available online at jrheum.org).
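The quantities plotted in a Bland-Altman analysis can be sketched as follows. This assumes the conventional 95% limits of agreement, i.e., the mean difference ± 1.96 standard deviations of the differences. It is an illustration and not the STATA routine used in the study. The function name `bland_altman` and the example scores are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(baseline, followup):
    """Return the bias, 95% limits of agreement, and the plotted points."""
    diffs = [b - a for a, b in zip(baseline, followup)]         # y-axis values
    means = [(a + b) / 2 for a, b in zip(baseline, followup)]   # x-axis values
    bias = mean(diffs)
    sd = stdev(diffs)                       # sample SD of the differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    # Indices of participants whose difference lies outside the limits.
    outside = [i for i, d in enumerate(diffs) if d < lower or d > upper]
    return {
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "points": list(zip(means, diffs)),
        "outside": outside,
    }


# Hypothetical baseline and 2-week scores on a 0-10 scale.
result = bland_altman([3, 5, 7, 2, 6], [4, 5, 6, 3, 6])
print(result["bias"], result["lower"], result["upper"])
```

Plotting the differences against the pairwise means shows whether disagreement grows or shrinks across the score range, a pattern that a single summary coefficient does not reveal.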
ICC showed good to excellent agreement in the nonmanual working group (n = 46), ranging from 0.67 (95% CI 0.47–0.80; HPQ question C) to 0.83 (95% CI 0.71–0.90; Qquality), and poor to excellent agreement in the manual working group (n = 19), ranging from 0.39 (95% CI −0.08 to 0.70; HPQ question C) to 0.76 (95% CI 0.47–0.90; WPS-RA) (Table 4).
Construct validity
Spearman rank correlations between the 5 global measures yielded values ranging from strong (WPAI and WPS-RA, r = 0.88) to moderate (Qquality vs WAI, r = 0.51). Assuming that these measures should assess the same construct, namely presenteeism, then ICC are justified. These ranged from fair [−0.46 (95% CI −0.64 to −0.25) for WPAI and Qquantity; negative values as a result of opposing anchors on scales] to excellent [0.85 (95% CI 0.76 to 0.90) for WPAI and WPS-RA]. Correlations between each of the individual measures and VAS general well-being were low to moderate, ranging from r = −0.44 (QQ total) to r = −0.66 (WPS-RA) (p < 0.001 for all tests; Table 5).
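The Spearman rank correlation used here can be sketched as a Pearson correlation computed on ranks, with tied scores given their average rank. Ties are common on 11-point scales. The function names and example scores below are hypothetical, and the study itself used SPSS.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores: one scale anchored higher = worse, one higher = better.
higher_is_worse = [1, 3, 4, 6, 8]
higher_is_better = [9, 8, 6, 5, 2]
print(spearman(higher_is_worse, higher_is_better))
```

The example illustrates why scales with opposing anchors yield negative coefficients even when they track the same construct. The sign reflects scale direction and the magnitude reflects the strength of association.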
DISCUSSION
The results showed that 4 of the 5 measures (the WPS-RA, WAI, QQ, and WPAI) met or exceeded the 0.75 threshold of acceptability for agreement between baseline and 2 weeks, an interval we felt would reflect a stable situation with little or no change in work productivity; the WPAI was considered acceptable because it fell only marginally short at 0.74. The correlations between the global measures suggested mostly moderate construct validity. Low to moderate convergent validity between the measures and general well-being was evident. Although the ICC of the 5 global measures have not previously been compared, individually our results are congruent with the limited reliability evidence from different chronic conditions such as irritable bowel syndrome29 and Crohn’s disease30, while the data from the other global measures serve to build upon the sparse evidence base.
The discrepancy between ICC in the manual and nonmanual participant groups was an interesting finding. Consistently higher agreement for the nonmanual subgroup was evident, with the most extreme difference lying within the QQ quality question (nonmanual 0.83; manual 0.59). This may be partly determined by a variance in physical exertion in manual work over time compared to sedentary work, as reflected in inconsistent scoring over the 2 timepoints. Manual work is known to be a risk factor for presenteeism in musculoskeletal disorders31,32,33, highlighting a contextualized relationship between rheumatic diseases and work productivity. Although this would normally account for differences in the magnitude of work ability, in this study it also seemed to suggest some instability in scores (lower reliability) in jobs with more fluctuating demands. As such, it may be worthwhile to account for intensity of job demands in future analysis.
Correlations between the 5 global measures in our study were moderate. This is similar to the literature in which previous findings demonstrate moderate correlations between at-work productivity outcome measures [e.g., WALS and the Endicott Work Productivity Scale (r = 0.55) and WALS and the WLQ-25 index (r = 0.61)]9. Although each measure at its core measures presenteeism, different concepts are included such as “productivity” (WPAI), “performance” (HPQ), and “ability” (WAI), as well as different timeframes, for example, “today” (QQ), “7 days” (WPAI), and “one month” (WPS-RA). However, the strong association between the WPAI and the WPS-RA (r = 0.88) suggests congruence between the 2 measures, potentially with regard to their conceptual foci, as both measures utilize the term “productivity.”
The low to moderate results of the VAS general well-being and global measure correlations imply that productivity loss may be only partly captured by general health issues, suggesting that other factors, including contextual factors, may contribute to productivity loss. Recent research into contextual factors of work disability and absenteeism supports this notion, showing that factors such as family support towards work, work modifications, and physical job demands influence both work disability and absenteeism34,35,36; the same factors may also apply to presenteeism.
There were a few limitations to our study. The reversal of the VAS for the WPAI and WPS-RA accounted for the loss of 5 patients in the analysis, as a misinterpretation of the direction of the anchors on the scales was apparent. The opposing scores on the remaining measures indicated a clear intention to score similar scores on the measures with reversed VAS. The collection of general well-being VAS only at the 2-week timepoint is a limitation. However, an interval of 2 weeks for test-retest investigations is often thought to be appropriate as there is little risk for recall bias and clinical change25,37. Further, it is possible that an element of social desirability bias may have influenced the results; however, participants were reassured of data confidentiality to eliminate any uncertainty of employers seeing participants’ responses, which we hope minimizes the possibility of social desirability bias. As well, although the questionnaires were rigorously translated using an approved translation company, some differences in translation may contribute to the moderate results; we expect the effect of this to be minimal due to procedures used during translation.
Test-retest reliability of the 5 global measures of presenteeism was mostly good, with 4 of the 5 measures meeting or exceeding the threshold of acceptable reliability. The WPS-RA performed best in the test-retest reliability analysis, and with the WPAI in the comparison between measures, suggesting interchangeability between available data using the 2 methods. Whether the remaining measures can be considered interchangeable is tentative. The findings suggest that different instruments in different diseases and discourses should perhaps be used. Based on our findings, the HPQ would not be recommended for use as a global measure of presenteeism in rheumatic working populations. The extent to which other measures provide different information likely depends on predictive ability with regard to absence and work disability. Additional work is warranted to support psychometric evidence of the 5 global measures, paying particular attention to potential contextual factors, such as manual and nonmanual occupations, that may influence study results.
ONLINE SUPPLEMENT
Supplementary data for this article are available online at jrheum.org.
Acknowledgment
We thank all site coordinators for their assistance in running the study in each participating country.
Footnotes
Supported by research grants from EULAR (EULAR Project PRO Call) and an unrestricted grant from AbbVie. This report includes independent research supported by the National Institute for Health Research Biomedical Research Unit Funding Scheme.
- Accepted for publication September 1, 2015.