Abstract
Objective. To assess the feasibility, validity, and reliability of the Patient Reported Outcomes Measurement Information System Global Health Short Form (PROMIS10) in outpatients with systemic lupus erythematosus (SLE).
Methods. SLE outpatients completed PROMIS10, Medical Outcomes Study Short Form-36 (SF-36), LupusQoL-US, and selected PROMIS computerized adaptive tests (CAT) at routine visits at an SLE Center of Excellence. Construct validity was evaluated by correlating PROMIS10 physical and mental health scores with PROMIS CAT, legacy instruments, and physician-derived measures of disease activity and damage. Test-retest reliability was determined among subjects reporting stable SLE activity at 2 assessments 1 week apart using intraclass correlation coefficients (ICC).
Results. A diverse cohort of 204 out of 238 patients with SLE (86%) completed survey instruments. PROMIS10 physical health scores strongly correlated with physical function, pain, and social health domains in PROMIS CAT, SF-36, and LupusQoL, while mental health scores strongly correlated with PROMIS depression CAT, SF-36, and LupusQoL mental health domains (Spearman correlations ≥ 0.70). Active arthritis, comorbid fibromyalgia (FM), and anxiety were associated with worse PROMIS10 scores, but sociodemographic factors and physician-assessed flare status were not. Test-retest reliability for PROMIS10 physical and mental health scores was high (ICC ≥ 0.85). PROMIS10 required < 2 minutes to complete.
Conclusion. PROMIS10 is valid and reliable, and can efficiently screen for impaired physical function, pain, and emotional distress in outpatients with SLE. With strong correlations to LupusQoL and SF-36 but significantly reduced responder burden, PROMIS10 is a promising tool for measuring patient-reported outcomes in routine SLE clinical care and value-based healthcare initiatives.
The routine measurement of patient-reported outcomes (PRO) is increasingly important in improving and enhancing healthcare1. PRO, including health-related quality of life (HRQOL), are among the outcomes that patients with rheumatic disease care most about. In an era wherein providers will be increasingly asked to demonstrate the value (health outcomes per unit cost)2 of healthcare, regular collection of PRO at the point of care will enable measurement of what patients prioritize, driving improvement of both quality of care and outcomes. The development of validated PRO measures, including global outcome measures, is a priority of the US Centers for Medicare and Medicaid Services (CMS) in the transition to the Merit-based Incentive Payment System and Alternative Payment Models3. Established CMS innovation programs in orthopedics and oncology already encourage the voluntary collection of PRO in preparation for future requirements4,5. The American College of Rheumatology (ACR) has called for the development of optimal performance outcome measures including PRO, in anticipation of similar requirements being mandated as part of value-based payment initiatives in rheumatology6.
Defining appropriate PRO measures in rheumatology is challenging. PRO measures must be relevant to patients with rheumatic disease, psychometrically valid, and responsive to changes in health status, while also minimally burdensome at the point of care. PRO measures historically used by rheumatologists have notable limitations in the context of value-based healthcare. For example, the Health Assessment Questionnaire (HAQ) is widely used in clinical and research settings to evaluate functional status related to arthritis, but has limited precision because it was developed to identify impairment in patients with greater disability than commonly seen today7. Further, the HAQ does not evaluate emotional or social domains of health, which are essential to the patient experience of illness. The Medical Outcomes Study Short Form-36 (SF-36), a commonly used PRO measure in clinical research, contains physical, mental, and social health domains, but is long and has complex scoring algorithms that were not designed for use at the point of care8. Shorter metrics, such as the SF-20 or SF-12, are available and may be better suited for clinical use, but are associated with fees discouraging widespread use. Numerous disease-specific instruments are available, but may be difficult to use across specialties in patients with multisystem diseases and do not enable comparisons across conditions.
The Patient Reported Outcomes Measurement Information System Global Health Short Form (PROMIS10), a 10-item PRO instrument measuring physical and mental health domains, addresses many of these limitations. PROMIS10 was developed as part of the US National Institutes of Health’s PROMIS initiative using item response theory, and it was rigorously validated in a sample of > 20,000 people, primarily from the community9. It is freely available, and as a universal PRO measure, PROMIS10 can be used across diseases, with T scores normalized to the US general population10. While PROMIS10 has been implemented in several health systems for PRO measurement in orthopedic, cardiovascular, and primary care populations11, its feasibility, construct validity, and reliability have not been evaluated in patients with rheumatic diseases, including systemic lupus erythematosus (SLE).
SLE is a prototypical rheumatic disease in which PRO measurement is critical. It is a systemic and heterogeneous illness in which quality of life has become central as mortality has improved12. Poor quality of life in patients with SLE is driven not just by the clinical manifestations of SLE but also by adverse effects of therapy, as well as common comorbid conditions such as anxiety, depression, and fibromyalgia (FM)13,14,15. To assess the full effect of SLE from the patient’s perspective, a global PRO measure that is easily implemented at the point of care would be invaluable to initiatives emphasizing patient-centered outcome measurement in rheumatology. In our study, we aim to evaluate the feasibility, construct validity, and test-retest reliability of PROMIS10 in outpatients with SLE.
MATERIALS AND METHODS
Patient enrollment
English-speaking patients ≥ 18 years receiving care at the Hospital for Special Surgery (HSS) Lupus Center of Excellence (New York) and meeting 1997 ACR SLE Criteria were eligible to participate16. Patients on dialysis and those with active malignancy, other than nonmelanomatous skin cancer, were excluded.
Patients with SLE were identified by their treating rheumatologist, and medical records were reviewed to confirm eligibility. Patients then consented during a routine outpatient visit. Consenting subjects were registered in Assessment Center (www.assessmentcenter.net), a free secure online research management tool maintained at the Northwestern University Research Data Center. Patients completed the Web-based surveys onsite during the visit by computer or iPad. Alternatively, if they chose, patients could complete the surveys remotely on a computer, tablet, or smartphone by an e-mailed study-specific URL.
PRO measures
Patients completed PROMIS Global Health Short Form version 1.1, consisting of 7 questions asking subjects to rate “in general” their physical, emotional, and social health, and 3 questions related specifically to emotional health, fatigue, and pain in the past 7 days. Subjects also completed 2 legacy PRO measures and several PROMIS computerized adaptive tests (CAT) to establish the construct validity of PROMIS10. Legacy instruments included the SF-36 Standard (US version 1.0), a frequently used generic PRO instrument validated for use in SLE clinical trials, and the LupusQoL-US, an extensively validated SLE-specific PRO questionnaire adapted for use in the United States8,17. Both legacy instruments have a 4-week recall period.
PROMIS CAT, which have been validated in SLE18, were selected based on prior focus group studies in which SLE patients identified domains of importance to them19,20. PROMIS CAT leverage item response theory to select the most informative questions from a domain item bank based on subjects’ responses21. This permits the use of fewer questions per domain, with greater precision. Selected CAT included physical function (version 1.2), mobility (version 1.2), pain behavior (version 1.0), pain interference (version 1.1), ability to participate in social roles (version 2.0), satisfaction with social roles and activities (version 2.0), fatigue (version 1.0), anger (version 1.1), anxiety (version 1.0), and depression (version 1.0). PROMIS CAT ask about the 7 preceding days, with the exception of CAT in the physical and social health domains, which do not specify a recall timeframe. CAT were programmed to administer enough items to achieve a standard error (precision estimate) of ≤ 0.3, with 4 to 12 items per CAT.
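The stopping rule described above (a standard error of ≤ 0.3, with 4 to 12 items per CAT) can be illustrated with a minimal Python sketch. It assumes the usual item response theory approximation that the standard error of the trait estimate equals 1 divided by the square root of the accumulated item information. The function name and the information values are hypothetical, and the items are supplied in a fixed order; an actual CAT selects each item adaptively from the provisional estimate, which this sketch does not attempt.

```python
import math

def items_administered(item_information, se_target=0.3, min_items=4, max_items=12):
    """Return (number of items given, final standard error) under a simple stopping rule.

    item_information: hypothetical information contributed by each item, in the
    order administered. Assumes SE = 1 / sqrt(total information).
    """
    if not item_information:
        raise ValueError("at least one item is required")
    total_info = 0.0
    for n, info in enumerate(item_information[:max_items], start=1):
        total_info += info
        se = 1.0 / math.sqrt(total_info)
        # Stop once the minimum number of items is reached and precision is adequate.
        if n >= min_items and se <= se_target:
            return n, se
    # Precision target never met: stop at the item cap (or when items run out).
    return min(len(item_information), max_items), 1.0 / math.sqrt(total_info)

# Hypothetical item bank in which every item contributes information of 2.0.
n_items, se = items_administered([2.0] * 12)
print(n_items, round(se, 3))
```

Under these assumed values the rule stops after 6 items; highly informative items still trigger the 4-item minimum, and weakly informative ones run to the 12-item cap.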
All self-report questionnaires were administered through Assessment Center and all participants completed PROMIS10, PROMIS CAT, and legacy instruments. Half the participants were randomly assigned to complete PROMIS instruments first, and the other half to complete legacy PRO instruments first.
To assess PROMIS10 test-retest reliability, all participants were contacted by telephone or e-mail within 1 week of their baseline assessment to complete PROMIS10 a second time. A 7-point Likert scale anchor question was used to identify any changes in patients’ disease activity. Only patients reporting that the effect of SLE on their general health was “about the same” were included in the test-retest analysis, because their PRO should not have changed.
PRO measure scoring
PROMIS10 was scored into global physical health and global mental health components using a T score metric, in which the mean T score in the US general population is 50 (SD = 10). Higher PROMIS T scores reflect more of the trait being measured; higher global physical and mental health component scores indicate better global physical and mental health. PROMIS CAT were scored through Assessment Center using the same T score metric. The SF-36 is divided into 8 scales, each with a score ranging from 0 to 100, with higher scores reflecting better HRQOL. Scores can also be reported as the physical component summary and mental component summary (MCS), in which related scales are grouped and reported as a single score, normalized to the general US population, with a score of 50 representing the population mean. The LupusQoL contains 34 questions in 8 domains, with scores ranging from 0 to 100, with higher scores indicating better HRQOL.
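As a worked illustration of the T score metric, the minimal Python sketch below converts a standardized score (in SD units relative to the reference population) to a T score with mean 50 and SD 10, and back. The function names are hypothetical. The sketch shows only the metric itself; it does not reproduce PROMIS10 scoring, which maps item responses to T scores through Assessment Center.

```python
def z_to_t(z: float) -> float:
    """Convert a standardized score (SD units) to the T score metric (mean 50, SD 10)."""
    return 50.0 + 10.0 * z

def t_to_z(t: float) -> float:
    """Convert a T score back to SD units relative to the reference population mean."""
    return (t - 50.0) / 10.0

# A score one-half SD below the reference population mean.
print(z_to_t(-0.5))  # 45.0
```

On this metric, a T score of 45 lies one-half SD below the US general population mean.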
Sociodemographic and clinical characteristics
Age, sex, race, ethnicity, insurance type, and physician diagnoses of anxiety, depression, and FM were obtained by patient self-report. Disease activity and damage at the time of the study visit were assessed by the subject’s treating rheumatologist using a physician’s global assessment (PGA), the Safety of Estrogens in Lupus Erythematosus National Assessment—Systemic Lupus Erythematosus Disease Activity Index (SELENA-SLEDAI), and the Systemic Lupus International Collaborating Clinics/ACR Damage Index (SDI)22,23. PGA range from 0 to 3, SELENA-SLEDAI scores range from 0 to 105, and SDI scores range from 0 to 46. Higher scores reflect greater disease activity and more end-organ damage.
Feasibility
The feasibility of administering PROMIS10 in routine practice was examined by comparing clinical and sociodemographic characteristics of patients who did and did not complete surveys, using t tests and chi-square or Fisher’s exact tests as appropriate. Time to complete the instrument was also measured.
Construct validity
Because the internal consistency and structural validity of PROMIS10 were previously established9, our study evaluated construct validity, specifically external convergent and discriminant validity of PROMIS10, using Spearman correlations comparing PROMIS global physical and global mental component scores with legacy PRO instruments and PROMIS CAT. Spearman correlation coefficients (r) of ≥ 0.70 indicate good convergent validity, while consistently lower r values suggest discriminant validity24. Correlations between PROMIS10 scores and disease activity and damage measures were similarly evaluated. We hypothesized that correlations between PROMIS global physical component scores and physical health domains in PROMIS CAT, SF-36, and LupusQoL would be ≥ 0.70, while PROMIS global mental component scores would correlate highly with emotional health domains in the reference instruments. We expected all other correlation coefficients to be < 0.70.
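The correlation rule above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the SAS procedure used in our analyses. The function names and the example scores are invented for illustration. Applying the 0.70 threshold to the absolute value of r is an assumption, made so that a strong inverse correlation (for example, between a health score and a symptom score) counts as convergent. The sketch does not handle constant inputs, for which the coefficient is undefined.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def classify(r, threshold=0.70):
    """Label a coefficient as convergent (|r| >= threshold) or discriminant."""
    return "convergent" if abs(r) >= threshold else "discriminant"

# Invented scores for five hypothetical subjects on two instruments.
r = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r, classify(r))  # 0.8 convergent
```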
The association of clinical and sociodemographic factors with global physical and mental health component scores was evaluated with univariate and multivariable generalized linear models in which PROMIS scores were the dependent variables. Forward selection methodology was used to evaluate whether the additional inclusion of active arthritis, active hematuria, avascular necrosis, and history of psychosis/cognitive impairment was significant at the p < 0.05 level. We hypothesized that active arthritis and self-reported FM would be associated with worse global physical component scores, while self-reported anxiety, depression, and FM would be associated with worse global mental component scores.
Test-retest reliability
Test-retest reliability was evaluated in eligible participants completing the questionnaires twice, 7 days apart. Agreement between scores for each questionnaire was assessed with an intraclass correlation coefficient (ICC) and standard error of measurement (SEM)25. ICC ≥ 0.7 indicated acceptable test-retest reliability26. All statistical analyses were performed with SAS version 9.3 (Cary).
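The test-retest statistics can be illustrated with a short Python sketch. The ICC form is not specified above, so the sketch assumes a one-way random-effects ICC for 2 measurements per subject, and it assumes the common formula SEM = SD × √(1 − ICC), with SD taken from the pooled test and retest scores. Both choices, the function names, and the example scores are assumptions for illustration and may differ from the SAS computation used in our study.

```python
import math
from statistics import mean, stdev

def icc_oneway(test, retest):
    """One-way random-effects ICC for two measurements per subject (assumed form)."""
    n, k = len(test), 2
    subject_means = [(a + b) / 2 for a, b in zip(test, retest)]
    grand_mean = mean(subject_means)
    # Between-subject mean square.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(test, retest, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC) (assumed formula)."""
    return sd * math.sqrt(1.0 - icc)

# Invented T scores for three hypothetical subjects at baseline and retest.
test = [40.0, 50.0, 60.0]
retest = [42.0, 50.0, 58.0]
icc = icc_oneway(test, retest)
sem = standard_error_of_measurement(stdev(test + retest), icc)
print(f"ICC = {icc:.2f}, acceptable = {icc >= 0.7}")  # ICC = 0.98, acceptable = True
```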
Our study was reviewed and approved by the HSS Institutional Review Board (IRB# 14125).
RESULTS
Feasibility
Over 13 months of study recruitment, 238 patients with SLE were approached and 204 were enrolled (86%, Figure 1). Participating subjects were predominantly female (93%), with mean (SD) age of 40.0 (13.2) years (Table 1). They were racially diverse, with 38% identifying as white, 30% black, and 13% Asian. Regarding ethnicity, 28% identified as Hispanic or Latino. Of the participants, 46% were publicly insured and one-third reported receiving disability benefits. There were no statistically significant differences in sociodemographic characteristics between participants and nonparticipants.
Clinical characteristics of participants are described in Table 1. The average (SD) SELENA-SLEDAI score of participants was 4.2 (3.5), indicating mild disease activity, though 20% were flaring per SELENA-SLEDAI Flare Index at the time of assessment. The mean (SD) SDI was 1.2 (1.7), consistent with minimal end-organ damage.
Of the 204 participants, PROMIS10 global physical and mental health component scores could be calculated in 199 and 187 subjects, respectively, as a result of skipped questions by a few of the respondents (Figure 1).
The number of questions and time per instrument are shown in Table 2. PROMIS10 took subjects a median (interquartile range) of 1.8 min (1.3–2.9 min), compared to median times of about 5 minutes each for the SF-36 and LupusQoL.
Construct validity
Global physical and mental health component score distributions showed known groups validity, as both were worse than the general population, which is expected in patients with SLE13. Mean T scores in both domains were more than one-half SD below the general population mean of 50 (Figure 2).
Construct validity of PROMIS10 is shown in Table 3. Convergent validity for the global physical health component score was strong (r = 0.71–0.80); the largest correlations were with physical function and pain domains of the legacy instruments, and physical function and pain interference PROMIS CAT. The global mental health component score also showed strong convergent validity, correlating highly with the depression PROMIS CAT (r = −0.73), the SF-36 mental health and MCS scales (r = 0.72), and the LupusQoL emotional health domain (r = 0.70). Discriminant validity was demonstrated with weaker correlations (r < 0.60) between global physical and mental health component scores and divergent legacy instrument domains. Correlations of the global physical and mental health component scores with physician-derived measures of SLE disease activity and damage were particularly weak (r = −0.12 to −0.31).
The association of clinical and sociodemographic factors with global physical and mental health component scores was evaluated. All characteristics evaluated except age, race, and disease duration were associated with statistically significantly worse scores in univariate analyses. In multivariable models, these characteristics remained statistically significantly associated with worse global physical health scores: Hispanic ethnicity, being on disability, active arthritis, anxiety, and FM. Anxiety and FM were also statistically significantly associated with worse global mental health scores (Table 4). However, the only associations that were clinically meaningful (i.e., a T score difference of one-half SD or more) were active arthritis and FM for global physical health scores, and anxiety for global mental health scores.
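The threshold for clinical meaningfulness used above is simple arithmetic: on a T score metric with SD = 10, one-half SD corresponds to 5 points. A minimal Python sketch follows; the function name and the example differences are hypothetical and are not values from Table 4.

```python
def is_clinically_meaningful(t_score_difference: float, sd: float = 10.0) -> bool:
    """True if the absolute T score difference is at least one-half SD."""
    return abs(t_score_difference) >= 0.5 * sd

# Hypothetical adjusted differences in T score between groups.
print(is_clinically_meaningful(-5.2))  # True
print(is_clinically_meaningful(3.1))   # False
```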
Test-retest reliability
Ninety participants who reported no change in the effect of SLE on their health completed PROMIS10 a second time 7 days later (average 6.9 days). Of these 90 subjects, global physical and mental health could be scored in 88 and 80 participants, respectively. ICC (SEM) were very strong for both global physical health and global mental health component scores, with correlations of 0.89 (3.24) and 0.85 (3.50), respectively.
DISCUSSION
In our study, we demonstrated that PROMIS10 is feasible, valid, and reliable in a sociodemographically and clinically diverse SLE cohort receiving routine outpatient care. The vast majority of patients approached for our study opted to participate, and there were no statistically significant differences in age, race, ethnicity, insurance type, disability status, or disease duration among those who did and did not complete surveys. PROMIS10 showed strong convergent and discriminant validity with domains of disease-specific and universal legacy instruments and domain-specific PROMIS CAT. Importantly, PROMIS10 correlated poorly with the SELENA-SLEDAI and SDI, demonstrating that PROMIS10 measures fundamentally distinct patient-centered outcomes that are not collected by physician-derived measures of SLE disease activity and damage. These low correlations with physician assessments underscore the need to deploy PRO measures such as PROMIS10 to record the complete patient experience of SLE.
In addition to establishing construct validity compared to legacy instruments, we demonstrate that PROMIS10 scores have no clinically meaningful association with sociodemographic characteristics such as race, ethnicity, insurance type, and disability status. Instead, PROMIS10 scores showed strong, significant independent associations only with active arthritis, FM, and anxiety, which are health conditions that clearly drive the patient experience. Active arthritis, the only clinical feature of SLE significantly associated with PRO in our study, adversely affected global physical health even in the absence of physician-diagnosed flare. Notably, patient self-reported FM and anxiety had nearly as large an effect on PROMIS10 scores as active arthritis did, a novel finding that underscores the importance of asking all patients with SLE about these common comorbid conditions.
PROMIS10 is well suited to fulfill requirements for the collection of patient-reported global health outcomes in value-based payment programs. PROMIS10 can be easily integrated in clinical settings, a necessity for the routine collection of PRO. It is freely available and versatile, with paper and electronic versions that can be integrated and scored in an electronic medical record. PROMIS10 is available in numerous languages in addition to English, including Dutch, Danish, German, French, Italian, Portuguese, Spanish, and Chinese, with other translations in progress27. It is efficient, evaluating global physical and mental health in 10 questions that require under 2 min to complete, which is less than half the time required for the LupusQoL or SF-36, the current gold standard disease-specific and universal PRO measures in SLE. The Lupus Impact Tracker, a validated PRO designed for clinical use28, is similar in length to PROMIS10, but has limitations as an SLE-specific instrument, including potential challenges in implementation across specialties in a health system.
PROMIS10 offers several advantages as a universal PRO instrument. As a measure of global physical and mental health with a standardized scoring system, PROMIS10 is easily interpretable across medical specialties. PROMIS10 can be used to track patient outcomes across a health system, which is particularly important in rheumatology patients who may see numerous providers for management of their multisystem conditions. Patients with SLE, for example, often see nephrologists, cardiologists, neurologists, ophthalmologists, orthopedists, and internists in addition to their rheumatologists, with certain populations more likely to have their SLE managed by nonrheumatologists29,30,31,32. The use of a universal quality-of-life measure, rather than multiple condition-specific instruments, may be more convenient and meaningful to providers, who lack familiarity with disease measures outside their specialty, and to patients who have multiple comorbid conditions, the effects of which may be difficult to tease apart.
PROMIS10, with its standardized T score system, provides a common language for patients and providers to understand patient-centered outcomes. Scoring generates normalized global physical and mental health scores that enable comparisons between an individual patient and the general population, as well as across diseases. An additional benefit is that as a universal global measure, PROMIS10 can be used to derive EQ-5D health preference scores, which can be leveraged for valid comparative cost-effectiveness studies33. In addition to global physical and mental health T scores, PROMIS10 provides specific assessments of fatigue, social participation, pain, mood, and physical function through the reporting of each question. Because of its limited number of questions (average 1–2 per domain), PROMIS10 may lack precision in specific domains, precluding its use as a standalone PRO measure to guide individual therapy. However, PROMIS10 could be used as a powerful screening tool to prompt further evaluation of areas of concern.
While our study demonstrates the feasibility and validity of PROMIS10 in outpatients with SLE, further studies are needed to evaluate how it should best be used in clinical care and value-based healthcare initiatives. It is critical to identify score thresholds, which would trigger more focused evaluation and intervention, and thresholds for acceptable HRQOL. Eliciting patient and provider perspectives on the relevance and usability of PROMIS10, and educating providers on score interpretation and action plans, will also be essential for effectively implementing routine administration of PROMIS10 in clinical settings.
The strengths of our study include the large, clinically and sociodemographically diverse group of subjects, all of whom met ACR SLE classification criteria. Participation rates were high, perhaps reflecting the significant interest of patients in sharing their experience of illness. We collected several patient-reported and clinical outcome measures and examined the contribution of comorbid FM, anxiety, and depression to PROMIS10 scores. Importantly, we evaluated PROMIS10 in patients presenting for routine outpatient visits. Validation in this real-world setting, rather than in an existing research cohort, provides insight into the feasibility of PROMIS10 at the point of care.
Our study has several important limitations. It was limited to English-speaking patients seen at a single tertiary care academic center and as a result, findings may not be generalizable. Further studies are essential to evaluate PROMIS10 in community settings and non–English-speaking populations. Though participation rates in our study were high, a small percentage of PROMIS10 surveys could not be scored because of skipped questions. It is unclear whether subjects intended to leave questions blank or encountered technical difficulties with Assessment Center during survey completion; future studies exploring barriers to implementation may be helpful.
Because this is the first study to evaluate PROMIS10 at the point of care in SLE, to our knowledge, our study provides a foundation for future investigations in other rheumatic disease populations, as well as longitudinal studies of the responsiveness and clinical utility of PROMIS10 in SLE. Routine collection of PRO using instruments including PROMIS10 will be a major step forward in engaging patients in their care and in beginning to evaluate the quality of the care we provide.
Acknowledgment
The authors acknowledge Catherine MacLean, MD, PhD, for critical review of the manuscript, and Rima Abhanykar and Kelly McHugh for assistance with data collection.
Footnotes
Financial support from the Rheumatology Research Foundation Scientist Development Award, and the Hospital for Special Surgery, Lupus and APS Center of Excellence Research Award.
- Accepted for publication October 20, 2017.