Abstract
Objective. To assess the reliability and validity in ankylosing spondylitis (AS) of selected Patient Reported Outcomes Measurement Information System (PROMIS) Short Forms (SF) developed by the US National Institutes of Health. The analysis was done across core sets and patient-identified domains of the Assessment of Spondyloarthritis international Society.
Methods. Participants in the Prospective Study of Outcomes in Ankylosing Spondylitis (PSOAS), an ongoing, prospective longitudinal observational study, completed 6 PROMIS SF assessing global health, depression, fatigue, physical function, pain intensity, and pain interference during their PSOAS visits from September 2017 to January 2019. Test-retest reliability and internal consistency were assessed using intraclass correlation coefficients and Cronbach’s alpha coefficient, respectively. PROMIS SF were compared to legacy measures collected. Construct validity was evaluated through examination of score distributions and floor effects, and through examination of the Spearman correlation coefficients between PROMIS measures and existing legacy AS measures. Discriminant validity was tested across Ankylosing Spondylitis Disease Activity Score (ASDAS) groups.
Results. Participants (n = 119) were mostly male (69%) and white (81%), with a mean (SD) age of 51 (15) years. Legacy measures demonstrated floor effects that were not present in PROMIS SF. Good test-retest reliability (r > 0.8) and excellent internal consistency (α > 0.9) were noted in the PROMIS SF. The 6 PROMIS SF correlated moderately to strongly [ρ 0.68 (Depression) to −0.87 (Physical Function)] with appropriate legacy measures. PROMIS scores worsened significantly (p < 0.05) with higher ASDAS groups.
Conclusion. This study supports the reliability and construct validity of PROMIS SF to assess AS symptoms from a single-center sample of patients with AS. Further research is needed to test responsiveness, feasibility/resource burden, and different cultural/societal contexts for patients with AS.
- ANKYLOSING SPONDYLITIS
- ARTHRITIS IMPACT PATIENT REPORTED OUTCOMES
- MEASUREMENT INSTRUMENT
- PSYCHOMETRICS
- SURVEYS
Patient-reported outcomes (PRO) are an important component of rheumatologic care and research. They have increased patient participation and yielded valuable information on treatment efficacy and quality of life that is pertinent to the management of patients with these complex, chronic diseases1,2. Subsets of PRO have also been established as core outcome domains for many rheumatic diseases to evaluate therapeutic efficacy3.
Ankylosing spondylitis (AS) is a disease characterized by inflammatory back pain and radiographic disease of the axial spine, with an estimated prevalence of 0.2% to 0.5% in the US population4. Clinicians have widely adopted the use of PRO as important tools in AS management. In fact, PRO comprise the largest share of the primary outcomes in randomized controlled trials in AS5. The Assessment of Spondyloarthritis international Society/Outcome Measures in Rheumatology (ASAS/OMERACT) international groups have established 3 independent core sets of domains used to measure outcomes: (1) disease-controlling antirheumatic treatment, (2) symptom-modifying antirheumatic drugs and physical therapy, and (3) clinical record-keeping6. All 3 core sets include the domains of fatigue, function, pain, patient global assessment, and stiffness. These domains are important in both research and clinical care in AS.
Universal (or “generic”) PRO measures represent an opportunity to compare disease burden and treatment effect across different chronic conditions using a common metric7. The US National Institutes of Health (NIH)–funded Patient Reported Outcomes Measurement Information System (PROMIS) incorporates both adult and pediatric PRO in physical, mental, and social health domains across a wide variety of chronic diseases and conditions as well as general population controls, potentially allowing for this type of comparison8. The physical health domains captured by PROMIS are particularly relevant in rheumatology9. The use of item response theory (IRT) methodology yielded computer adaptive tests (CAT), static profiles, and short form (SF) PROMIS instruments that are publicly available for use. Investigators continue to explore how PROMIS measures can be incorporated into different aspects of medical research and care10.
While the ASAS/OMERACT PRO measures are vital in the assessment of AS, the IRT methodology used in PROMIS potentially reduces redundancy, increases sensitivity by avoiding floor/ceiling effects, and decreases survey burden with its adaptive design. Additionally, the publicly available online data collection system (Assessment Center, www.assessmentcenter.net) may lower barriers to clinical research in AS through its accessibility and ease of use. The purpose of our study is to examine the reliability and validity of PROMIS SF in patients with AS.
MATERIALS AND METHODS
Patients.
Subjects were recruited from a single center [University of Texas Health Science Center at Houston (UTHealth)] among patients currently enrolled in the Prospective Study of Outcomes in Ankylosing Spondylitis (PSOAS) observational cohort. All clinic patients at the UTHealth study site who met modified New York Classification Criteria for AS, were at least 18 years of age, and were fluent in English were eligible for participation. PSOAS is a multicenter observational study initiated in 2003 with continued enrollment encompassing 5 sites: Cedars-Sinai Medical Center, Los Angeles, California; the NIH Clinical Center, Bethesda, Maryland; the McGovern Medical School at UTHealth; the University of California, San Francisco; and the Princess Alexandra Hospital in Brisbane, Australia. The research followed the Declaration of Helsinki and was approved by the University of Texas Institutional Review Board (HSC-MS-07-0022). Each participating subject reviewed and signed an informed consent form.
Procedures.
Patients were contacted by telephone, at their clinic visit, or at their study visit about participation and details regarding this ancillary study. After patients provided written informed consent, coordinators provided paper questionnaire packets in person and/or by e-mail for printing. From May to November 2018, a subset of patients was consecutively approached and asked to complete the same PROMIS SF > 48 h later to assess test-retest reliability. We used data from a single visit per patient.
PRO assessments.
Focus groups of patients with AS (3 groups of 5 patients) were asked whether the domains listed in the ASAS/OMERACT core sets were important for their disease and whether there were any additional domains they felt needed to be measured. Five academic rheumatologists at UTHealth involved in AS research and patient care were individually asked the same questions. After soliciting these opinions, we found that the core domains were well accepted among patients and rheumatologists. Mental health, specifically depression, was the most frequently noted domain not covered in the ASAS/OMERACT core set.
PRO that are routinely collected in the PSOAS cohort include Patient Global assessment [0–100 numerical rating scale (NRS)], Patient Pain assessment (0–100 NRS), Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), Bath Ankylosing Spondylitis Functional Index (BASFI), Ankylosing Spondylitis Disease Activity Score–C-reactive protein (ASDAS–CRP), and the Center for Epidemiologic Studies Depression Scale (CES-D)11,12,13,14. The BASDAI consists of 6 questions measured on a 0–10 scale covering 5 major symptoms of AS: fatigue, spinal pain, arthralgias/arthritis, enthesitis, and morning stiffness11. Additionally, Calin, et al developed the BASFI, a 10-question index scored as the mean of 0–10 scales and focused on functional and anatomical limitations in AS12. The ASDAS is a newer disease activity measure designed specifically for AS that demonstrates high discriminant capacity; it incorporates acute-phase reactants [e.g., erythrocyte sedimentation rate (ESR) or CRP]14. The CES-D is a 20-item questionnaire measuring depressive symptom severity13. These measures were termed “legacy” measures and served as comparators for the PROMIS measures addressing similar constructs.
As part of the NIH Roadmap initiative for the 21st century for medical research, the multicenter cooperative group referred to as PROMIS was formed. This group used modern advances in computer technology and item-response theory to create free-to-use measures for physical, mental, and social health domains15. Among the ways to administer PROMIS measures (on paper, by computer, or mobile application), we chose SF distributed in paper packets for ease of use in a clinical setting. Scoring manuals for PROMIS measures (www.assessmentcenter.net/Manuals.aspx) outline the SF development, report psychometric properties for each instrument in their study population, and describe how to identify PROMIS T scores based on short form raw summed item scores. We reported PROMIS T scores for all SF. PROMIS SF Versions 1.0/1.1 (assessmentcenter.net) were administered for Emotional Distress (ED)–Depression, Fatigue, Global Health, Pain Intensity, Pain Interference, and Physical Function and ranged from 3 to 12 questions per form (Supplementary Figure 1, available with the online version of this article). For the PROMIS Global v1.1 SF we reported the physical summary score. These domains were selected based on patient input, expert rheumatologist input, and the published ASAS/OMERACT core set for clinical record keeping6. Higher PROMIS scores represent more of the measured trait, so interpretation of directionality varied if the domain was a positive trait (higher scores better) versus symptom (higher scores indicate more severe symptoms). Time to complete was self-reported by patients upon completion of the PRO packet. Through open-ended critiques, we additionally solicited patient feedback on the PROMIS questionnaires regarding how well they addressed the measured domains and whether any important aspects of their disease were not being addressed.
Covariates.
Sociodemographic information was drawn from the patients’ data extracted from the PSOAS cohort and included age, sex, race/ethnicity, education, smoking habits, comorbidities, work status, and AS duration. Medication use, comorbidities, and serum inflammatory markers (e.g., CRP or ESR) were also recorded at each visit in addition to radiographs of the hips and the cervical and lumbar spine (the latter measured by the modified Stoke Ankylosing Spondylitis Scoring Scale), obtained at 2-year intervals over the course of followup.
Statistical analysis.
Continuous data were summarized as mean (SD) when normally distributed and as median [interquartile range (IQR)] otherwise. Frequencies and percentages were reported for categorical variables. We examined histograms and skewness and kurtosis statistics to assess normality16. For skewness and kurtosis, we calculated Z scores by dividing each statistic by its standard error, with absolute values > 1.96 considered significant. Spearman’s correlation was used to examine PROMIS scores against legacy PRO for similar domains. The Kruskal-Wallis H test with Bonferroni correction for pairwise comparisons was used to compare PROMIS and legacy PRO domains stratified by ASDAS group levels. Reliability was assessed with the intraclass correlation coefficient (test-retest reliability) and Cronbach’s alpha coefficient (internal consistency); a coefficient > 0.7 was considered acceptable. We hypothesized a priori that there would be moderate to strong correlation (|ρ| > 0.6) between the PROMIS measures and the target legacy measures. All analyses were done with IBM SPSS version 24.
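As an illustration of two of the statistics described above, the following minimal Python sketch computes a skewness Z score and Cronbach's alpha. It is not the SPSS implementation used in the study: the function names are hypothetical, and the sketch assumes the bias-adjusted sample skewness with its conventional standard error.

```python
import math
from statistics import mean, stdev, variance


def skewness_z(values):
    """Return (skewness, Z score), where Z = skewness / standard error.

    Assumes the bias-adjusted sample skewness (G1) and its usual
    standard error; requires at least 3 observations with nonzero spread.
    """
    n = len(values)
    m, s = mean(values), stdev(values)
    g1 = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1, g1 / se


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    `items` holds one list of scores per questionnaire item, with
    respondents in the same order in every list.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical data, for illustration only.
scores = [0, 0, 1, 2, 3, 5, 9]
skew, z = skewness_z(scores)
significantly_skewed = abs(z) > 1.96

item_columns = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [1, 3, 3, 4, 4]]
alpha = cronbach_alpha(item_columns)
acceptable = alpha > 0.7
```

The same Z score approach applies to kurtosis, using the kurtosis statistic and its own standard error in place of the skewness values.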
RESULTS
Patient characteristics.
A total of 119 patients were enrolled and completed the surveys between September 2017 and January 2019. Of the 88 patients from May 2018 through November 2018, 24 (27.3%) also completed the retest packet. This sample included a diverse spectrum of AS characteristics (Table 1). Patients were mostly male (69%) and white (81%), with a mean (SD) age of 51 (15) years. All patients met the modified New York Criteria for Ankylosing Spondylitis, with a mean (SD) symptom duration of 25 (13) years. Among those with available CRP laboratory values (90/119, 76%), over half (64%) had inactive or moderate disease by ASDAS.
Distributions of PRO.
PROMIS and legacy scores are shown in Table 2. No significant kurtosis was noted in the legacy or PROMIS questionnaires. All legacy PRO were moderately to highly positively skewed (0.54 to 1.00), with the Patient Global, Patient Pain, BASFI, and CES-D significantly skewed (p < 0.05). The legacy measures demonstrated floor effects, with the proportion of patients at the lowest possible score ranging from 5% to 17%: Global Health NRS (17%), CES-D (9%), BASDAI-Fatigue (5%), Pain (14%), and BASFI (11%). The PROMIS instruments showed a more normal distribution compared to legacy measures in physical domains (Figure 1, and Supplementary Figure 2, available with the online version of this article), with PROMIS Global, Fatigue, Pain Interference, Pain Intensity, and Physical Function approximately symmetrical (−0.1 to 0.39 skew, p > 0.05). However, PROMIS ED-Depression had positive skew (1.01, p < 0.05) with a significant floor effect (54% at the lowest possible value; Supplementary Figure 2). Floor effects for the rest of the PROMIS measures ranged from 1% to 31%: Global (1%), Fatigue (9%), Pain Interference (31%), Pain Intensity (11%), and Physical Function (1%). Among a subset sampled for time to completion, 35/41 (85%) reported that it took < 15 min overall to complete their PROMIS SF packets. Fourteen of the 119 patients (12%) raised concerns regarding how well the PROMIS SF addressed their disease.
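A floor effect as reported here is the share of respondents at the lowest possible score of an instrument. A minimal Python sketch of that calculation follows; the function name and the example scores are hypothetical and are not study data.

```python
def floor_and_ceiling(scores, lowest_possible, highest_possible):
    """Return the proportions of respondents at the scale minimum and maximum."""
    n = len(scores)
    at_floor = sum(1 for score in scores if score == lowest_possible)
    at_ceiling = sum(1 for score in scores if score == highest_possible)
    return at_floor / n, at_ceiling / n


# Hypothetical 0-100 numerical rating scale responses.
nrs_scores = [0, 0, 10, 20, 35, 50, 50, 70, 90, 100]
floor_share, ceiling_share = floor_and_ceiling(nrs_scores, 0, 100)
```

For a T-scored instrument, the lowest and highest possible values would be the T scores corresponding to the minimum and maximum raw summed scores of that form.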
Reliability.
Test-retest reliability was assessed among the 24 participants who completed the retest packet. The median (IQR) time between the 2 administrations was 1 (1–2) day. Correlations between the 2 scores of each individual test ranged from 0.80 (Pain Interference) to 0.98 (Physical Function). We also examined internal consistency using Cronbach’s alpha. We found excellent consistency within the individual scales, ranging from 0.91 (Global) to 0.98 (Pain Interference).
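The text does not state which form of the intraclass correlation coefficient was used. As one possibility, the Python sketch below computes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), for paired test and retest scores. The choice of this form, the function name, and the example scores are all assumptions made for illustration.

```python
from statistics import mean


def icc_2_1(test, retest):
    """ICC(2,1) for two administrations of the same instrument.

    Built from the two-way ANOVA mean squares for subjects (rows),
    occasions (columns), and residual error.
    """
    rows = list(zip(test, retest))
    n, k = len(rows), 2
    grand = mean([value for row in rows for value in row])

    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in (test, retest))
    ss_total = sum((value - grand) ** 2 for row in rows for value in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical T scores at the first and second administration.
first = [42.0, 48.5, 55.0, 61.2, 66.8]
second = [43.1, 47.9, 56.2, 60.0, 67.5]
reliability = icc_2_1(first, second)
acceptable = reliability > 0.7
```

Because this form measures absolute agreement, a systematic shift between administrations lowers the coefficient even when the rank order of patients is unchanged.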
Construct validity: convergent validity and known groups validity with legacy measures.
PROMIS Global Health, Physical Function, and Pain Intensity had very strong correlation (|ρ| > 0.84, p < 0.01) with corresponding legacy measures (Global NRS, BASFI, and Pain NRS, respectively; Table 3). PROMIS Pain Interference and Fatigue showed strong correlation (|ρ| > 0.7, p < 0.01) with corresponding legacy measures (Pain NRS and BASDAI-Fatigue, respectively). The weakest correlation was seen with PROMIS Emotional Distress-Depression, which had moderate correlation (ρ = 0.68, p < 0.01) with CES-D.
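Spearman's ρ is the Pearson correlation of the ranked scores, with tied values sharing the average of their ranks. The Python sketch below shows the calculation using hypothetical data and function names. It also illustrates why a strong correlation can be negative: higher PROMIS Physical Function scores indicate better function, whereas higher BASFI scores indicate greater limitation.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for position in range(i, j + 1):
            ranks[order[position]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = math.sqrt(
        sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    )
    return covariance / spread


# Hypothetical paired scores: PROMIS Physical Function T scores and BASFI.
promis_pf = [58.0, 52.3, 47.1, 44.0, 39.5, 33.2]
basfi = [0.4, 1.8, 1.5, 4.2, 5.9, 7.7]
rho = spearman_rho(promis_pf, basfi)
meets_hypothesis = abs(rho) > 0.6
```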
In general, PROMIS scores worsened significantly (p < 0.05) with increased disease activity as defined by ASDAS categories ranging from inactive disease (ASDAS < 1.3) to high–very high disease activity (ASDAS ≥ 2.1) in the domains of Global, Fatigue, Pain Intensity, Pain Interference, Depression, and Physical Function (Table 4). In pairwise comparisons, PROMIS Global and Physical Function distinguished inactive, moderate, and high–very high ASDAS-defined disease activity. PROMIS Fatigue, Pain Intensity, and Pain Interference were able to distinguish ASDAS inactive and moderate from high–very high disease activity. We observed these same patterns among the legacy measures (Patient Global, Pain, BASFI, BASDAI-Fatigue) that addressed physical domains. The PROMIS ED-Depression measure was unable to distinguish across ASDAS-defined disease activity, unlike CES-D, which could distinguish high–very high disease from inactive disease.
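The known-groups comparison can be sketched in Python as a Kruskal-Wallis H test across 3 ASDAS groups followed by Bonferroni-adjusted pairwise tests. This is a simplified illustration with hypothetical data and function names. The pairwise step here reruns the Kruskal-Wallis test on each pair, which is an assumption and may differ from the post hoc procedure applied by the statistical software used in the study. The p value helper covers only 1 or 2 degrees of freedom, which is all that 3 groups require.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for position in range(i, j + 1):
            ranks[order[position]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability, closed forms for df 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch supports only 2 or 3 groups")


def kruskal_wallis(groups):
    """Return (H, p) with the usual correction for tied values."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, chi2_upper_tail(h, len(groups) - 1)


def pairwise_bonferroni(named_groups):
    """Pairwise tests with p values multiplied by the number of pairs."""
    pairs = list(combinations(named_groups, 2))
    adjusted = {}
    for a, b in pairs:
        _, p = kruskal_wallis([named_groups[a], named_groups[b]])
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return adjusted


# Hypothetical PROMIS Fatigue T scores by ASDAS group.
by_asdas = {
    "inactive": [41.0, 44.5, 46.2, 48.0],
    "moderate": [47.3, 50.1, 52.8, 55.0],
    "high-very high": [56.4, 59.9, 62.0, 66.7],
}
h_stat, p_value = kruskal_wallis(list(by_asdas.values()))
adjusted_p = pairwise_bonferroni(by_asdas)
```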
DISCUSSION
To the best of our knowledge, this study is the first to examine the reliability and construct validity of PROMIS instruments in patients with AS in the context of ongoing clinical care. We selected PROMIS SF based on patient and physician input as well as review of the ASAS/OMERACT clinical record-keeping domains. Among the 6 domains we studied (Depression, Fatigue, Global Health, Physical Function, Pain Intensity, and Pain Interference), 5 showed at least strong correlation (|ρ| > 0.7) with the appropriate legacy AS measure. Additionally, in the physical domains of Global Health and Physical Function, PROMIS measures were able to discriminate inactive, moderate, and high–very high ASDAS activity groups. Similarly, in the other physical domains (i.e., Pain Intensity, Pain Interference, and Fatigue), the PROMIS measures could discriminate high–very high disease activity from inactive and moderate disease activity. In depression, the only mental health domain, PROMIS Depression could not distinguish across disease activity levels, suggesting that depressive symptoms as defined in this SF may not be disease-related. A majority of patients also reported that these forms took < 15 min to complete. These findings support the feasibility, reliability, and construct validity of PROMIS SF when assessing physical domains in AS outcomes.
PROMIS instruments have been evaluated across multiple physical, mental, and social health domains in other rheumatic diseases including juvenile idiopathic arthritis, osteoarthritis (OA), psoriatic arthritis, rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), systemic sclerosis, and vasculitis17–23. While a majority of these studies have focused on PROMIS CAT, to date PROMIS SF have been studied in OA, RA, and SLE in similar fashion24,25,26,27. PROMIS instruments have also been studied in treatment response28. In addition, PROMIS measures have also correlated with objective measures. For example, Mahieu, et al demonstrated reliability and construct validity in PROMIS Fatigue with accelerometer-based measures of physical activity in patients with SLE29. This may suggest the potential cross-disease nature of these universal PRO.
Strengths of our study included use of a well-characterized cohort reflective of patients with AS in the United States, with AS legacy measures collected routinely at each study visit. All patients met modified New York criteria for ankylosing spondylitis, creating a homogeneous patient sample from a radiographic perspective. We evaluated the performance of PROMIS measures within the context of usual care.
Our study had limitations. The highly educated, largely white demographics of our patient sample may affect generalizability. Floor effects noted for the PROMIS Depression SF and CES-D may have been due to the low depression rate in our sample. Further, by including only patients who met modified New York criteria for AS, we excluded patients on the disease spectrum with nonradiographic axial spondyloarthritis (nr-axSpA). Thus, our study may not be generalizable to nr-axSpA patient populations. Because of potential rapid changes in underlying disease activity with medications, a 48-h window was chosen to assess test-retest reliability, a practice that may also artificially elevate the correlation observed. For both test-retest and time of completion, we acknowledge potential participation bias among those who volunteered this information. We also did not examine responsiveness of the PROMIS instruments in this study, and limited our study to English-speaking patients. Further, while the use of SF is feasible in all potential settings given its paper format, we did not study the CAT or profile PROMIS instruments.
Our study offers preliminary data in the study of PROMIS instruments in AS. Further study is required to see, separately, whether the PROMIS CAT can reduce redundancy, increase sensitivity by avoiding floor/ceiling effects, or decrease survey burden with its adaptive design in patients with AS. Future PROMIS SF studies in AS could include translations of PROMIS instruments, given the dynamic demographics seen in the US population. Further, longitudinal studies are required to study the responsiveness of PROMIS SF in patients with AS.
Our study offers evidence supporting the feasibility as well as the reliability and construct validity of 6 PROMIS instruments in AS clinical care. Additionally, our study demonstrates the effect and disease burden of AS across the domains studied relative to the general population, by comparing AS scores to population normative values. As PROMIS measures become more widely used in clinical trials and US clinical care, the construct validity of these measures in axSpA will be increasingly relevant. Future work will examine the longitudinal construct validity and discrimination of these instruments in treatment initiation scenarios and continue to elucidate how PROMIS instruments can be used to understand the effect of AS in different cultural and societal contexts.
Acknowledgment
The authors are deeply grateful to the patients and their families for their participation and support in this study, and to the staff at the Division of Rheumatology and Clinical Immunogenetics, Center for Clinical Research and Evidence-based Medicine, and Center for Clinical and Translational Sciences at UTHealth. We also thank the PSOAS investigators for their continued collaboration.
Footnotes
This project was supported by the US National Institutes of Health (NIH) PO1-AR052915. Dr. Ogdie is supported by NIH K23 AR063764, R01 AR072363, and the Rheumatology Research Foundation. Dr. Reveille is supported by the Spondyloarthritis Association of America.
- Accepted for publication July 19, 2019.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.