Abstract
Objective. To assess the sensitivity of the Patient-Reported Outcomes Measurement Information System 29-item Health Profile (PROMIS-29) and the Functional Assessment of Chronic Illness Therapy-Dyspnea 10-item short form (FACIT-Dyspnea) for measuring change in health status and dyspnea in systemic sclerosis (SSc).
Methods. One hundred patients with SSc completed the PROMIS-29, FACIT-Dyspnea, and traditional instruments [Medical Research Council Dyspnea Score, St. George’s Respiratory Questionnaire (SGRQ), Health Assessment Questionnaire-Disability Index (HAQ-DI), and Medical Outcomes Study Short Form-36 (SF-36)] at baseline and 1-year visits. PROMIS-29, FACIT-Dyspnea, and traditional instrument change scores were compared across composite modified Medsger Disease Severity and modified Rodnan Skin score (mRSS) change groups.
Results. Moderately high Spearman correlation coefficients were observed between FACIT-Dyspnea and SGRQ (r = 0.57), FACIT-Dyspnea functional limitations and SF-36 physical component summary (PCS; r = 0.51), PROMIS-29 physical functioning and HAQ-DI (r = 0.50), and SF-36 PCS (r = 0.52) change scores. In most validity comparisons, PROMIS-29, FACIT-Dyspnea, HAQ-DI, and SF-36 scores performed similarly. While PROMIS-29 covers more content areas than SF-36 (e.g., sleep), it may do so at the expense of responsiveness of its 4-item physical function scale as compared to the multiitem-derived SF-36 PCS. Statistically significant increases in SF-36 role physical (p = 0.01) and physical component scale (p = 0.016), but not PROMIS-29, were observed in patients with mRSS improvement.
Conclusion. PROMIS-29 and FACIT-Dyspnea are valid instruments to measure health status and dyspnea in patients with SSc. In physical function assessment, longer PROMIS short forms or computer adaptive testing should be considered to improve responsiveness to the effect of skin disease changes on physical function in patients with SSc.
There are many impediments to conducting effective clinical trials and to making trial comparisons in scleroderma/systemic sclerosis (SSc). These include spontaneous improvement in some untreated patients with SSc, lack of rigorously validated indices of SSc disease severity and activity, and the lack of an accepted set of patient-reported outcome (PRO) instruments that are used in clinical trials, among others. Several traditional PRO instruments have been validated or are commonly used in SSc clinical studies, including the Scleroderma Health Assessment Questionnaire Disability Index (s-HAQ-DI), Medical Outcomes Study Short Form-36 (SF-36), St. George’s Respiratory Questionnaire (SGRQ), and the Medical Research Council Dyspnea Scale (MRC-DS)1.
The s-HAQ includes 2 or 3 items for 8 activity domains, including dressing, grooming, arising, eating, walking, hygiene, reach, and grip, as well as 6 visual analog scales (pain, intestinal and breathing problems, Raynaud and digital ulcer interference in daily activities, and overall disease severity scale). Mean domain scores and a composite score are calculated (low scores favorable), but there is no standardized method for incorporating visual analog scores into the total score. The SF-36 requires a licensing fee and consists of 36 items assessing physical functioning, bodily pain, mental health, role limitations attributable to emotional health, vitality, and general health perceptions. Composite physical and mental component scores are calculated (high scores favorable). The SGRQ requires special permission before use and consists of 16 differentially weighted items that assess dyspnea (low score favorable). The freely available MRC-DS is a single 5-point item assessing dyspnea (low score favorable). Different scoring systems and lack of free use for some instruments complicate routine clinical or research use.
Beginning in 2004, the National Institutes of Health sponsored the Patient-Reported Outcomes Measurement Information System (PROMIS) to develop and validate item-response theory-based item banks to quantify physical, mental, and social health across patient populations2. In addition to item banks, short forms measuring general health status [PROMIS 29-Item General Health Form (PROMIS-29)] and dyspnea [Functional Assessment of Chronic Illness Therapy-Dyspnea (FACIT-Dyspnea)] were developed and validated in the general population and in patients with self-reported chronic obstructive pulmonary disease, respectively1,3,4,5. PROMIS-sponsored instruments are available free of charge, validated in many diseases, created in multiple languages, are readily administered electronically, can be administered as part of computerized adaptive tests, and use a uniform standardized scoring system that is simple to interpret6. We have demonstrated the construct validity for discriminative purposes of PROMIS-29 and FACIT-Dyspnea 10-item short form in SSc1. The goal of this project was to determine whether PROMIS-29 and FACIT-Dyspnea are responsive to change in SSc disease severity over 1 year, and thus useful in clinical trials and other longitudinal studies.
MATERIALS AND METHODS
We studied the first 100 patients enrolled in the Northwestern Scleroderma Patient Registry who completed PRO instruments at baseline and at 1-year followup. Patients fulfilled the American College of Rheumatology (ACR) criteria for SSc or at least 3 out of 5 CREST criteria (calcinosis, Raynaud phenomenon, esophageal dysfunction, sclerodactyly, telangiectasia)7. Subsequent to the publication of the revised ACR/European League Against Rheumatism (EULAR) SSc criteria, retrospective chart review was conducted to determine the number of subjects that fulfilled the new 2013 criteria8. Informed consent for study participation was obtained in accordance with the Northwestern University Institutional Review Board guidelines. The battery of PRO instruments administered at baseline (pen and paper) and 1 year later (by iPad) included traditional instruments (SGRQ, MRC-DS, s-HAQ-DI, and SF-36) and novel PROMIS-sponsored instruments (PROMIS-29 and FACIT-Dyspnea 10-item short form). These modes of administration (pen and paper, electronic) have been shown to be equivalent for PROMIS instruments9.
To evaluate PRO instruments’ external validity for discriminative purposes, a composite modified 7-item Medsger SSc Disease Severity Score/severity score was calculated at baseline and 1 year as described1,10. Briefly, the composite severity score was calculated using data collected within 3 months of a clinic visit that included the modified Rodnan Skin score (mRSS); serum levels of brain natriuretic peptide (BNP), creatinine (crt), and hemoglobin (hgb); 2-dimensional echocardiography with tissue Doppler estimated right ventricular systolic pressure (est. RVSP); pulmonary function test (PFT); forced vital capacity in 1 s (FVC1); and DLCO percent predicted. Medsger Disease Severity Scale items that were not included in our modified score included hematocrit, change in weight, digital pitting scars/fingertip ulcerations/gangrene, finger to palm distance, proximal weakness, as well as esophageal and small bowel study information because those data were not included in our registry.
Demographic data, including self-reported race and smoking history, were collected on questionnaires administered to patients by paper or iPad. Anthropometric, laboratory (BNP, crt, hgb), mRSS, and PFT data were attained by querying the Northwestern Medicine Enterprise Data Warehouse (EDW) using an approach with proven validity11. The Northwestern Medicine EDW is an electronic clinical data repository of patient data stored in a variety of medical software systems at Northwestern University11. Any missing data were confirmed by manual chart review, and external results not available through the EDW were added. Echocardiographic data were obtained by manual chart review. Modified Medsger Disease Severity Scale variables were classified with scores of 0 (mild), 1 (moderate), or 2 (severe). Abnormal values were prospectively defined as previously described1. Individual item scores were summed to generate a composite severity score1.
To confirm the discriminative ability of PROMIS-29 and FACIT-Dyspnea for assessing differences in SSc disease severity, subjects were divided into categories based upon the severity score (0–1 = mild disease, 2–3 = moderate disease, 4+ = more severe disease), and descriptive statistics were generated for instrument scores at baseline and followup. PROMIS and FACIT instruments use a T score metric with a mean (SD) set to 50 (10) in the reference population. The PROMIS-29 reference population was the general population, while the FACIT-Dyspnea reference population consisted of patients with self-reported chronic obstructive pulmonary disease3,12. Instruments measure the amount of a domain present, thus high functional and low symptom domain scores are favorable.
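The scoring and categorization steps described above can be illustrated with a minimal Python sketch. It assumes each item of the 7-item modified score has already been classified as 0, 1, or 2. The item labels, the function names, and the example patient are hypothetical, and the cutoffs that define abnormal values for each item are not reproduced here.

```python
# Hypothetical labels for the 7 items of the modified Medsger score.
# The per-item cutoffs (0 = mild, 1 = moderate, 2 = severe) are not shown.
ITEMS = ("mRSS", "BNP", "creatinine", "hemoglobin", "est_RVSP", "FVC", "DLCO")


def composite_severity(item_scores):
    """Sum the 0/1/2 item scores into a composite severity score."""
    for name in ITEMS:
        if item_scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be scored 0, 1, or 2")
    return sum(item_scores[name] for name in ITEMS)


def severity_category(score):
    """Map a composite score to the categories used in the text."""
    if score <= 1:
        return "mild"
    if score <= 3:
        return "moderate"
    return "more severe"


# Illustrative patient; the values are invented for this example.
patient = {"mRSS": 1, "BNP": 0, "creatinine": 0, "hemoglobin": 1,
           "est_RVSP": 0, "FVC": 1, "DLCO": 2}
total = composite_severity(patient)
print(total, severity_category(total))
```

Keeping the item scoring separate from the summation makes it easy to apply the same categorization at baseline and at followup.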
To compare sensitivity to change over time between the new and the traditional instruments for the measurement of dyspnea and global health, Spearman correlation coefficients were calculated for change scores between baseline and followup. FACIT-Dyspnea was compared to the SGRQ and MRC-DS change scores, FACIT-Dyspnea functional limitation was compared to HAQ-DI change score, and PROMIS-29 physical functioning was compared to HAQ-DI and SF-36 physical component change scores. Definitions for correlation were established prospectively as follows: r < 0.3 = low correlation, 0.3 ≤ r ≤ 0.5 = moderate correlation, and r > 0.5 = high correlation1.
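As an illustration of this analysis, the following Python sketch computes a Spearman coefficient between two sets of change scores and applies the prespecified cutoffs. It is not the SAS code used in the study. The function names and the example change scores are hypothetical, and applying the cutoffs to the absolute value of r (so that negative coefficients are labeled by magnitude) is an assumption.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def correlation_label(r):
    """Apply the prespecified cutoffs to the magnitude of r (an assumption)."""
    size = abs(r)
    if size < 0.3:
        return "low"
    if size <= 0.5:
        return "moderate"
    return "high"


# Hypothetical 1-year change scores (followup minus baseline).
new_change = [1, 2, 3, 4, 5]
traditional_change = [2, 1, 4, 3, 5]
rho = spearman(new_change, traditional_change)
print(round(rho, 2), correlation_label(rho))
```

Because the coefficient is computed on ranks, it does not depend on the two instruments sharing a scoring metric.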
Three additional approaches were used to assess PROMIS-29 and FACIT-Dyspnea’s responsiveness to change. In the first approach, patients were divided into worsened, unchanged, and improved severity score categories. Based upon this, PROMIS-29 and FACIT-Dyspnea descriptive statistics, as well as effect sizes (ES), were determined. In the second approach, patients were divided into worsened, unchanged, and improved MRC-DS change groups, and FACIT-Dyspnea descriptive statistics as well as ES were determined. In the third approach, the sensitivity of PROMIS-29 to change in mRSS over 1 year was assessed. A ≥ 5-point decrease in mRSS was defined as improvement based upon evidence that this is an important difference13. Subjects were classified as improved (mRSS decrease ≥ 5) versus worsened (mRSS increase ≥ 5) or stable (mRSS change 0–4), and descriptive statistics, as well as ES, were determined. Effect sizes > 0.2 were deemed possibly clinically important14,15. ANOVA was used to determine the significance of between-group differences. McNemar’s test was used to compare pre- versus postclassification of abnormal versus normal values. All analyses were conducted using SAS 9.2 (SAS Institute Inc.).
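The third approach can be sketched in Python as follows. This is a minimal illustration of the grouping rule stated above (a change of at least 5 mRSS points in either direction), not the study's SAS code. The function names and the example records are hypothetical.

```python
from statistics import mean


def mrss_change_group(baseline, followup, threshold=5):
    """Classify a patient by 1-year change in mRSS (followup minus baseline)."""
    change = followup - baseline
    if change <= -threshold:
        return "improved"   # mRSS decreased by at least `threshold` points
    if change >= threshold:
        return "worsened"   # mRSS increased by at least `threshold` points
    return "stable"         # absolute change of 0 to 4 points


def mean_change_by_group(records):
    """records: iterable of (mrss_baseline, mrss_followup, pro_change)."""
    groups = {}
    for base, follow, pro_change in records:
        groups.setdefault(mrss_change_group(base, follow), []).append(pro_change)
    return {group: mean(values) for group, values in groups.items()}


# Invented records: (baseline mRSS, followup mRSS, PRO change score).
records = [
    (20, 12, 4.0),
    (18, 13, 2.0),
    (10, 12, 0.5),
    (8, 15, -3.0),
    (15, 15, -0.5),
]
print(mean_change_by_group(records))
```

The per-group means produced this way are the descriptive statistics that an ANOVA would then compare across groups.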
RESULTS
One hundred patients with SSc who completed the traditional and new PRO instruments both at baseline and 1 year later were studied. Ninety-seven percent of the subjects fulfilled the revised ACR/EULAR SSc criteria. As shown in Table 1, 91% of the subjects were white women with longstanding SSc disease duration, median (range) of 4.5 (0–32) years, defined as the duration between onset of first non-Raynaud symptom and baseline PRO instrument administration. Just over half (53%) of patients met classification criteria for limited cutaneous SSc16. RNA polymerase III autoantibodies were the most prevalent SSc-specific autoantibody (21 out of 58; 36%) compared to anticentromere (16 out of 96; 17%) and antitopoisomerase I (24 out of 93; 26%). A speckled antinuclear antibody was the most prevalent immunofluorescence pattern (39 out of 95; 41%).
The mean composite modified Medsger Severity Scale score was 3.2 (SD 2.1) at baseline and remained unchanged at 1-year followup. With the exception of an increase in the proportion of patients with low hgb, there were no statistically significant changes in the objectively measured disease variables that comprise the modified Medsger Severity Score between baseline and followup (Table 2). Similar percentages of patients had an abnormal BNP, crt, FVC, and DLCO percent predicted at baseline and followup. Creatinine significantly increased in 1 patient diagnosed with incident scleroderma renal crisis17. These data suggest that the majority of patients had stable cardiac, pulmonary, and renal disease over 1 year. There were 21 prevalent cases of anemia at baseline, and 14 incident cases of anemia at followup. Manual electronic medical record review for these 14 subjects demonstrated 5 cases of anemia of chronic disease and 6 cases of iron deficiency anemia. No identifiable cause was found in 3 subjects.
Contrary to our expectations, the percentage of patients with an abnormal estimated RVSP on echocardiography declined from baseline (n = 29, 35%) to 1 year (n = 12, 22%). Of the 12 subjects who had serial studies, 5 had persistent and 7 had newly diagnosed RVSP elevation. An etiology to account for increased RVSP was found in 4 out of 7 patients. One patient had pulmonary hypertension secondary to progressive interstitial lung disease. Another was diagnosed with pulmonary artery hypertension (mean pulmonary artery pressure at rest ≥ 25 mmHg with pulmonary capillary wedge pressure ≤ 15 mmHg on right-heart catheterization)18. One patient had worsened pulmonary venous hypertension secondary to diastolic heart failure, and another had worsening left ventricular filling pressures on echo.
To evaluate the ability of PRO instruments to detect cross-sectional differences in SSc disease severity, descriptive statistics for new and traditional instrument scores at baseline and followup were compared between patients stratified by modified Medsger Disease Severity Scale category (Table 3). Mean baseline and followup general and physical health PRO scores were significantly worse in patients with higher modified Medsger Disease Severity Scale scores: FACIT-Dyspnea (p ≤ 0.001); FACIT-Functional limitation (p ≤ 0.001); PROMIS physical function (p ≤ 0.001); PROMIS pain interference and social role (both p < 0.001, p = 0.002); HAQ-DI (p < 0.001); SF-36 physical functioning, role physical, physical component summary, bodily pain, and general health (p < 0.001). Mean PROMIS Fatigue scores differed significantly between groups at baseline only (p ≤ 0.001). Mean PROMIS depression, anxiety, and sleep disturbance, and SF-36 emotional role, mental health, and mental component summary scores did not differ among the groups. These data indicate that PROMIS-29 and FACIT-Dyspnea are comparable to traditional instruments for discriminating between patients with SSc with different levels of disease severity as defined by a composite modified Medsger Scale.
The correlation between the new and traditional dyspnea instrument change scores in patients with SSc was assessed (Table 4). There was a high correlation between FACIT-Dyspnea and SGRQ total score (r = 0.57, p < 0.001) and a moderate correlation between FACIT-Dyspnea and MRC-DS (r = 0.32, p < 0.05). A moderate correlation was observed between FACIT-Functional limitations and SGRQ and HAQ-DI (both r = 0.35, p < 0.001), and MRC-DS (r = 0.32, p < 0.05). There was a strong negative correlation between FACIT-Functional limitations and SF-36 physical component score (r = −0.51, p < 0.001), which is in the expected direction because low FACIT functional limitation scores are comparable to high SF-36 physical component scores. These data suggest that dyspnea contributes to functional limitations in SSc, and that FACIT-Dyspnea change scores correlate with HAQ-DI and SF-36 change scores.
The correlation between new and traditional general health instrument 1-year change scores was assessed (Table 4). Strong negative and positive correlations, respectively, were observed between changes in PROMIS-29 physical functioning subscale and HAQ-DI (r = −0.50) and SF-36 physical component scores (r = 0.52, both p < 0.001; Table 4). Moderate negative correlations were found between changes in PROMIS-29 anxiety (r = −0.26, p < 0.05), depression (r = −0.36, p < 0.001), and fatigue (r = −0.42, p < 0.001) subscales, and SF-36 mental component summary. Negative correlation coefficients occurred in cases where high scores have different meaning (good vs bad) for different instruments. These results indicate that PROMIS-29 subscale change scores correlate well with similar traditional instrument scores and that PROMIS-29 is valid for measuring changes in overall health in patients with SSc.
Descriptive statistics for FACIT-Dyspnea, PROMIS-29, SF-36, and HAQ-DI change scores were determined for modified Medsger Scale change groups (improved, stable, worsened; data not shown). There were no significant between-group differences in any of the PRO change scores.
Descriptive statistics for FACIT-Dyspnea change scores were determined for MRC-DS change groups (Figure 1). Patients with worsened MRC-DS change scores had higher FACIT-Dyspnea change scores [mean (SD) = 5.1 (6.2)] compared to patients whose MRC-DS score remained stable or improved [−1.6 (4.6) and −3.0 (6.4), respectively; p < 0.01]. Effect sizes were greatest for the worsened, followed by the improved, and then the stable groups (0.83, −0.47, and −0.36, respectively; p < 0.01). These results indicate that FACIT-Dyspnea is a sensitive measure of 1-year change in dyspnea in patients with SSc.
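The text does not state which effect size definition was used. As a worked check, the short Python sketch below assumes the ES is the mean change divided by the SD of the change scores (a standardized response mean) and compares that ratio with the reported values. The function name is hypothetical, and the inputs are the rounded means and SD quoted above, so small discrepancies are expected.

```python
def standardized_response_mean(mean_change, sd_change):
    """Mean change divided by the SD of the change scores (assumed ES definition)."""
    return mean_change / sd_change


# Group: (mean change, SD of change, reported ES), as quoted in the text.
reported = {
    "worsened": (5.1, 6.2, 0.83),
    "stable": (-1.6, 4.6, -0.36),
    "improved": (-3.0, 6.4, -0.47),
}

ratios = {}
for group, (mean_change, sd_change, es) in reported.items():
    ratios[group] = standardized_response_mean(mean_change, sd_change)
    print(f"{group}: mean/SD = {ratios[group]:.2f}, reported ES = {es:.2f}")
```

Under this assumption, each ratio falls within about 0.01 of the reported effect size, which is consistent with rounding of the published means and SD. It does not confirm that this was the definition the authors used.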
Descriptive statistics for FACIT-Dyspnea and PROMIS-29 change scores were determined for mRSS change groups (Table 5). Patients with mRSS improvement demonstrated greater reductions in FACIT-Dyspnea and FACIT-Dyspnea functional limitations, and PROMIS-29 anxiety, fatigue, and sleep disturbance subscales than patients with worsened or stable skin scores, although these differences were not statistically significant. PROMIS physical function, depression, pain interference, and social role subscale scores remained stable in patients regardless of changes in skin score. These results suggest that FACIT-Dyspnea and FACIT-Dyspnea functional limitations, and PROMIS-29 anxiety, fatigue, and sleep disturbance subscales may be more sensitive to mRSS changes over 1 year than PROMIS physical function, depression, pain interference, and social role subscale scores. Statistically significant increases in SF-36 role physical (p = 0.01) and physical component scale (p = 0.016) were observed in patients with mRSS improvement.
DISCUSSION
The paucity of validated outcomes that are responsive to change is an important barrier to therapeutic response assessment in SSc. To the best of our knowledge, ours is the first study to assess the responsiveness to change of PROMIS-29 and FACIT-Dyspnea compared to traditional instruments in SSc.
Our SSc population was slightly more affected by their disease than in our previous report, as evidenced by a higher composite modified Medsger Disease Severity Index. Consecutive patients were approached to complete the battery of PRO instruments regardless of severity of illness. Studies have shown that most patients are willing to complete PRO instruments, suggesting that “favoring” the very sick by excluding them from health status research is unwarranted19,20.
Our results demonstrate that increased composite modified Medsger Disease Severity Scale scores were associated with statistically significant changes in new and traditional PRO instrument scores in the appropriate direction. Higher FACIT-Dyspnea, FACIT-Functional limitations, HAQ-DI, and PROMIS anxiety, depression, fatigue, pain interference, and sleep disturbance scores were observed as SSc disease severity increased. Similarly, lower SF-36 physical functioning, social role, physical component summary, role emotional, mental health, mental component summary, vitality, bodily pain, social functioning, and general health scores were observed in patients with higher disease severity. This is an expected finding because lower scores on all SF-36 domains represent worse outcomes. One important caveat regarding the composite modified Medsger Severity Scale that we used is that it is not sensitive to large changes in a single variable (i.e., a patient with a large change in only 1 variable will be considered stable). Another interesting finding is that patients with SSc at both timepoints demonstrated FACIT-Dyspnea and FACIT-Functional limitations scores below 50. Thus, patients with SSc report less dyspnea than patients with chronic obstructive pulmonary disease (reference population).
There were statistically significant moderate and high correlations between FACIT-Dyspnea and FACIT-Dyspnea functional limitations, and SF-36 physical component change scores, respectively. Further, significant correlations between PROMIS-29 physical functioning and SF-36 physical component change scores, and PROMIS-29 anxiety, depression, and fatigue subscale and SF-36 mental component change scores were observed. These data suggest that FACIT-Dyspnea, FACIT-Dyspnea functional limitations, and PROMIS-29 are sensitive to 1-year changes in dyspnea, anxiety, depression, and fatigue in patients with SSc as assessed by the correlation between change scores.
The responsiveness of HAQ-DI and SF-36 to mRSS changes in SSc has been demonstrated previously21,22. Patients with SSc who had a > 15% worsening of their skin score demonstrated worsening HAQ-DI scores, while patients with a > 15% improvement in skin score demonstrated improvement in HAQ-DI scores22. Similarly, patients with SSc with a ≥ 30% increase in skin score at 24 weeks during a clinical trial of the hormone relaxin demonstrated higher HAQ-DI scores and higher SF-36 physical component summary scores. We observed statistically significant differences between mean SF-36 role physical and physical component summary change scores, but not between mean HAQ-DI change scores in patients with worsening/stable or improved mRSS over 1 year. This is contrary to what was observed in the relaxin study where the HAQ-DI was more responsive to mRSS changes compared to the SF-36. Longer followup (1 year in our study compared to 24 weeks in the relaxin clinical trial) may account for the difference. There were no statistically significant differences in mean PROMIS-29 change scores in patients with worsening/stable or improved mRSS over 1 year. This is likely attributable to the fact that the SF-36 includes more than 10 items to assess physical function and role performance, all contributing to the physical component summary (PCS) score, while PROMIS-29 only includes 4 physical function items. If responsiveness in the area of physical function is desired from a PROMIS assessment, then longer physical function short forms such as the 10-item and 20-item versions should be considered, and could be used to supplement PROMIS-29. Similarly, PROMIS-43 or PROMIS-57 profiles, with 6-item and 8-item physical function short forms, respectively, may be preferable when the primary goal is a responsive physical function assessment. Of course, computerized adaptive testing (CAT) would likely provide the best measurement precision across the physical function continuum.
Improvement in mRSS over 1 year was associated with improved FACIT-Dyspnea, FACIT-Dyspnea functional limitations, and all PROMIS-29 subscales except pain interference, depression, and social role (these remained stable), but the results were not statistically significant. Less chest wall restriction and less work for activities of daily living in patients with lower skin scores may explain the association between mRSS and dyspnea. The correlations between change in FACIT-Dyspnea and FACIT-Dyspnea functional limitations and change in forced vital capacity (−0.192 and −0.042, respectively) and DLCO percent predicted (0.013 and −0.130, respectively) were low. These data suggest that the correlation between change in FACIT-Dyspnea and FACIT-Dyspnea functional limitations, and change in mRSS, was not attributable to improvement in SSc lung disease as assessed by change in FVC and DLCO percent predicted.
There were no significant differences in mean PRO change scores between modified Medsger Severity Scale change groups. This is likely attributable to the majority of patients having stable disease severity as defined by the modified Medsger Scale during 1 year of followup.
The major study limitation was the absence of patients who greatly improved or worsened as assessed by the modified Medsger Severity Scale. This led to statistically nonsignificant associations between changes in the modified Medsger Severity scores and mRSS, and the PROMIS PRO measures. This could be addressed by larger sample sizes and/or longer duration of followup in future studies. Alternatively, additional items could be included in the modified Medsger Severity Scale; namely, tendon friction rub counts and novel echocardiogram variables that assess right ventricular size and function.
Study strengths include the wide breadth of clinical information available for the study cohort that enabled determination of etiologies for anemia, PFT changes, creatinine, and other clinical data, and the novel use of the modified Medsger Severity Scale administered at 2 timepoints as a surrogate of disease activity.
PROMIS-29 and FACIT-Dyspnea were comparable in performance to traditional questionnaires with the exception of assessing physical function change associated with improvement in mRSS. The SF-36 PCS was responsive to this change whereas the 4-item PROMIS physical function scale in the PROMIS-29 was not. With enhanced assessment of physical function, the PROMIS-29 and FACIT-Dyspnea may be useful in future SSc clinical research for measuring change in general health and dyspnea because they are available free of charge and easily administered on paper or electronically. The sensitivity of PROMIS instruments may be enhanced when they are administered as CAT. PROMIS-sponsored instruments are also validated in many diseases, including SSc as demonstrated here, and available in Spanish and a growing number of languages, including Portuguese, Mandarin, Dutch, Hebrew, and Hindi. Further, they use a uniform standardized scoring system that is simple to interpret.
The ultimate goal of the PROMIS initiative is to facilitate the use of PRO instruments in clinical practice, as well as in research. Our results demonstrate that PROMIS-29 and FACIT-Dyspnea may be useful for measuring changes in SSc disease severity over 1 year. Future studies should be conducted to determine the responsiveness of PROMIS-43, -57, or -CAT to changes in SSc disease severity compared to the traditional instruments.
Acknowledgments
We thank Andrew Varga for performing the manual chart review to determine the number of patients meeting new ACR/EULAR SSc criteria.
Footnotes
- Supported in part by the National Institutes of Health (NIH) K12 HD055884 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development and K23 AR059763 (MH), by a research award from the Scleroderma Foundation and the Scleroderma Research Foundation (MH), Eleanor Wood Prince Grant Initiative from the Women’s Board of Northwestern Memorial Hospital (KT), NIH-National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) P60 AR064464 (RWC), NIH-NIAMS R01 AR042309 (JV, MH), and by 5 U54 AR057951 (PROMIS Statistical Center, District of Columbia, USA).
- Accepted for publication September 3, 2014.