Abstract
Objective. Validated gastrointestinal (GI) symptoms scales are used in clinical practice to assess patient-reported GI involvement. We sought to determine whether University of California, Los Angeles (UCLA) GI Tract Questionnaire (GIT) 2.0 Reflux scale, Patient-Reported Outcomes Measurement Information System (PROMIS) Reflux scale, and the Quality of Life in Reflux and Dyspepsia questionnaire (QOLRAD) are sensitive to identifying changes in GI symptoms following therapeutic intervention in participants with systemic sclerosis (SSc) and gastroesophageal reflux disease (GERD).
Methods. Participants with active GERD were recruited during clinical visits at 7 international SSc centers. Patient-reported outcome surveys and the GI self-reported questionnaire were completed at baseline and again at 4 weeks following a single intervention, and patients were classified as “improved” or “not improved.” Effect size (ES) was calculated to assess the sensitivity to change. An ES of 0.50–0.79 was interpreted as a moderate effect and an ES ≥ 0.80 as a large effect.
Results. There were 116 participants with SSc and active GERD who enrolled. The average age was 53.8 years and mean disease duration was 12.0 years. The UCLA GIT 2.0 Reflux scale and PROMIS Reflux scale had a significant correlation at baseline (0.61, p < 0.0001), and both instruments correlated with the QOLRAD domains (−0.56 to −0.71). In participants who had the UCLA GIT 2.0, PROMIS Reflux scale, and QOLRAD administered over 2 timepoints (n = 57) and were classified as improved, the ES was large for the UCLA GIT 2.0 and PROMIS Reflux scale, and moderate to large across all QOLRAD domains.
Conclusion. The UCLA GIT 2.0 Reflux scale, PROMIS Reflux scale, and QOLRAD are sensitive to change and can be included in future clinical trials.
- SYSTEMIC SCLEROSIS
- SCLERODERMA
- OUTCOME ASSESSMENT
- GASTROESOPHAGEAL REFLUX DISEASE
- GASTROINTESTINAL TRACT
Systemic sclerosis (SSc) is a connective tissue disease characterized by autoimmunity, progressive vasculopathy, and fibrosis of the skin and internal organs1. The most commonly affected internal organ in SSc is the gastrointestinal (GI) tract, which is involved in about 90% of patients. Any part of the GI tract from the esophagus to the anorectum may be involved, resulting in a variety of symptoms including gastroesophageal reflux disease (GERD), gastroparesis, small intestinal bacterial overgrowth, pseudo-obstruction, malabsorption, and fecal incontinence2. The most common of these symptoms is GERD, which is caused by a weakened lower esophageal sphincter with or without loss of esophageal motility3. GERD affects 70–90% of patients2,3, and symptoms include dysphagia, the sensation of acid reflux, or nausea. Chronic uncontrolled GERD may ultimately lead to strictures, erosive esophagitis, ulcerations, Barrett’s esophagus, and/or adenocarcinoma4,5,6. Though a subset of patients is asymptomatic, most patients experience symptoms, which is informative in guiding therapeutic intervention7.
The initial management of suspected GERD includes an empiric trial of acid-suppressive therapy8. If symptoms persist and endoscopy does not reveal evidence of GERD, esophageal function tests can be performed, including esophageal manometry and ambulatory reflux monitoring9. Objective measures are used to make a diagnosis of GERD, to determine whether existing therapy is effective in controlling acid and non-acid reflux, and to assess whether promotility agents may have a role in managing symptoms. Such objective testing includes pH monitoring with or without impedance (to measure reflux), and esophageal manometry with or without high resolution (to measure esophageal motility). While these tests are useful for clinical decision-making, they are invasive, costly, and at times not tolerated by patients with SSc10. As a result, patient-reported outcomes (PRO) have the potential to be more practical and cost-effective outcome measures for well-designed, randomized, placebo-controlled clinical studies and for guiding patient care for GERD management.
The University of California, Los Angeles (UCLA) Scleroderma Clinical Trials Consortium (SCTC) GI Tract Questionnaire (GIT) 2.0 and the US National Institutes of Health (NIH)-funded Patient-Reported Outcomes Measurement Information System (PROMIS) GI symptoms instrument have both been validated in SSc to assess patient-reported GI symptoms. The Quality of Life in Reflux and Dyspepsia (QOLRAD) questionnaire has been used to assess the effect of GERD symptoms on quality of life in different international cohorts, but it has not yet been evaluated in SSc11,12. Our objectives were to determine reliability, construct validity, and sensitivity to change of the UCLA GIT 2.0 Reflux scale, PROMIS Reflux scale, and QOLRAD in participants with SSc and active GERD who were given a single new GERD-specific therapy to treat this symptom.
MATERIALS AND METHODS
Participants
Seven international SSc centers from the United States, Italy, Belgium, and Australia participated in this longitudinal observational cohort study: University of Michigan, Johns Hopkins University, University of Utah, University of Florence, Thomas Jefferson University, Ghent University, and Royal Adelaide Hospital. Participants were recruited during routine clinical visits. All participants met SSc American College of Rheumatology/European League Against Rheumatism 2013 criteria and had symptoms of active GERD, defined as having symptoms for at least 3 of the prior 7 days. In addition to the above criteria, all patients were required to meet at least one of the following additional criteria or have one of the following studies: (1) barium swallow showing spontaneous reflux; (2) pH impedance study; (3) abnormal 24-h pH monitoring; (4) upper GI endoscopy showing esophagitis or complications such as ulcers, stenosis, or Barrett’s esophagus; or (5) diagnosis of GERD based on clinical symptoms. To maximize the likelihood of detecting a change in GI symptoms, we focused on recruiting participants who were started on a new treatment intervention or had a single change in their GI management (increase in pharmacologic or nonpharmacologic therapies). Such interventions included (1) initiation of lifestyle modifications (avoidance of aggravating foods/acidic foods, not eating within 4 h of bedtime, raising the head of the bed, avoidance of alcohol and smoking); (2) over-the-counter antacids; (3) initiation or increase of proton pump inhibitors; (4) initiation or increase of H2-blocker; (5) other medications (sucralfate/Carafate); (6) use of a combination of the above medications and prokinetic; or (7) endoscopic antireflux procedures. If the treating physician determined that the patient did not meet criteria for SSc or that the patient’s GI disease was a complication of another condition, then the patient was excluded from the study.
Patients could be recruited for only a single GI issue, though other complaints may have existed, and could have only a single intervention made at that clinic visit. PRO were recorded at baseline and again at 4 weeks of followup. This study complies with the Declaration of Helsinki, and the research protocol was approved by the Institutional Review Board (IRB00036572) at each center. Informed consent was obtained from all subjects.
Instruments
We assessed the overall severity of the underlying GI illness with an anchor questionnaire using a single global item [“In the past 7 days, how would you rate your gastrointestinal condition? (excellent, very good, good, fair, or poor)”]. The UCLA SCTC GIT 2.0 instrument is a 34-item questionnaire that assesses 7 scales: reflux (8 items), distension/bloating (4 items), soilage (1 item), diarrhea (2 items), social functioning (6 items), emotional well-being (9 items), and constipation (4 items)13,14. Each scale has a weighted subscore, and a total score is based on the sum of 6 domains (reflux, distension/bloating, fecal soilage, diarrhea, social functioning, emotional well-being). A 4-point categorical response scale (0–3) is used to assess all items except for items 15 and 31 in the diarrhea and constipation domains, respectively, where a score of 0 or 1 is provided. A higher score represents more severe disease. For the purpose of this analysis, only the Reflux scale was used, which collects information on reflux and regurgitation; however, the total score does encompass 6 domains.
The PROMIS GI Symptoms Scales instrument is a 60-item questionnaire that assesses 8 scales: gastroesophageal reflux (13 items), disrupted swallowing (7 items), diarrhea (5 items), bowel incontinence/soilage (4 items), nausea and vomiting (4 items), constipation (9 items), belly pain (6 items), and gas/bloating/flatulence (12 items)15. This study used the GERD scale and did not use the computer adaptive testing feature, so that all patients answered 20 items related to reflux and dysphagia. A 5-point categorical response scale was used to assess all items, and a higher score denotes more GI symptoms. This instrument does not record social functioning and emotional well-being.
The QOLRAD questionnaire is specific for reflux and dyspepsia and includes 25 items divided into 5 domains including emotional distress, sleep disturbance, food/drink problems, physical/social functioning, and vitality. The questions are rated on a 7-point Likert scale, with a lower value representing a more severe effect on daily functioning.
The Gastroesophageal Reflux Disease Questionnaire (GerdQ) is a self-administered 6-item questionnaire that was developed and validated for the diagnosis of GERD. The 6 domains include heartburn, regurgitation, upper abdominal pain, nausea, sleep interference, and medication, and they are scored on a 4-point Likert scale, with higher numbers representing more severe disease. The cutoff of ≥ 8 points has the highest specificity and sensitivity for a diagnosis of GERD16.
Anchors
The GI Anchor followup visit questionnaire was used to assess both overall improvement and improvement specific to GERD symptoms, and was used in a previous study17. All participants were asked 3 questions, which included the following: (1) In the past 7 days how would you rate your gastrointestinal condition? (2) Compared to your last visit, how is your overall gastrointestinal condition at this time? and (3) Compared to your last visit, how is your overall gastrointestinal condition for which you received treatment? For question 1, a 5-point response scale was provided, and response options included Excellent, Very Good, Good, Fair, and Poor. For questions 2 and 3, a 7-point response scale was provided and included the following potential responses: Completely better, Considerably better, Somewhat better, About the same, Somewhat worse, Considerably worse, and Completely worse. Participants who checked “Completely better,” “Considerably better,” or “Somewhat better” on Questions 2 and 3 were characterized as “Improved,” and others were characterized as “Not Improved.”
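The anchor-based grouping described above can be written as a small rule. The following is a minimal Python sketch, not the study's actual code: the function and constant names are hypothetical, and it assumes that both Question 2 and Question 3 had to show improvement, since the text does not state how discordant answers were handled.

```python
# Responses counted as improvement on the 7-point scale (taken from the text).
IMPROVED_RESPONSES = {"Completely better", "Considerably better", "Somewhat better"}


def classify_participant(q2_response: str, q3_response: str) -> str:
    """Return "Improved" or "Not Improved" from anchor Questions 2 and 3.

    Assumption: both questions must indicate improvement; the source does
    not say how participants with discordant answers were classified.
    """
    improved = (
        q2_response in IMPROVED_RESPONSES
        and q3_response in IMPROVED_RESPONSES
    )
    return "Improved" if improved else "Not Improved"
```

Under this assumption, a participant answering “About the same” to either question falls in the “Not Improved” group.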
Statistical analysis
Descriptive statistics are presented as percentages for categorical variables. Student t tests were used to compare the means in normally distributed data. The Wilcoxon Mann-Whitney U test was used to compare the medians of non-normally distributed data. The percentages of respondents scoring the minimum (floor) and maximum (ceiling) possible scores were calculated to evaluate scale score distributions for PROMIS and legacy instruments. For easy interpretability, floor effect is presented as “worst” possible score and ceiling as “best” possible irrespective of the direction of the scale.
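As an illustration of the floor and ceiling calculation, the sketch below computes both percentages for one scale in Python. The function name and the example score range are hypothetical; the caller passes the worst and best possible scores explicitly so that the direction of the scale does not matter, matching the reporting convention described above.

```python
from typing import Sequence, Tuple


def floor_ceiling_pct(
    scores: Sequence[float], worst: float, best: float
) -> Tuple[float, float]:
    """Return (floor %, ceiling %) for one scale.

    Floor is the share of respondents at the worst possible score and
    ceiling is the share at the best possible score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == worst) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == best) / n
    return floor_pct, ceiling_pct
```

For a scale on which higher scores mean more severe disease, `worst` is the maximum possible score and `best` is the minimum; for a scale on which lower scores are worse, the two are swapped.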
The internal consistency reliability was estimated using Cronbach’s alpha coefficient, and a value ≥ 0.70 was considered satisfactory18. Construct validity was assessed by examining correlation coefficients at baseline between 3 PRO scales19. Cohen d effect sizes (ES) across the whole group, improved group, and not improved group were calculated to assess the sensitivity to change among important clinical subsets and interpreted as 0.20–0.49 as small magnitude, 0.50–0.79 as moderate magnitude, and ≥ 0.80 as large magnitude20,21. All analyses were performed at the University of Michigan (VB) using R software version 3.4.2, and p values < 0.05 were considered statistically significant.
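The two statistics named above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the analysis code used in the study (which was run in R): the function names are hypothetical, and the effect size here divides the mean change by the SD of the baseline scores, which is one common convention in responsiveness analyses; the text does not specify which denominator was used.

```python
from statistics import mean, stdev, variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items[i][j] is participant j's response to item i."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(responses) for responses in zip(*items)]  # per-participant sum
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Mean change divided by baseline SD (assumed denominator, see lead-in)."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


def interpret(es: float) -> str:
    """Map |ES| to the magnitude bands given in the text."""
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "below small"  # label chosen here; the text does not name this band
```

The sign of the effect size depends on the direction of the scale, so `interpret` uses the absolute value: a drop in a symptom score and a rise in a QOLRAD score both indicate improvement.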
RESULTS
A total of 116 participants with SSc and active GERD completed baseline and 113 completed 4-week followup visits. Patient average (± SD) age was 53.8 (± 13.3) years and there was a mean disease duration of 12.0 (± 10.3) years (Table 1). Participants were more likely to be female (81%), and there was a similar distribution of participants with diffuse and limited cutaneous disease (42% and 49%, respectively). Mean body mass index (BMI) was 25.6 (± 5.2) kg/m2. Antinuclear antibodies were present in 95.0% of participants. Anticentromere antibodies were reported in 40.4% of participants, anti-topoisomerase 1 (anti-Scl-70) antibodies in 21.8%, anti-RNA Pol-3 antibodies in 20.2%, anti-Ro antibodies in 12.2%, and anti-RNP in 10%. GI Anchors baseline visit scores represented patient self-reported assessment of their GI condition over the preceding 7 days. In response to this survey, 30.1% of participants rated their GI condition as poor, 40.7% rated their GI status as fair, 20.4% rated their GI status as good, 8.9% rated their GI status as very good, and none rated their GI status as excellent. Of the 63 patients who completed the GerdQ, 78% (49/63) scored above 8 points, which is highly sensitive and specific for GERD16.
At baseline, the mean (± SD) UCLA GIT 2.0 Reflux score was 0.97 (± 0.63; n = 113) and the mean PROMIS Reflux score was 53.3 (± 8.0; n = 65). The GIT 2.0 Reflux score was suggestive of moderate GERD and the PROMIS Reflux score was 0.3 SD above the mean of the US general population22. The mean QOLRAD scores (n = 110) for each scale were as follows: (1) emotional distress score 5.3 ± 1.5; (2) sleep disturbance score 4.9 ± 1.6; (3) food and drink problems score 4.8 ± 1.5; (4) physical and social function score 5.8 ± 1.2; and (5) vitality score 5.0 ± 1.6. Cronbach’s alpha coefficient, used to assess internal consistency reliability, was adequate for all scales, ranging from 0.78 to 0.96. The percentages of participants having a minimum and maximum possible score at baseline on the UCLA GIT 2.0 Reflux scale, the PROMIS Reflux scale, and the 5 QOLRAD scales were then assessed, stratified by limited and diffuse cutaneous disease (Table 2). The percentage of participants with the minimum score on the UCLA GIT 2.0 Reflux scale was 0.88%. The percentage of participants with the minimum score on the PROMIS Reflux scale was 0%. On the QOLRAD, the percentage of participants with the minimum score was 0% on all scales, except for sleep disturbance, which was 0.91%. All instruments showed no or low floor and ceiling effects.
We assessed the correlations between the UCLA GIT 2.0 Reflux scale, the PROMIS Reflux scale, and the 5 QOLRAD scales (emotional, sleep, food and drink, physical/social, vitality; Table 3). Importantly, higher scores for the UCLA GIT 2.0 and for the PROMIS represent greater reflux disease, whereas higher scores for the QOLRAD represent better reflux-related quality of life. The UCLA GIT 2.0 and PROMIS Reflux scales had a significant correlation coefficient at baseline (0.61, p < 0.0001). Of the 5 QOLRAD scales, the PROMIS Reflux scale correlated most strongly with the Food and Drink domain of QOLRAD (−0.66; p < 0.01), and the UCLA GIT 2.0 Reflux scale correlated strongly with both the Food and Drink and the Sleep Disturbance domains of the QOLRAD (−0.70 and −0.71, respectively; p < 0.01).
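The negative sign of the correlations with QOLRAD follows directly from the opposite scoring directions. The Python sketch below illustrates this with a Pearson coefficient; the choice of Pearson is an assumption (the text does not state which coefficient was used), the function name is hypothetical, and the scores in the usage note are made-up toy values, not study data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

With toy symptom scores `[0.5, 1.0, 1.5, 2.0]` (higher is worse) and toy quality-of-life scores `[6.0, 5.0, 4.0, 3.0]` (higher is better), the coefficient is negative, which is the pattern expected when two instruments agree but are scored in opposite directions.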
Sensitivity to change was then assessed across the whole cohort (n = 116), and the ability to detect change over time was significant for the PROMIS Reflux scale (n = 65, p = 0.026) and 2 of 5 QOLRAD scales (n = 110, emotional distress, p = 0.034; and vitality, p = 0.047; Supplementary Table 1 and Supplementary Figure 1, available with the online version of this article). The p values were of borderline statistical significance in the Sleep Disturbance and Food and Drink domains of the QOLRAD (n = 109, p = 0.055 and p = 0.08, respectively), but not significant in the UCLA GIT 2.0 Reflux scale (n = 105, p = 0.11) or the Physical/Social Function scale of the QOLRAD (p = 0.504). The effect size for all the scales was < 0.20, except for the PROMIS Reflux scale, which was 0.27.
Because fewer participants completed the PROMIS Reflux scale than the other instruments, owing to the lack of translated versions of the PROMIS scale, we assessed sensitivity to change in participants who had completed both the PROMIS Reflux scale and the GIT 2.0. The effect size was similar for PROMIS and GIT 2.0 Reflux scales (0.28 and 0.24, respectively) for the whole group (Table 4). Among participants who were characterized as “Improved” based on their self-reported assessment, the effect sizes were large for GIT 2.0 Reflux scale, moderate to large for the PROMIS Reflux scale, and moderate to large for QOLRAD, depending on the scale (range 0.573–0.928; Table 4, Figure 1, and Figure 2). For the participants who were characterized as “Not Improved,” the ES estimates were not significant.
DISCUSSION
GI dysmotility is the most frequent internal complication of SSc and has a major effect on morbidity. The esophagus is the most frequently involved region in the SSc GI tract, which can be affected by a weak lower esophageal sphincter, and/or significant dysmotility or aperistalsis. Each of these complications can lower the quality of life23. The mechanisms underlying SSc esophageal dysmotility are poorly understood, and biomarkers of GI disease activity are not defined, making it challenging to assess the effects of existing therapies. Though not all SSc patients with esophageal symptoms have GERD24, and though some SSc patients may experience asymptomatic GERD25, patient-reported outcomes such as the GerdQ are sensitive and specific in GERD diagnosis26. Validated patient-reported outcome measures allow for a standardized assessment of important clinical response measures in SSc and may play a role for informing both clinical practice and trial design.
In an international prospective longitudinal cohort, we evaluated the ability of patient-reported GI outcome measures to detect clinically important changes in SSc after a single lifestyle modification or a GERD-specific change in medical therapy. All instruments were found to be reliable and have construct validity and sensitivity to change when assessing clinical response to therapeutic intervention in symptomatic SSc patients with GERD. In addition, there were minimal floor and ceiling effects of the instruments. The QOLRAD survey used in our study is an important complement to the other reflux-focused patient-reported outcome measures that we used, because it accounts for quality of life in several distinct domains, including emotional distress, sleep disturbance, food and drink, physical and social measures, and vitality. These measures are not assessed by the reflux domains of the UCLA GIT 2.0 and the NIH GI PROMIS, though they represent important external influences on GI symptoms and function.
The UCLA GIT 2.0 Reflux scale demonstrated moderate correlation with the PROMIS Reflux scale (0.61) and the QOLRAD domains. Both the UCLA GIT 2.0 and the PROMIS changed in the same direction, with higher points representing more severe GI disease, while the QOLRAD changed inversely to these scales, with lower points representing more severe GI disease. In another study in SSc, the correlation coefficient between the PROMIS Reflux scale and UCLA GIT 2.0 Reflux scale was 0.77, supporting the current estimates.
For the overall group, the change in instruments’ scores over time was minimal, consistent with data in the larger GI disorders17. During the development of the PROMIS GI Symptoms Scale15, only 23% “improved” in reflux symptoms on self-assessment when a similar design was used. In our current study, among participants who “improved,” all instruments had moderate-to-large ES and were able to distinguish between participants who were “Improved” or “Not Improved,” highlighting the importance of the selected instruments in future studies. In contrast, there was a minimal change or an increase in symptom severity among patients in the “Not Improved” group as determined by the GI Anchor. Interestingly, scores from the 5 domains of the QOLRAD identified vitality, physical/social function, and sleep disturbance as the most prominent residual problems for the “Not Improved” group. Though some symptomatic improvement in the Food and Drink domain and in the Emotional Distress domain was noted among “Not Improved” participants, the ES and responsiveness were substantially lower compared to the “Improved” group. This suggests that participants’ assessments of being “improved” or “not improved” are strongly influenced by symptoms outside of the gut that are indirectly influenced by GI dysfunction, and not just GI dysfunction alone, and may support the importance of measuring emotional and physical functioning domains.
Our study had many strengths. First, to our knowledge, this is the first international study to assess the feasibility and reliability (internal consistency) of collecting validated patient-reported outcome surveys of SSc-related GERD in a prospective fashion. These data provide confidence that the application of these surveys in the clinical setting is feasible, even in the international community, when translations are available. Second, these instruments were sensitive to change after a change in clinical management, demonstrating their usefulness as longitudinal outcome measures. Finally, all instruments showed low floor and ceiling effects, thus minimally affecting sensitivity to change.
Weaknesses of our study include that the PROMIS instrument and GerdQ are not translated into the languages of all participating sites; therefore, data collection related to these specific surveys was limited to English-speaking countries. Thus we were not able to examine the results by subgroups of patients (i.e., limited versus diffuse cutaneous involvement, autoantibodies, or disease duration). We did not explore assessment with objective testing to confirm the diagnosis of GERD in each patient owing to limitations in funding; however, we used an appropriate surrogate measure10. Previous data by Bae, et al showed that the GIT 2.0 Reflux scale had a high sensitivity for upper GI involvement, as assessed by upper endoscopy and manometry27. As in any longitudinal observational cohort, there were missing data. However, our data provide robust estimates.
We prospectively evaluated a large international SSc cohort with symptomatic GERD to assess the sensitivity to change over time of validated PRO measures. We identified that these GI-specific PRO are sensitive to change in SSc. Incorporating such instruments into clinical trials as part of an outcome measure may provide a standardized approach to assess the symptomatic improvement of patients with GERD in this disease. Additionally, longitudinal data collection on the specific effects on instrument response is a potential next step for this large international cohort.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Footnotes
This research was supported by the Scleroderma Clinical Trials Consortium.
- Accepted for publication August 8, 2018.