Abstract
Objective. There is currently no universally accepted measure of quality of life in ankylosing spondylitis (AS). Our objective was to develop and evaluate a patient-reported outcome measure of quality of life in AS, the Evaluation of Ankylosing Spondylitis Quality of Life (EASi-QoL).
Methods. We used patient interviews, a literature review, and completion of an individualized measure of AS quality of life during clinic-based and pilot surveys to derive questionnaire content. Classical and modern psychometrics were then used to evaluate the questionnaire using data from a large UK-based postal survey of 1000 patients with AS.
Results. Analysis of data from the interviews and the clinic-based and postal surveys produced an initial 57-item self-completed questionnaire. Fifteen items were removed following patient interviews and the pilot survey. In total, 612 (64.0%) patients responded to the main postal survey. After assessment of data quality, confirmatory factor analysis, and Rasch analysis, 20 items were found to contribute to 4 domains of AS-related quality of life: physical function, disease activity, emotional well-being, and social participation. Item-total correlations ranged from 0.66 to 0.84. Cronbach’s alpha and test-retest reliability estimates were 0.88–0.92 and 0.88–0.93, respectively. Hypothesized correlations with the AS Quality of Life questionnaire, the Bath AS Disease Activity Index, the Bath AS Functional Index, the SF-36, the EQ-5D, and the Hospital Anxiety and Depression Scale were confirmed, supporting the construct validity of the EASi-QoL.
Conclusion. The EASi-QoL has good evidence of data quality, internal reliability, test-retest reliability, and content and construct validity, and should be considered for use with patients in routine practice settings and in evaluative studies including clinical trials. Measurement responsiveness and minimal important change are currently being assessed.
Ankylosing spondylitis (AS) is an incurable inflammatory disease primarily affecting the pelvis and spine1. It can have a profound influence on health status and quality of life (QOL)2. Consequently, appropriate assessment of the influence of the disease and of the outcomes of healthcare raises complex issues. The AS Assessment group (ASAS) has defined 5 core assessment domains: functional ability, spinal mobility, pain, spinal stiffness, and global assessment3,4. Although QOL was acknowledged as an important concept, it was not included because of uncertainty over measurement selection3,4.
QOL comprises physical, social, and psychological issues alongside perceptions of health status, cognition, sexuality, spirituality, and personal productivity5,6,7. Since the initial ASAS recommendation, 2 AS-specific measures of QOL have been published: the ASQoL questionnaire8, a standardized measure, and the Patient Generated Index-AS (PGI-AS)9, an individualized measure.
Comparable levels of reliability and construct validity have been reported for these measures9,10. However, several areas frequently nominated by patients as important aspects of QOL were not included in the ASQoL, raising concerns over content validity9. Moreover, the ASQoL’s yes/no response scales may be poorly accepted10, may prevent detailed descriptions of health5,11, and may be poorly responsive to small but important changes12. The PGI-AS has good content validity, but its individualized format may limit its feasibility in clinical trials9.
The objective of our study was to develop and evaluate a new AS-specific measure of QOL based on the views of patients from a large UK-based survey.
MATERIALS AND METHODS
Development of a patient-reported outcome measure (PROM)
A 4-stage strategy was adopted: item development, pretesting, pilot evaluation, and data collection (Figure 1). A different patient sample was used at each stage (Table 1). All participants had a diagnosis of AS13 and were aged over 18 years. Ethical approval was granted by the North Staffordshire Local Research Ethics Committee.
Stage 1: Item development
Item development was based on the individual’s subjective experience of the daily effects of AS9,14. Items were elicited from patients through exploratory in-depth interviews9 and completion of the PGI-AS, where patients listed up to 5 of the most important areas of life affected by AS9,15. A literature review identified existing questionnaires that might inform development12.
Following content analysis of interview transcripts and completed PGI-AS questionnaires, verbatim statements reflecting important and common themes were listed16. Related themes were highlighted, grouped together, and organized by conceptual categories by 3 of the authors (KLH, AMG, JCP). Following assessment of completeness, ambiguity, and repetition17, 57 items were included in the initial measure. Some item pairs that were not conceptually distinct were included to explore the patient-preferred format during pretesting and data quality during data collection.
Stage 2: Pretesting — Cognitive debriefing
Consecutive patients attending the outpatient clinics at the Staffordshire Rheumatology Centre (SRC) were invited to participate in stage 2.
Cognitive debriefing interviews, using item rephrasing, verbal probing, and thinking aloud, assessed whether patients experienced difficulties with any part of the measure17,18,19,20. Patients commented on structure, response format, and missing concepts. Patients self-completed a preselected subset of items and were then asked a series of open questions seeking comments on the question stem, response options, and timeframe. After each round of 4 interviews, the results were assessed and the content was revised to address specific problems raised, or key issues were highlighted for further evaluation. This process was repeated until no new concerns arose.
Four clinicians and 2 physiotherapists commented on the face validity and clinical relevance of the measure.
Stage 3: Pilot evaluation
The measure was posted to a random sample of 51 patients identified from the SRC database, who were asked to comment on its content and structure. Data quality was assessed.
Stage 4: Data collection
The measure was then evaluated in a UK-based postal survey of 1000 patients with AS randomly selected from existing databases of 10 secondary care rheumatology centers21. The questionnaire pack included the Evaluation of Ankylosing Spondylitis Quality of Life (EASi-QoL), disease-specific measures12, a domain-specific measure, generic health measures22, and 2 health transition items.
The disease-specific measures included the Bath AS Disease Activity Index (BASDAI)23, the Bath AS Functional Index (BASFI)24, and the ASQoL8. The domain-specific Hospital Anxiety and Depression Scale (HADS) assesses emotional well-being25.
Generic measures included the Medical Outcomes Study Short Form-36 (SF-36; version 2)26 and the EuroQoL EQ-5D27. The SF-36 provides a score for 8 domains of health: physical function (PF), role-physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), mental health (MH), and role-emotional (RE). The EuroQoL EQ-5D incorporates utilities or preferences for health states to generate an index score of QOL.
Nonresponders were sent reminders at 2 and 4 weeks. Respondents were sent a second questionnaire 2 weeks after receipt of the baseline questionnaire for purposes of assessing test-retest reliability.
Statistical analysis of Stage 4
Statistical analyses of the stage 4 data included assessment of data quality, dimensionality, Rasch analysis, and tests of external construct validity17,28,29,30,31,32.
Data quality
Items with more than 10% missing data, end-effects (> 80%), excessive (> 40%) or minimal (< 10%, aggregating adjacent response options) levels of endorsement, or item-item correlations > 0.70 were considered for removal17,31,32. Where 2 items were not considered conceptually distinct, the item with poorer data quality was considered for removal.
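The screening rules above can be expressed as a short script. The sketch below is illustrative and is not the authors' code. It assumes responses are coded 0–4 with None for a missing response, treats an end-effect as more than 80% of responses in the lowest or the highest category, and computes item-item Pearson correlations on pairwise-complete cases. The endorsement rules (> 40% or < 10%) are left out because the exact aggregation of adjacent categories is not specified. The names screen_items, max_missing, max_end, and max_r are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Optional

Response = Optional[int]  # one item response scored 0-4, or None if missing


def pearson(x, y) -> float:
    """Pearson correlation; returns NaN when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def screen_items(data: dict[str, list[Response]],
                 max_missing: float = 0.10,
                 max_end: float = 0.80,
                 max_r: float = 0.70) -> dict[str, list[str]]:
    """Flag items breaching the missing-data, end-effect, or item-item correlation rules."""
    flags: dict[str, list[str]] = {name: [] for name in data}
    for name, responses in data.items():
        observed = [r for r in responses if r is not None]
        if (len(responses) - len(observed)) / len(responses) > max_missing:
            flags[name].append("missing")
        if not observed:
            continue
        # End-effects: share of responses in the lowest (0) or highest (4) category.
        floor = observed.count(0) / len(observed)
        ceiling = observed.count(4) / len(observed)
        if max(floor, ceiling) > max_end:
            flags[name].append("end-effect")
    # Item-item correlations, using only respondents who answered both items.
    for a, b in combinations(data, 2):
        pairs = [(x, y) for x, y in zip(data[a], data[b])
                 if x is not None and y is not None]
        if len(pairs) < 2:
            continue
        xs, ys = zip(*pairs)
        if pearson(xs, ys) > max_r:  # NaN comparisons are False, so constant items are skipped
            flags[a].append(f"r>{max_r} with {b}")
            flags[b].append(f"r>{max_r} with {a}")
    return {name: f for name, f in flags.items() if f}
```

Flagged items are candidates for removal only; as the text notes, the final decision also weighed conceptual distinctness.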
Confirmatory factor analysis
Four potential domain structures were hypothesized, informed by patient interviews9, AS core domains3,4, and relevant literature2,5,9,33,34,35,36: single domain; 2 domains (PF, QOL); 3 domains (PF, emotional well-being, social participation); and 4 domains (PF, disease activity, emotional well-being, social participation).
The extent to which the data fitted these structures was assessed using confirmatory factor analysis within the framework of structural equation modeling, with maximum likelihood estimation in AMOS 7.0 software37. Statistical significance of parameter estimates was evaluated. Goodness of fit was assessed using the comparative fit index (> 0.90 indicating good fit), the standardized root mean-square residual (< 0.08), the root mean-square error of approximation (< 0.06–0.08), and Akaike’s information criterion (the smallest value indicating the best-fitting model)37,38. Misfit of items to domains was examined using the modification indices of the error covariances, with large values indicating possible overlap between items, and the modification indices of the regression weights, to identify possible misloading of items on a domain.
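For readers less familiar with these indices, the sketch below shows one common way each is computed. It is an illustration under stated assumptions, not the AMOS implementation: the RMSEA uses N − 1 in its denominator (some programs use N), the CFI compares the fitted model with the independence (baseline) model, the SRMR is computed from observed and model-implied correlation matrices, and the AIC takes the chi-square-based form (chi-square + 2q, where q is the number of free parameters) reported by SEM software such as AMOS. All function names are hypothetical.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean-square error of approximation for a model fitted to n respondents."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> float:
    """Comparative fit index of model m relative to the independence (baseline) model b."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def srmr(observed, implied) -> float:
    """SRMR from observed and model-implied correlation matrices (lower triangle incl. diagonal)."""
    p = len(observed)
    resid = [(observed[i][j] - implied[i][j]) ** 2
             for i in range(p) for j in range(i + 1)]
    return sqrt(sum(resid) / (p * (p + 1) / 2))


def aic(chi2: float, n_free_params: int) -> float:
    """Chi-square-based AIC: chi2 + 2q; only differences between models fitted to the same data matter."""
    return chi2 + 2 * n_free_params


def acceptable(cfi_v: float, srmr_v: float, rmsea_v: float, rmsea_cut: float = 0.08) -> bool:
    """Apply the cut-offs named in the text; rmsea_cut may be set anywhere in 0.06-0.08."""
    return cfi_v > 0.90 and srmr_v < 0.08 and rmsea_v < rmsea_cut
```

Because AIC has no absolute threshold, it would be used here only to rank the 1-, 2-, 3-, and 4-domain structures against one another.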
Rasch analysis
The extent to which the selected domains satisfied the Rasch measurement model was investigated using RUMM 2020 software39. In the Rasch model, the probability of a specified response depends on the person’s overall level of ability and the difficulty of the item. Items fitting poorly to this underlying model were considered for removal. Overall fit was assessed by examining item-person interaction statistics. The item-trait interaction chi-squared statistic was calculated; a significant result suggests that the ordering of item difficulty varies across the scale, indicating poor fit. Individual item-fit statistics were calculated to assess how well individual items fitted the model. Threshold disordering was examined to identify inconsistent or illogical use of the response options. Disordering may mean that respondents have difficulty discriminating between response options because there are too many options or because the options have similar labels40. Differential item functioning was investigated to test whether both sexes and both age groups (≤ 49 and ≥ 50 yrs) responded similarly to each item40.
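The polytomous Rasch model underlying this kind of analysis can be sketched using the partial credit parameterization. The sketch below is an assumption about the general form of the model, not a description of RUMM 2020 internals. Here theta is the person location and thresholds are the item’s threshold locations on the same logit scale; the probability of scoring x is proportional to the exponential of the cumulative sum of (theta − threshold) up to x. Thresholds that are not in increasing order correspond to the threshold disordering described above. Function names are hypothetical.

```python
from math import exp


def pcm_probabilities(theta: float, thresholds: list[float]) -> list[float]:
    """Category probabilities (0..m) for one item under a partial credit Rasch model."""
    numerators = [1.0]  # category 0: empty sum, exp(0) = 1
    running = 0.0
    for tau in thresholds:
        running += theta - tau  # cumulative sum of (theta - threshold_k)
        numerators.append(exp(running))
    total = sum(numerators)
    return [v / total for v in numerators]


def expected_score(theta: float, thresholds: list[float]) -> float:
    """Model-expected item score for a person at location theta."""
    return sum(x * p for x, p in enumerate(pcm_probabilities(theta, thresholds)))


def disordered(thresholds: list[float]) -> bool:
    """True when successive thresholds are not strictly increasing."""
    return any(b <= a for a, b in zip(thresholds, thresholds[1:]))
```

For a 0–4 item there are 4 thresholds. Comparing observed item scores with expected_score across groups of persons is the idea behind the item-fit and item-trait statistics, although the exact residual and chi-square formulas used by RUMM 2020 are not reproduced here.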
Reliability
Internal consistency was assessed using item-total correlation and Cronbach’s alpha17. Test-retest reliability was assessed for patients indicating no change in AS-specific health at 2 weeks17 by the intraclass correlation coefficient (2,1)41. For group comparisons, levels of reliability > 0.70 have been recommended, and for evaluation of individuals, levels > 0.90 are required17.
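Both reliability statistics can be reproduced from item-level data. The sketch below is not the authors’ code. It computes Cronbach’s alpha; a corrected item-total correlation, correlating each item with the sum of the remaining items (the text does not state whether the corrected or uncorrected form was used); and ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measure). Rows are patients, and for the ICC the columns are occasions (baseline and 2-week retest). Complete cases are assumed, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (complete cases)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def corrected_item_total(rows, j):
    """Correlation of item j with the sum of the remaining items."""
    item = [r[j] for r in rows]
    rest = [sum(r) - r[j] for r in rows]
    return pearson(item, rest)


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.
    scores: n subjects x k occasions."""
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    msr = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # between subjects
    msc = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # between occasions
    sse = sum((scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
              for i in range(n) for j in range(k))
    mse = sse / ((n - 1) * (k - 1))  # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

ICC(2,1) penalizes systematic shifts between test and retest through the occasion (msc) term, which is appropriate when the aim is absolute agreement between administrations rather than consistency alone.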
Validity
The validity of the EASi-QoL was assessed through comparisons with AS-specific, domain-specific, and generic measures. Hypothesized associations were considered a priori. The convergent validity of related dimensions was assessed by correlation.
It was hypothesized that the EASi-QoL disease activity (DA) domain would have a high level of correlation with the BASDAI (> 0.70); the physical function (PF) domain would have a high correlation with the BASFI (> 0.70); and the emotional well-being domain would have a high correlation with the HADS domain scores (> 0.70).
The 4 domains of the EASi-QoL would have moderate to high levels of correlation with the related domains of the SF-36 in the range 0.50 to 0.70: that is, EASi-QoL DA with SF-36 BP and VT; EASi-QoL PF with SF-36 PF and RP; EASi-QoL emotional well-being with SF-36 MH and RE; and EASi-QoL social participation (SP) with SF-36 SF and GH.
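Lower EASi-QoL scores indicate better health, whereas higher SF-36 scores indicate better health, so correlations between these measures are expected to be negative. The a priori hypotheses therefore concern the magnitude of each correlation. The sketch below encodes the hypotheses listed above and checks observed coefficients against them. It is illustrative only: the type of correlation coefficient (Pearson or Spearman) is not specified in the text, the HADS is assumed to contribute separate anxiety and depression scores, and the dictionary keys and function names are hypothetical.

```python
HIGH, MODERATE = "high (> 0.70)", "moderate to high (0.50-0.70)"

# Hypothesized magnitude of correlation for each (EASi-QoL domain, comparator) pair.
HYPOTHESES = {
    ("disease activity", "BASDAI"): HIGH,
    ("physical function", "BASFI"): HIGH,
    ("emotional well-being", "HADS anxiety"): HIGH,
    ("emotional well-being", "HADS depression"): HIGH,
    ("disease activity", "SF-36 BP"): MODERATE,
    ("disease activity", "SF-36 VT"): MODERATE,
    ("physical function", "SF-36 PF"): MODERATE,
    ("physical function", "SF-36 RP"): MODERATE,
    ("emotional well-being", "SF-36 MH"): MODERATE,
    ("emotional well-being", "SF-36 RE"): MODERATE,
    ("social participation", "SF-36 SF"): MODERATE,
    ("social participation", "SF-36 GH"): MODERATE,
}


def supported(r: float, band: str) -> bool:
    """Compare |r| with the hypothesized band; the sign depends on scoring direction."""
    a = abs(r)
    if band == HIGH:
        return a > 0.70
    return 0.50 <= a <= 0.70


def evaluate(observed: dict[tuple[str, str], float]) -> dict[tuple[str, str], bool]:
    """Check each observed correlation that has an a priori hypothesis."""
    return {pair: supported(r, HYPOTHESES[pair])
            for pair, r in observed.items() if pair in HYPOTHESES}
```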
Extreme groups validity
It was hypothesized that patients who reported being bothered by their ill health (SF-36 item 8), or who were unemployed or retired due to ill health, would report higher EASi-QoL domain scores, indicating worse health. Independent-samples t tests were used to compare mean domain scores between groups.
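A minimal sketch of the between-group comparison follows. It assumes the classical Student’s t test with pooled variance, because the text does not state whether equal variances were assumed. The function returns the t statistic and its degrees of freedom; scipy.stats.ttest_ind(a, b, equal_var=True) returns the same statistic together with a p-value. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's t for two independent groups, using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Because lower EASi-QoL scores indicate better health, a negative t when group a is the “not bothered” or “in work” group is the direction the hypothesis predicts.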
Acceptability
Consecutive patients attending the outpatient clinics at the SRC were invited to review the final version of the EASi-QoL and to comment on item relevance and acceptability during face-to-face interviews. SRC clinicians and physiotherapists with a specialist interest in AS also reviewed the measure.
This multicenter cross-sectional survey was approved by the North Staffordshire Local Research Ethics Committee and the 10 center-specific NHS Trusts. Written consent was obtained from all patients in accordance with the Declaration of Helsinki.
RESULTS
Stage 1: Item development
The literature review, patient interviews (n = 29), and completion of the PGI-AS (total n = 462)9,10,12 contributed to a 57-item measure (Figure 1) that addressed a range of QOL dimensions: cognitive function, emotional well-being, global well-being, personal constructs, physical function, role activities, social well-being, and symptoms5.
Following consideration of patient acceptability and score precision5,26, a 5-point response scale scored from 0 to 4 was selected. Three response scales, chosen as most appropriate to the individual items, were used: “not limited at all” to “totally limited/unable to do,” “none of the time” to “all of the time,” and “not at all” to “extremely.”
Stage 2: Pretesting
Twenty-seven patients were interviewed (Table 1). They considered the measure relevant to their experience of AS and felt that it included important issues. Fourteen items were removed to reduce repetition, and minor modifications were made. For items relating to physical functioning, respondents preferred the timeframe “applies to you today”; for the remaining items, a 1-week recall period was preferred.
Stage 3: Pilot evaluation
Thirty-six (70.6%) patients responded to the pilot postal survey (Table 1). Thirty-two (89%) respondents completed all items; the remaining 4 each completed 98% of items, each omitting a different item. Respondents identified only minor problems; 1 item was removed and some wording changes were made. The revised 42-item measure retained the top 20 areas reported as important by AS patients9 and covered a wide range of QOL concerns.
Stage 4: Data collection
A total of 612 (64.0%) patients returned a completed postal questionnaire; 489 returned questionnaires at 2 weeks (80.2%) (Table 1).
Data quality
All 42 items were completed by 512 patients (84%). The mean number of missing responses per item was 3.7 (SD 9.2), and most items had 4 or fewer missing responses. The largest number of missing responses for any one patient was 14. The item relating to the effect of AS on “intimate or sexual relationships” had the most missing data (9.8%).
On the 0–4 response scale, means ranged from 0.61 (sitting — 15 minutes) to 2.18 (pain or discomfort — duration). The item “drinking from a small can or glass” had the highest floor effect of 65.1% (“not limited”). The item “walking one mile” had the highest ceiling effect of 18.6% (“totally limited/unable to do”). Eight items were removed due to item duplication and poor data quality (Table 2).
Confirmatory factor analysis
The 4-domain structure of physical function, disease activity, emotional well-being, and social participation, comprising the remaining 34 items, gave the best model fit, although fit statistics remained below recommended levels. Items loaded onto the hypothesized domains; however, the modification indices between some items suggested overlap. Seven items were removed based on the modification indices, their relative strength of loading on a domain, and the earlier data quality assessment (Table 2).
Goodness of fit statistics for the remaining 27 items on 4 domains were satisfactory.
Rasch analysis of domains
Physical function
There was some initial item misfit, with one item (“sitting 2 hours”) showing poor fit, some differential item functioning on one item by sex and age (“bending down”), and threshold disordering for another item (“walking 1 mile”). Model fit was good after removal of the “sitting 2 hours” and “bending down” items, all individual items fitted the model well, and threshold ordering was satisfactory.
Disease activity
Assessment of model fit suggested a slight overall misfit but no specific individual item misfit. There was some slight uniform differential item functioning by sex for 2 items (“energy” and “stiffness”), but given the satisfactory fit all 4 items were retained.
Emotional well-being
Model fit was initially poor and there was a significant item-trait interaction. Removal of 2 items improved the model, although overall fit remained unsatisfactory, with 2 further items (“embarrassed” and “downhearted”) showing poor fit. The negative fit residual (−3.41) for “downhearted” suggested a high level of discrimination and hence possible redundancy: patients who were not downhearted scored lower than expected, and those who were very downhearted scored higher than expected. The “embarrassed” item showed low discrimination. However, given the importance patients attributed to these items, both were retained.
Social participation
The model fit statistics suggested some misfit for 3 items. Removal of these items improved the model fit.
Confirmatory factor analysis was performed on the final 20 items and showed good fit.
The 20 items retained had low levels of missing data and acceptable evidence of end-effects, as shown in Table 3. Item-total correlations ranged between 0.66 and 0.84, and Cronbach’s alpha ranged between 0.88 and 0.92 (Table 3).
Scoring the EASi-QoL
Domain scores are computed by summing the items in each domain (each scored 0–4), provided that no more than one item in that domain is missing; a missing item is imputed using the mean of the completed items in that domain. Lower EASi-QoL scores indicate better AS-related quality of life. The 20-item EASi-QoL is included below (Appendix).
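The scoring rule can be sketched as follows. The function name is hypothetical, and the imputation step reflects one reading of the rule: the missing item is assigned the mean of the respondent’s completed items in the same domain. The assignment of items to domains is not reproduced here.

```python
from typing import Optional, Sequence


def domain_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Sum a domain's 0-4 item scores, imputing at most one missing item.

    Returns None when more than one item (or every item) in the domain is missing.
    """
    answered = [r for r in responses if r is not None]
    missing = len(responses) - len(answered)
    if missing > 1 or not answered:
        return None
    if any(not 0 <= r <= 4 for r in answered):
        raise ValueError("EASi-QoL items are scored 0-4")
    # Assumption: the missing item takes the mean of the respondent's answered items.
    person_mean = sum(answered) / len(answered)
    return sum(answered) + missing * person_mean
```

For a 4-item domain answered as 1, 2, 3, and missing, this sketch returns 6 + 2 = 8.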
Reliability
The intraclass correlation coefficients (2,1) for patients indicating no change in their AS-specific health at 2 weeks were all above 0.90, the recommended criterion for individual-level assessment, except for disease activity (0.88) (Table 3).
Validity
As hypothesized, EASi-QoL domains had high correlations with AS-specific questionnaires measuring related constructs: the disease activity domain with the BASDAI and the PF domain with the BASFI, which were the highest correlations for these two EASi-QoL domains (Table 4).
As hypothesized, the strongest correlations between the EASi-QoL and the SF-36 were for domains measuring related constructs: EASi-QoL PF with the SF-36 PF and RP domains, EASi-QoL DA with the SF-36 BP and VT domains, and EASi-QoL SP with the SF-36 SF and RP domains (Table 4). Although the EASi-QoL emotional well-being domain correlated highly with the SF-36 MH and RE domains, its correlations with the SF-36 SF, BP, and RP domains were slightly higher.
As hypothesized, the EASi-QoL emotional well-being domain had the strongest correlation with the HADS. Finally, the correlations with the EQ-5D were all in the range of 0.71–0.76, the largest being for social participation.
Extreme-groups validity
As hypothesized, compared with patients who were bothered by their ill health, those who were not bothered had significantly better levels of health on the EASi-QoL (all p < 0.001). Compared with those unable to work due to ill health, employed patients had significantly better levels of health on the EASi-QoL (p < 0.001; Table 5).
Acceptability
Seven patients self-completed the final EASi-QoL (71.4% were men; mean age 50 yrs, SD 14.2). All patients indicated that the measure was simple to complete and addressed areas of importance to their AS. Self-completion took approximately 5 minutes. Clinicians (n = 4) and physiotherapists (n = 2) considered the measure clinically relevant.
DISCUSSION
Well-developed patient-reported outcome measures (PROM) provide a major source of evidence of the patient experience of disease impact and healthcare that can inform the decisions of patients, healthcare providers, and policy-makers. The challenge for quality of life assessment is to determine the uniqueness of disease impact to the individual9,14. Development of the EASi-QoL was driven by evidence that factors reported as important by people with AS, including body image, mobility, and employment, were not adequately assessed by existing measures9. Moreover, in their initial recommendations, the ASAS group was unable to recommend QOL as a core assessment domain due to uncertainty over the best measurement approach3,4.
Patients made a substantial contribution to the development of the EASi-QoL. Item content was informed by patient interviews9, a UK-wide survey of AS patients9,15, and piloting and followup interviews. The involvement of more than 550 patients during the development stages ensured that the lived experience of AS was assessed, which promoted the identification of important patient-derived themes. A subsequent large, UK-wide survey of AS patients provided the setting for evaluating essential measurement and practical properties. Responsiveness and minimal important change will be assessed in a followup of patients.
The low level of missing data for the 20-item EASi-QoL is evidence that the measure is acceptable to patients as a self-completed postal questionnaire. Moreover, self-completion takes approximately 5 minutes, which is acceptable for a measure that is to be used in routine practice, or alongside other PROM in clinical trials.
The 20 items contribute to a 4-domain measure of AS-specific QOL: physical function, disease activity, emotional well-being, and social participation. The estimates for internal consistency and test-retest reliability suggest that the EASi-QoL is suitable for applications involving groups of patients, for example, in clinical trials, and on an individual basis, for example, in a routine practice setting.
The results of comparisons with other PROM and AS-specific questions are evidence for the validity of the EASi-QoL. The high level of correlation with AS-specific measures and the moderate to high correlations with the generic measures are evidence that the EASi-QoL measures the effects of AS across different aspects of health. Core sets have recently been described supporting classification of the influence of AS on functioning and health in accord with the World Health Organization’s International Classification of Functioning, Disability and Health (WHO ICF)42. The ICF components include body structure, body function, activities and participation, and environmental factors. Future research will assess the relationship between the EASi-QoL domains and this framework. Further, the patients included in our study tended to have established disease; only 7% had symptoms for 5 years or less. The performance of the EASi-QoL in patients with short-duration versus long-standing disease should be compared in a future study.
Growing evidence illustrates the significant everyday influence of AS across a range of physical, psychological, and social domains of health2,9,33,34,35. However, pain, stiffness, and the physical effects of AS dominate AS core assessment3,4. Less attention is directed to the emotional and social influence that reflects the broader concepts of QOL. Recently, the PROM Information System (PROMIS) initiative defined 5 constructs within a framework for measuring general health: pain, fatigue, emotional distress, physical function, and social role participation36, all of which are identified within the EASi-QoL domains.
Although a single index score may simplify data analysis, reporting of the 4 domains is recommended as they measure distinct but related constructs. Combining distinct facets of health into an overall score could mask important differential effects of treatment on physical health, disease activity, emotional well-being, or social participation43,44. Moreover, treatment choice and shared decision-making may be better informed by information provided across separate domains. However, index scores may more readily inform distinct treatment choices44 and future research will assess the appropriateness of an index score.
There is considerably less guidance on incorporating patients’ views into PROM content than on meeting quantitative psychometric criteria. Although there is general agreement that the most appropriate and valid PROM have involved patients in their development5,45, this involvement is often cursory and poorly reported45,46. Rarely do developers of PROM work collaboratively with patients as partners in the research process47,48. There is an important need for PROM development generally to more actively embrace patient involvement. The benefit from healthcare interventions may be masked unless their effectiveness is evaluated using outcomes that have relevance to patients and clinicians46. The healthcare community needs to ensure appropriate patient involvement to help identify health domains and associated measures that reveal the experience of living with AS and response to treatment. Moreover, ASAS indicated their recommendations should be revised in light of new evidence for the assessment of QOL: the EASi-QoL, ASQoL, and PGI-AS all involved patients in their development and have evidence for reliability and construct validity. However, the broader content of the EASi-QoL, as reflected in its profile scores, makes it potentially more relevant to patients as a measure of QOL in AS. Concurrent evaluations of the EASi-QoL, ASQoL, and PGI-AS are recommended to further compare measurement and practical properties.
The EASi-QoL measures the influence of AS on QOL from the patient’s perspective across 4 important QOL domains: physical function, disease activity, emotional well-being, and social participation. It is recommended as a new patient-derived measure of AS-specific quality of life that identifies issues of importance to patients.
Acknowledgment
The authors thank all the patients who participated in the study, and consultant rheumatologists, physiotherapists, and research nurses in the EASi-QoL study group — Dr. M. Bukhari (Royal Lancaster Infirmary), Dr. P. Creamer (Southmead Hospital), Prof. H. Gaston (Addenbrookes Hospital), Dr. L. Kay (Freeman Hospital), Dr. S. Linton (Nevill Hall Hospital), Dr. K. McKay (Torbay Hospital), Dr. D. Mulherin (Cannock Chase Hospital), Prof. R. Sturrock (Glasgow Royal Infirmary), and Dr. R. Withrington and Liz van Rossen (Kent and Canterbury Hospital).
APPENDIX
Footnotes
Supported by an unrestricted educational research grant from Wyeth UK.
Accepted for publication May 11, 2010.