Abstract
Objective. The LupusPRO, a disease-targeted patient-reported outcome measure, was developed and validated in US patients with systemic lupus erythematosus (SLE). We report the results of the cross-cultural validation study of the English version of the LupusPRO among patients in Canada with SLE.
Method. The LupusPRO and the Medical Outcomes Study Short Form-36 (SF-36) were administered to English-speaking Canadian patients with SLE, and demographic, clinical, and serological characteristics were obtained. Disease activity was ascertained using the Safety of Estrogens in Lupus Erythematosus National Assessment-Systemic Lupus Erythematosus Disease Activity Index (SELENA-SLEDAI) and the Lupus Foundation of America definition of flare (Yes/No). Damage was assessed using the Systemic Lupus International Collaborating Clinics/American College of Rheumatology Damage Index (SDI). Physician disease activity and damage assessments were also ascertained using visual analog scales. A mail-back LupusPRO form was completed within 2–3 days of the index visit. Properties tested were internal consistency reliability (ICR), test-retest reliability (TRT), convergent and discriminant validity (against corresponding domains of the SF-36), criterion validity (against disease activity or health status), and known-groups validity.
Results. Participants were 123 Canadian patients with SLE (94% women); mean age was 47.7 (SD 14.8) years. The median (interquartile range) SELENA-SLEDAI and SDI were 4 (6) and 1 (3), respectively. The ICR of the LupusPRO domains ranged from 0.60 to 0.93, while the TRT range was 0.62–0.95. Convergent and discriminant validity with corresponding domains of the SF-36, criterion validity, and known-groups validity against disease activity, damage, and health status were observed. Confirmatory factor analysis showed a good fit.
Conclusion. The LupusPRO has fair psychometric properties among Canadian patients with SLE, and prospective studies to establish minimally important difference are continuing.
Systemic lupus erythematosus (SLE) is associated with a poor quality of life1,2, and patients with SLE have a poorer quality of life than patients with common chronic diseases3. Considering that the age at onset of SLE is much younger than in most other chronic diseases, and that SLE occurs most often in women, the potential for a cumulative effect of SLE on the patients and their families is immense4. Patient-reported outcomes (PRO) therefore constitute an important facet of overall health outcomes for the management of SLE. The US Food and Drug Administration recommends cross-cultural adaptation and validation of existing PRO tools, to improve their accessibility and applicability to patients and research universally5. The LupusPRO is a disease-targeted PRO measure that was developed and validated in patients with SLE (women and men) of heterogeneous ethnicity within the United States6. It includes both health-related and non-health-related quality of life domains (HRQOL, non-HRQOL), allowing an understanding of the broader burden of the disease. Its clinical utility and research value, compared with other PRO instruments currently available, have already been demonstrated6,7. We report the results of the cross-cultural validation study of the English version of the LupusPRO among Canadian patients with SLE (text of the questionnaire and scoring details available from the author on request).
MATERIALS AND METHODS
LupusPRO
The LupusPRO has 2 constructs: HRQOL and non-HRQOL. The HRQOL domains are SLE symptoms, cognition, SLE medications, physical health (themes: physical function and role physical), pain-vitality (fatigue, sleep), body image, emotional health (emotional function and role emotional), and procreation (sexual health and reproduction). The non-HRQOL domains are desires/goals, relationship/social support, coping, and satisfaction with medical care. In total, the LupusPRO comprises 43 items (30 for HRQOL construct, 13 for non-HRQOL construct) related to the past 4 weeks in the patient’s life. Each item has 5 options ranging from “none of the time” to “all of the time.” The survey takes between 5 and 7 minutes to complete. Individual domain scores, total HRQOL, and total non-HRQOL scores range from 0 to 100, where higher score signifies better QOL. The tool has excellent psychometric properties in US patients6. Our study was approved by the Institutional Ethics Board of the Montreal General Hospital and written consent was obtained according to the Declaration of Helsinki. The tool was pretested in 5 English Canadian individuals and no language modifications were indicated based on the feedback8.
Patients
The McGill University Health Center SLE cohort enrolls and prospectively follows patients meeting the American College of Rheumatology (ACR) classification criteria for SLE9. The LupusPRO was administered to consenting adult patients (age ≥ 18 yrs) who were able to read and understand English. Participants were consecutive English-speaking outpatients coming for their annual research visit between August 2010 and April 2012. Data on demographic information and clinical and serological characteristics were collected at the baseline visit. The LupusPRO and Medical Outcomes Study Short Form-36 (SF-36)10 were self-administered. Higher scores on the SF-36 denote better health. Disease activity was ascertained using the Safety of Estrogens in Lupus Erythematosus National Assessment–Systemic Lupus Erythematosus Disease Activity Index (SELENA-SLEDAI)11, the SELENA Flare Index (SFI), and the Lupus Foundation of America (LFA) definition of flare (Yes/No)12. Damage was assessed using the Systemic Lupus International Collaborating Clinics/ACR Damage Index (SDI)13. In addition, physician assessments of disease activity and damage were ascertained using visual analog scales (MD activity VAS and MD damage VAS). These VAS scores ranged from 0 to 10, where a higher score indicated worse disease status. Patients were given 2 LupusPRO questionnaires. One had to be filled out at baseline (T1), and another had to be completed (along with a patient-reported change in health status that ranged from −7 to +7) within 2–3 days after baseline (T2) and mailed back to the study site.
Psychometric properties
The psychometric properties studied included reliability and validity. Reliability reflects the extent to which (1) different questions that are assumed to address the same underlying concept are correlated; and (2) the same question yields consistent results at different points in time if health remained unchanged14. The former is referred to as internal consistency reliability (ICR) and the latter as test-retest reliability (TRT). Validity is the degree to which the measure reflects what it is supposed to measure rather than something else. There are many different types of validity that can be established using a variety of methods. Construct validity is considered to be an overarching concept that encompasses convergent, discriminant, criterion, and known-groups validity. Evidence for convergent validity is provided if the new instrument scores correlate with other measures of the same construct, and evidence of discriminant validity is established if scale scores do not correlate with measures of unrelated constructs. Criterion validity refers to the assessment of the new instrument against an external reference representing more “objective” results. In known-groups validity, the validity is determined by the degree to which an instrument can demonstrate different scores for groups known to vary on the items being measured14. Sensitivity to change (responsiveness) was not assessed in our study.
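As an illustration of the first sense of reliability described above, internal consistency is commonly summarized with Cronbach's alpha. The following is a minimal sketch in Python, not the analysis code used in our study; the function name and the example responses are hypothetical, and item responses are assumed to be numerically coded with no missing values.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one domain.

    rows: list of respondents, each a list of numeric item responses
    (same number of items per respondent, no missing values assumed).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items in the domain")
    # Sample variance (n - 1 denominator) of each item across respondents.
    item_variances = [variance(col) for col in zip(*rows)]
    # Sample variance of each respondent's summed domain score.
    total_variance = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents, 3 items coded 0-4.
demo = [[0, 1, 1], [2, 2, 3], [4, 3, 4], [1, 1, 0], [3, 4, 4]]
alpha = cronbach_alpha(demo)
print(round(alpha, 2))
```

Because alpha depends on the number of items as well as on their intercorrelation, domains with only 2 or 3 items tend to yield lower values for the same underlying item agreement.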
Statistical analyses
The ICR for each domain was evaluated using Cronbach’s alpha coefficient, where alpha > 0.70 is considered acceptable15. TRT was tested by evaluating agreement between the patient responses to each domain at 2 timepoints, 2–3 days apart. Intraclass correlation coefficients were computed using a split half model. Convergent validity was evaluated using Spearman’s correlation coefficient based on the strength of correlation of the LupusPRO with related domains on the SF-36 (physical health domain of LupusPRO against physical function, role physical, and physical component summary score of SF-36; emotional health of LupusPRO against mental health, role emotional, and mental component summary score of SF-36; and pain-vitality of LupusPRO against bodily pain and vitality of the SF-36). However, assessment of the convergent validity for the lupus symptoms domain was performed against disease activity assessments (SELENA-SLEDAI, MD activity VAS, SFI, LFA), because they measure the same concepts. Discriminant validity of LupusPRO domains was assessed by correlational analysis against nonrelated domains of the SF-36. Criterion validity was judged using correlation between LupusPRO domains and physician-based measures of disease activity and damage (SELENA-SLEDAI descriptors and total scores, MD activity VAS, SFI, LFA flare, SDI descriptors and total scores, MD damage VAS). Correlations were classified as strong (r ≥ 0.5), moderate (0.3 ≤ r < 0.5), weak (0.1 ≤ r < 0.3), or absent (r < 0.1). Known-groups validity was judged against flares (SFI or LFA), MD activity VAS, and patient-reported health status (SF-36 item 1). ANOVA, with assumption of unequal variance between groups, was used to compare LupusPRO domain scores stratified by known groups. The conceptual framework (hypothesized item to scale relationships) of the LupusPRO was evaluated using confirmatory factor analysis (CFA) appropriate for categorical data.
CFA was conducted with the LupusPRO item responses using a robust weighted least-squares estimator and the Mplus software (version 2)16. The latter uses a multistep method for ordinal outcome variables that analyzes a matrix of polychoric correlations rather than covariances. The goodness-of-fit of the hypothesized item-to-scale relationships (multifactor) was evaluated with the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI). The CFI and TLI are comparative fit indices that quantify the amount of difference between the examined model and the independence model (i.e., a standard comparison model that asserts that none of the components in the model are related), with higher scores indicating larger differences. It is recommended that these 2 indices be 0.9 or greater as evidence of acceptable model fit17. All reported p values are 2-tailed.
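For readers unfamiliar with these indices, the sketch below shows the textbook definitions of the CFI and TLI computed from the chi-square statistics and degrees of freedom of the examined model and of the independence (baseline) model. It is an illustration only: the function names and the chi-square values are hypothetical, and the adjusted test statistics produced by a robust weighted least-squares estimator in Mplus may not correspond exactly to these simple formulas.

```python
def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index (textbook definition)."""
    # Noncentrality of the examined model, floored at zero.
    d_model = max(chi2_model - df_model, 0.0)
    # Noncentrality of the baseline model, never below that of the model.
    d_base = max(chi2_base - df_base, d_model)
    if d_base == 0:
        return 1.0
    return 1.0 - d_model / d_base


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis Index (textbook definition)."""
    ratio_model = chi2_model / df_model
    ratio_base = chi2_base / df_base
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


# Hypothetical values, not results from this study.
example_cfi = cfi(120.0, 100, 2120.0, 120)
example_tli = tli(120.0, 100, 2120.0, 120)
print(round(example_cfi, 3), round(example_tli, 3))
```

Both indices approach 1 as the examined model improves on the independence model, which is why values of 0.9 or greater are read as acceptable fit.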
RESULTS
One hundred twenty-three Canadian patients with SLE (94% women) participated (Table 1). For TRT assessment, 104 patients returned the T2 questionnaire (84.5% response rate) and data were complete for 103 patients. The mean (SD) age was 47.7 (14.8) years. Sixty percent were white, 23% Asian, and 9% African-Caribbean. Fifty-two percent were currently married and the median years of education was 14 [interquartile range (IQR) 3, minimum 4, maximum 17]. The median (IQR) SELENA-SLEDAI was 4.0 (6.0). Flare, as defined by the SFI and LFA criteria, was present among 19% (21 with mild/moderate and 2 with severe) and 17% of participants, respectively, at the time of the study. The median (IQR) SDI was 1.0 (3.0). The median (IQR) MD activity VAS and MD damage VAS scores were 0.2 (0.95) and 0.3 (3.0), respectively.
The median scores on the LupusPRO domains, floor and ceiling effects, and missing responses are shown in Table 2. Floor and ceiling effects for the SF-36 domains were as follows: physical function (0%, 26.8%), role physical (25.2%, 43.9%), bodily pain (0.8%, 20.3%), general health (0%, 0.8%), vitality (1.6%, 1.6%), social functioning (0.8%, 34.1%), role emotional (17.1%, 65.9%), and mental health (0%, 2.4%).
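Floor and ceiling effects such as those reported above are conventionally expressed as the percentage of respondents scoring at the lowest and highest possible value of a scale. A minimal sketch of that calculation follows, assuming domain scores on a 0 to 100 scale and treating missing scores (coded here as None) as excluded; the function name and example data are hypothetical and are not study data.

```python
def floor_ceiling(scores, minimum=0.0, maximum=100.0):
    """Return (floor %, ceiling %) for one domain.

    scores: iterable of domain scores; None marks a missing score and is
    excluded from the denominator (an assumption of this sketch).
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no non-missing scores")
    n = len(valid)
    floor_pct = 100.0 * sum(1 for s in valid if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in valid if s == maximum) / n
    return floor_pct, ceiling_pct


# Hypothetical domain scores for 5 respondents, one of them missing.
print(floor_ceiling([0, 50, 100, 100, None]))
```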
The ICR of the LupusPRO domains ranged from 0.60 to 0.93, while the TRT ranged from 0.62 to 0.96 for the 103 patients with complete LupusPRO at T2 and from 0.74 to 0.96 for patients who remained stable on the change in health status score (n = 63). Convergent validity with corresponding domains of SF-36 was observed (Table 3). Discriminant validity was evident from poor correlation between unrelated domains of LupusPRO and SF-36 [e.g., correlation coefficients of (a) lupus medication domain with SF-36 physical function and role physical were 0.18 and 0.16, respectively; and (b) procreation domain with SF-36 bodily pain was −0.06]. Criterion validity against disease activity measures (MD activity VAS, SELENA-SLEDAI, SFI, and LFA flare) was observed for all the HRQOL domains of the LupusPRO. These correlations were modest for the LupusPRO domains of lupus symptoms, physical health, pain-vitality, and body image (Table 3). Concerning damage, modest correlations with LupusPRO domains (lupus symptoms, physical health) were noted, while the other domains had weak correlations (Table 3). Known-groups validity of LupusPRO domains against flares (SFI, LFA), MD activity VAS, and health status (SF-36 item 1) was also noted (Table 4).
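The strength labels applied to these correlations follow the thresholds given in the statistical analyses. As a brief sketch, the classification can be written as below; applying the thresholds to the absolute value of r, so that negative coefficients are graded by magnitude, is an assumption of this example, and the function name is hypothetical.

```python
def classify_correlation(r):
    """Label a correlation coefficient using the study's stated thresholds.

    The thresholds are applied to |r| (an assumption of this sketch), so
    the sign of the coefficient does not affect the label.
    """
    strength = abs(r)
    if strength >= 0.5:
        return "strong"
    if strength >= 0.3:
        return "moderate"
    if strength >= 0.1:
        return "weak"
    return "absent"


# Discriminant validity coefficients quoted in the text.
for r in (0.18, 0.16, -0.06):
    print(r, classify_correlation(r))
```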
Results of confirmatory factor analysis lent empirical support for the conceptual framework of the LupusPRO (Table 5). The model fit for the hypothesized item-to-scale relationships was excellent (CFI = 0.98, TLI = 0.99). In addition, item-to-factor loadings representing the hypothesized item-to-scale relationships were also satisfactory. In general, items loaded > 0.6 with their respective factor.
DISCUSSION
Our study assessed the psychometric properties of the LupusPRO in Canada and supports the reliability and validity of this instrument in this population. The demographic and ethnic distribution of our cohort was representative of the patients seen in SLE clinics throughout Canada18. Because all the forms were checked to ensure their completion before patients were discharged, missing responses were few. We also had a high return rate, suggesting that the LupusPRO is acceptable to patients.
First, the LupusPRO demonstrated good reliability. It is noteworthy that the ICR for LupusPRO domains with 2 or 3 items was lower than the ICR for domains with > 3 items. ICR improves with a greater number of items forming the domain. For the coping domain, deletion of the item on spirituality/religion improved the ICR to 0.68, suggesting also some cultural differences in the study group. TRT, defined as giving the same result when an individual is retested while remaining in a clinical steady state, is another critical measurement property for HRQOL instruments that was demonstrated in our study. However, TRT for the desires and goals domain was low in our study. The reason for this is unclear. One hypothesis could be that when patients complete the LupusPRO at home versus at the clinic, they may more carefully consider both the immediate and longer-term effects of SLE on their desires and goals. Patients scored lower on this domain by an average of 13.5 points at T2 as compared to T1. However, the length of time between the 2 test administrations may affect this result. A very short time interval makes carryover effects (due to, for example, memory or practice) more likely, whereas a longer interval increases the probability that a change in status could occur. Studies of TRT for HRQOL instruments have used varying intervals between test administrations. The interval of 2 to 3 days was selected here because it is believed to be a reasonable compromise between recollection bias and unwanted clinical change. In a study comparing TRT at 2 days and 2 weeks, there were no statistically significant differences for the 2 time intervals19.
Domains of the LupusPRO performed well against corresponding domains of the generic PRO tool for HRQOL (SF-36; convergent validity); did not correlate with noncorresponding SF-36 domains (discriminant validity); and correlated with measures of disease activity and/or damage or health status (criterion and known-groups validity). Confirmatory factor analysis showed a good fit. Previous studies have failed to identify a significant relationship between SLE disease activity and patient-reported health status19. In our study, weak to modest correlations were noted between disease activity measures and LupusPRO domains. Poor relationships between PRO and disease activity or damage are well known20,21. This indicates that PRO measures provide uniquely valuable information about the effect of SLE and treatment effectiveness that is not captured by the disease activity or damage indices. It is also possible that the differing timeframes between disease activity (10 days), damage (6 months), and LupusPRO (4 weeks) assessments may contribute to the poor relationship between the 3 measures.
We did find significant ceiling effects with both LupusPRO and SF-36. This likely reflects the apparently well controlled or inactive or mildly active disease that is often encountered in the outpatient clinics, particularly among patients who are willing to participate in noninterventional research. To accurately gauge ceiling and floor effects, the tool would need to be tested in a larger heterogeneous patient group with varied disease activity and manifestations.
Generic tools such as the SF-36 have been more widely used in SLE research than disease-specific tools. The SF-36 has been found to be responsive to changes in disease in some studies, while in others the responsiveness has been to changes in fibromyalgia in patients with SLE and not to changes in the disease22. It has been used in some clinical trials and it is not clear whether sensitivity to change was observed23. We used the SF-36 to evaluate the concurrent validity of the LupusPRO because of its widespread use and acceptability as a multipurpose generic measure of HRQOL in SLE. However, it may not identify all HRQOL domains that are significant to patients with SLE, such as sexual functioning, body image, and sleep.
Disease-specific patient-reported outcome tools provide for inclusion of all pertinent domains, and therefore increased sensitivity24. Patient-reported outcome measures specifically designed for patients with SLE have been developed, and each has its strengths and limitations7. An SLE-specific QOL instrument developed in Singapore was derived from input by physicians and nurse clinicians25. The L-QoL is a needs-based QOL model based on cognitive interviews of patients with SLE26. LupusQoL was derived from mostly white women in the United Kingdom and contains only HRQOL domains27.
The LupusPRO has fair psychometric properties among Canadian patients with SLE. Before the LupusPRO can be recommended for use in clinical trials, prospective studies to establish sensitivity to change and minimally important differences are required.
- Accepted for publication April 22, 2013.