Abstract
Objective. Preference-based measures, such as the Short Form-6D (SF-6D), allow quality-adjusted life-years, used in cost-utility evaluations, to be calculated. We investigated the construct and criterion validity of the SF-6D in patients with systemic lupus erythematosus (SLE).
Methods. Female patients with SLE were recruited from outpatient clinics at 2 timepoints, 5 years apart. We tested the cross-sectional correlation of the SF-6D with domains of the disease-specific LupusQoL health-related quality of life (HRQOL) measure, the Systemic Lupus International Collaborating Clinics Damage Index (SDI; for damage) and Systemic Lupus Erythematosus Disease Activity Index (SLEDAI; for activity) measures, and patient characteristics. The ability to discriminate between groups defined by smoking status, presence/absence of carotid plaque, depression, and fatigue was tested using the t-test.
Results. In total 181 patients were recruited at baseline. The SF-6D correlated moderately to strongly with all domains of the LupusQoL (0.6–0.8) apart from intimate relationships (0.42) and body image (0.34). Correlations of the SF-6D with the demographic and disease-specific measures at baseline were small for the SDI score (−0.23) and age (−0.19) and in the expected direction. The SF-6D did not correlate with disease activity (SLEDAI −0.08). The SF-6D could distinguish those who smoked, had carotid plaque, had depression, and reported fatigue from those who did not, with the largest effect size being for depression (0.75).
Conclusion. The SF-6D displays construct and criterion validity for use in patients with SLE, but the low correlation with aspects of intimate relationships and body image represents a concern and reinforces the need to collect disease-specific measures of HRQOL alongside generic preference-based instruments.
In the past 3 decades, despite few significant advances in the treatment of patients with systemic lupus erythematosus (SLE), longterm survival of these patients has improved1. As mortality improves, attention has focused on the longterm morbidity associated with SLE; more recently, the quality of life of patients with SLE and patient-reported outcomes (PRO) have been extensively studied2. Many studies have shown that health-related quality of life (HRQOL) in patients with SLE is poor, even compared with other rheumatic diseases3, and that poor HRQOL persists over longterm followup2,4.
Against this backdrop a number of novel therapies are being developed for SLE. These include a range of biological therapies such as B cell depletion and modulators, type I interferon inhibitors, and interleukin 6 blockade5. To date, only belimumab has been licensed by the US Food and Drug Administration6; however, other novel agents such as rituximab are being used off-license5. There will therefore be an increasing need for new products to be evaluated for economic as well as clinical approval in the coming years.
Cost-effectiveness analysis is a tool used by policy makers to make decisions between competing treatments aimed at maximizing the health benefit of the population against the constraint of a finite budget. Therefore, the cost-effectiveness, or the investment required for a unit of health-gain of new treatments for SLE, will have to be compared with other technologies within SLE and across other diseases. Cost-utility analysis is one type of cost-effectiveness analysis, where the unit of health-gain is measured in quality-adjusted life-years (QALY). This has been adopted by organizations recommending treatments in a number of countries including the United Kingdom and Canada. QALY are the product of time in a health state and the utility (a valuation of health status scaled relative to perfect health and death) of that health state. A number of generic self-administered health status questionnaires can be mapped to precollected preference-based societal utility estimates (derived from choice-based methods such as the time tradeoff or standard gamble7). These preference-based instruments include the EuroQol-5D (EQ-5D)8,9, the Health Utilities Index measures10, and recently the Short Form-6D (SF-6D), which can be estimated from the Medical Outcomes Study Short Form-36 (SF-36)11 or SF-1212. While these preference-based instruments allow QALY used in cost-utility evaluations to be calculated, they should be evaluated for validity in each setting in which they are applied.
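As a worked illustration of the QALY definition above, the following minimal sketch multiplies the time spent in each health state by the utility of that state and sums the result. The function name and the durations are hypothetical; the utilities 0.60 and 0.62 are borrowed from the mean SF-6D scores reported in this study purely for scale and do not describe any patient's trajectory. No discounting is applied.

```python
def qalys(health_states):
    """Sum of (years in state) x (utility of state).

    health_states: iterable of (years, utility) pairs, where utility is
    scaled so that 1 = perfect health and 0 = death.
    """
    return sum(years * utility for years, utility in health_states)


# Hypothetical profile: 2 years at utility 0.60, then 3 years at 0.62.
profile = [(2.0, 0.60), (3.0, 0.62)]
total = qalys(profile)  # 2 * 0.60 + 3 * 0.62
print(round(total, 2))
```

Under these assumed inputs, 5 years of life correspond to roughly 3 QALY, which is the quantity a cost-utility analysis would compare across treatments.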
Disease-specific measures such as the LupusQoL© also exist to provide a comprehensive overview of the HRQOL of patients with SLE13, and as they are designed specifically for an SLE population, may be able to avoid inclusion of irrelevant items while allowing assessment of more specific aspects of HRQOL that generic preference-based instruments such as the EQ-5D and SF-36/SF-6D may not measure. However, these disease-specific measures are not appropriate for cost-utility analysis as they do not provide a single value on a cardinal scale of a patient’s health, and more importantly for policy makers do not provide a preference-based utility estimate on a scale that can be directly compared with other interventions in different therapeutic areas.
The SF-6D is a relatively recent addition to the preference-based instruments. The utility weights for the SF-6D were developed to reflect UK societal preferences using standard gamble techniques11. The SF-6D has considerable advantages over the more commonly used EQ-5D, including greater descriptive ability and the ability to retrospectively calculate utility estimates in studies where the SF-36 or SF-12 were collected14. The SF-6D, if calculated from the SF-36, uses 11 questions to describe 6 domains: physical functioning, role limitation, social functioning, bodily pain, mental health, and vitality. The health profile described by these 6 domains is then attributed a utility value scaled between 1 (equivalent to perfect health) and 0.30, where 0 would be equivalent to death. If valid for SLE, the SF-6D would be a useful extension to the commonly used, validated15,16, and recommended17 SF-36 measure to allow QALY to be calculated for cost-effectiveness analyses. The validity of the SF-12, a shortened version of the SF-36, in SLE has not been formally established, although the measure has been applied to SLE populations18. Only a few studies to date have applied the SF-6D in SLE patient populations19,20,21. Although these studies report findings suggestive of the validity of the SF-6D for patients with SLE, such validity, particularly in the UK setting, requires further investigation. As part of an ongoing study of carotid atherosclerosis in SLE we have collected clinical data, HRQOL data, preference-based instrument data, and socioeconomic data in our cohort22. The purpose of our study was therefore to examine the construct validity of the SF-6D in patients with SLE.
MATERIALS AND METHODS
Patients
Female patients with SLE were recruited from outpatient clinics in Greater Manchester and the North-West of England at 2 timepoints 5 years apart. The patients were over 18 years of age and fulfilled 4 or more American College of Rheumatology (ACR) criteria (updated 1997) for SLE23. Patients who fulfilled 3 criteria for SLE in the absence of any alternative diagnosis were also included, as described22. The recruitment was restricted to white British patients as the original study design was also for the prospective collection of DNA for genetic studies. Patients eligible for inclusion to the study had to be receiving stable therapy for at least 2 months. Patients who were pregnant or lactating within 6 months were excluded. All patients gave written informed consent and the study was approved by the Central Manchester Local Research Ethics Committee.
Data collection
At baseline and at 5 years, patients underwent a clinical interview and examination to collect demographic information, family history, and lifestyle factors.
Patients had a clinical assessment that included assessment of disease activity using the Systemic Lupus Erythematosus Disease Activity Index (2000 version; SLEDAI-2K)24 and cumulative damage using the Systemic Lupus International Collaborating Clinics (SLICC)/ACR Damage Index (SDI)25. The SLEDAI-2K is a physician-rated index of 24 descriptions of disease activity, each with a weighting from 1 to 8 depending on severity; the score ranges from 0 (no activity) to 105 (maximum activity)24. The SDI reports disease damage present for 6 months or longer in 12 organs/systems and can range from 0 (no damage) to 46 (maximum damage)25.
We also assessed patients for traditional coronary risk factors, anthropometric measures, and history of previous arterial events (i.e., myocardial infarction, angina, stroke, transient ischemic attacks, or peripheral vascular disease) as described22. Depression was defined by patient self-report and/or whether the patient was taking antidepressant medication, as reported26. In the case of low-dose amitriptyline, which is frequently prescribed for fibromyalgia, this information was recorded and subsequent analyses were undertaken including and excluding this medication26. We did not perform a formal tender point count in this population.
Patients also completed the generic RAND Medical Outcome Study 36-Item Short-Form Survey version 1 (MOS SF-36)27 at baseline and 5 years. The SF-6D is calculated by applying an algorithm to the patient responses to 11 questions of the SF-36 to create the 6 domains of the SF-6D, which are then converted to health profiles that have been valued by a representative sample of the UK population using the preference-based standard gamble technique11. The best possible score on the SF-6D is 1 (equivalent to full health) and the worst possible score is 0.30 (30% of perfect health).
The disease-specific LupusQoL13 was completed at the 5-year timepoint. The LupusQoL consists of 34 items assessed by a 5-point Likert scale in 8 domains: physical health, pain, planning, intimate relationships, burden to others, emotional health, body image, and fatigue13. The LupusQoL is scored for each domain as the mean domain score, which is then transformed by dividing by 4 (the number of Likert responses minus 1) and then multiplying by 100; these transformed scores range from 0 (worst HRQOL) to 100 (best HRQOL).
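The domain transformation described above can be sketched as follows. This is a minimal illustration, not the official LupusQoL scoring program: the function name is hypothetical, items are assumed to be coded 0 to 4 (which is what dividing by 4 to reach a 0 to 100 range implies), and the handling of unanswered items is not addressed.

```python
def lupusqol_domain_score(item_responses):
    """Transform item responses for one domain to a 0-100 score.

    Assumes each item is coded 0..4 (5-point Likert), so that the
    transformed score runs from 0 (worst HRQOL) to 100 (best HRQOL).
    """
    if not item_responses:
        raise ValueError("domain has no answered items")
    if any(r < 0 or r > 4 for r in item_responses):
        raise ValueError("responses must be coded 0..4")
    mean_raw = sum(item_responses) / len(item_responses)  # mean domain score
    return mean_raw / 4 * 100  # divide by (5 responses - 1), scale to 100


# Hypothetical responses to a 4-item domain.
example = lupusqol_domain_score([4, 3, 3, 2])
print(example)
```

Here the mean raw score is 3, so the transformed domain score is 75 on the 0 to 100 scale.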
Carotid atherosclerosis was defined using carotid ultrasonography as described28. Carotid plaque was defined if 2 of the following 3 conditions were met: (1) a distinct area of protrusion into the vessel lumen > 50% compared with the surrounding area; (2) increased echogenicity compared with the adjacent boundaries; and (3) intima-media thickness > 0.15 cm29.
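The plaque definition above is a 2-of-3 rule, which can be written as a short check. The function and argument names are hypothetical, and reading "2 of the following 3" as "at least 2" is an assumption; the first two criteria are taken as already judged by the sonographer.

```python
def is_carotid_plaque(protrusion_over_50pct, increased_echogenicity, imt_cm):
    """Return True if at least 2 of the 3 plaque criteria are met.

    protrusion_over_50pct: bool, distinct protrusion into the lumen > 50%
        compared with the surrounding area.
    increased_echogenicity: bool, relative to adjacent boundaries.
    imt_cm: intima-media thickness in centimeters (criterion: > 0.15 cm).
    """
    criteria = [
        bool(protrusion_over_50pct),
        bool(increased_echogenicity),
        imt_cm > 0.15,
    ]
    return sum(criteria) >= 2


# Hypothetical readings.
print(is_carotid_plaque(True, False, 0.18))   # two criteria met
print(is_carotid_plaque(False, False, 0.18))  # only one criterion met
```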
Analysis
The data were analyzed cross-sectionally at baseline, and for investigation of the associations of the SF-6D with the LupusQoL, at an assessment 5 years later.
The construct/criterion validity of the SF-6D was assessed using convergent validity and discriminant (known-groups) validity. Criterion validity, according to the definition of OMERACT (Outcome Measures in Rheumatology), specifically reflects the sensitivity and specificity of the measure against a “gold standard.” However, often no gold standard exists, so tests of convergent and discriminant validity are performed. Convergent validity was based on the cross-sectional correlation (Spearman or Pearson correlation coefficients) of the SF-6D with the disease-specific SDI (for damage) and SLEDAI (for activity) measures and patient characteristics (age, disease duration, education, smoking, depression, fatigue) collected at the baseline assessment, and with the domains of the LupusQoL (a validated disease-specific measure of HRQOL in patients with SLE) collected at the 5-year assessment. Strength of correlation was classified using thresholds of ≥ 0.2 for weak correlation, ≥ 0.4 for moderate, ≥ 0.6 for strong, and ≥ 0.8 for very strong30. The directions of correlations were expected a priori to be positive (SF-6D increases with increasing values) for the domains of the LupusQoL and levels of education, and negative for disease damage and activity, age, disease duration, smoking, depression, and fatigue. Similar approaches to validation have been applied in other settings, for example, in rheumatoid arthritis31. Moderate to strong correlations were expected with the LupusQoL, which is the closest measure we have to a gold standard of HRQOL in this patient group, and progressively lower (moderate to low) correlations were expected between the SF-6D and measures of disease activity, damage, and demographics. The discriminant validity of the SF-6D (ability to discriminate between known groups) was tested using ordinary least-squares regression.
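The correlation thresholds stated above can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, applying the thresholds to the absolute value of the coefficient is an assumption (so that negative correlations are graded by magnitude), and the label for values under 0.2 is not taken from the text.

```python
def correlation_strength(r):
    """Classify a correlation coefficient by the stated thresholds.

    Uses |r| so that negative coefficients are graded by magnitude
    (an assumption; the thresholds are given without sign).
    """
    magnitude = abs(r)
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "below weak threshold"  # label assumed, not from the text


# Example coefficients with the sign pattern expected a priori.
for r in (0.65, 0.42, -0.23, -0.08):
    print(r, correlation_strength(r))
```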
Effect sizes (Cohen’s d) were calculated to quantify the magnitude of the difference in SD units by dividing the mean difference in SF-6D by the standard deviation for both groups combined30. A yardstick for interpreting effect sizes suggests that an effect size of 0.2 is small, 0.5 is moderate, and 0.7 is large30. Patients were divided into groups expected to differ in health status defined by smoking status and presence/absence of each of carotid plaque, depression, or fatigue26. Our a priori hypothesis in testing these was that patients who smoked or had carotid plaque, depression, or fatigue would have lower SF-6D scores.
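The effect size calculation can be sketched as below. The sketch reads "standard deviation for both groups combined" as the pooled within-group SD, which is an assumption (the SD of the whole sample would be another reading); the function names and the toy data are hypothetical, and the labels treat the 0.2, 0.5, and 0.7 yardsticks as lower bounds.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled within-group SD (assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def effect_size_label(d):
    """Label |d| using the yardstick 0.2 small, 0.5 moderate, 0.7 large."""
    size = abs(d)
    if size >= 0.7:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"  # label assumed, not from the text


# Toy SF-6D scores for two hypothetical groups.
d = cohens_d([0.5, 0.6, 0.7], [0.4, 0.5, 0.6])
print(round(d, 2), effect_size_label(d))
```

In the toy data both groups have an SD of 0.1 and the means differ by 0.1, so the difference is one SD unit.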
Potential to detect change was explored by examining for floor and ceiling effects. Floor effects are said to exist if large numbers of respondents occupy the worst possible health state of a measure, and ceiling effects if large numbers of respondents occupy the best possible health states. If floor effects exist, the ability of a measure to detect any further worsening in health is inhibited, and in contrast ceiling effects limit the ability to detect further improvement32,33. One rule for identifying the existence of floor and ceiling effects considers > 15% of respondents at the floor/ceiling to be serious, and for effects of 1% to 15% at the floor or ceiling to be small33. The overall preference-based score of the SF-6D and the 6 domains of the SF-6D with 4–6 levels were also tested for floor and ceiling effects at both timepoints, as described for other musculoskeletal diseases32,34. Where floor/ceiling effects were found to be serious at either timepoint, comparisons were made with responses to the LupusQoL using data from the 5-year timepoint to allow comparison of responses to similar domains.
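The floor and ceiling rule described above can be sketched as follows. Function names are hypothetical; treating anything under 1% as "none" is an assumption that extends the stated rule, and the response data are invented for illustration.

```python
def floor_ceiling_pct(scores, worst, best):
    """Percentage of respondents at the worst and best possible values."""
    n = len(scores)
    pct_floor = 100 * sum(s == worst for s in scores) / n
    pct_ceiling = 100 * sum(s == best for s in scores) / n
    return pct_floor, pct_ceiling


def effect_severity(pct):
    """> 15% is serious, 1% to 15% is small; below 1% taken as none (assumed)."""
    if pct > 15:
        return "serious"
    if pct >= 1:
        return "small"
    return "none"


# Hypothetical domain responses where level 5 is worst and level 1 is best.
levels = [1, 1, 2, 3, 5]
pct_floor, pct_ceiling = floor_ceiling_pct(levels, worst=5, best=1)
print(pct_floor, effect_severity(pct_floor))      # 1 of 5 at the floor
print(pct_ceiling, effect_severity(pct_ceiling))  # 2 of 5 at the ceiling
```

Under the same rule, a single respondent at the floor in a sample of 181 is about 0.6%, which falls below the 1% bound.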
RESULTS
In total, 181 female patients were recruited into the study; the mean age and disease duration at baseline were 48 years (SD 10) and 11 years (SD 9), respectively (Table 1). The mean SF-6D score at baseline was 0.60 (SD 0.12); the median SDI score was 1 [interquartile range (IQR) 0, 4], and the median SLEDAI score was 1 (IQR 0, 4). The baseline characteristics of the patients included in the second cross-section were comparable, having modestly older age at baseline (49 years, SD 9), longer disease duration (13 years, SD 10), and higher SF-6D scores (0.62, SD 0.11). There were no differences in SDI or SLEDAI scores.
Construct and criterion validity
The SF-6D was positively correlated with all domains of the LupusQoL (Table 2). The correlation was moderate to strong (0.6–0.8) for 6 of the 8 domains; the correlations for intimate relationships (0.42) and body image (0.34) were weaker (Figure 1). The correlations of the SF-6D with the demographic and disease-specific measures at baseline were small to medium for the SDI score (−0.23) and age (−0.19) and in the expected direction. The SF-6D did not correlate with any of the other variables specified, including disease activity (SLEDAI −0.08), although all observed associations were in the direction expected.
Discriminant validity
The mean SF-6D scores were lower for patients who were current smokers than for those who were never or ex-smokers (SF-6D: B coefficient −0.07, p = 0.003). In addition they were lower in those with carotid plaque (SF-6D: B coefficient −0.05, p = 0.027) and with depression (SF-6D: B coefficient −0.09, p < 0.001), and in those who reported a history of fatigue (SF-6D: B coefficient −0.06, p = 0.006). The effect size (ES) of the difference in means suggested that the differences were small for carotid plaque (ES = 0.33), moderate for smoking status and fatigue (ES = 0.58 and ES = 0.50, respectively), and large for depression (ES = 0.75 for both definitions; Table 3). Therefore the SF-6D was able to differentiate between groups of patients expected to differ in health status.
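For readers less familiar with regression on group membership, the sketch below shows why a B coefficient of this kind can be read as a difference in mean SF-6D between groups. It assumes a single 0/1 group indicator with no other covariates (the text does not state whether any adjustment was made), and the data are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: 1 = current smoker, 0 = never or ex-smoker.
smoker = [0, 0, 0, 0, 1, 1]
sf6d = [0.66, 0.62, 0.58, 0.62, 0.55, 0.55]

slope, intercept = ols_slope_intercept(smoker, sf6d)
mean_diff = (
    mean(s for s, g in zip(sf6d, smoker) if g == 1)
    - mean(s for s, g in zip(sf6d, smoker) if g == 0)
)
print(round(slope, 2), round(mean_diff, 2), round(intercept, 2))
```

With a binary predictor the slope equals the difference in group means and the intercept equals the mean of the reference group, so a negative coefficient indicates a lower mean SF-6D in the indicated group.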
Floor and ceiling effects
There were no floor or ceiling effects for the overall SF-6D preference-based scores; 1 patient scored at the lowest level (floor) of the SF-6D and no patient scored the highest level (ceiling; Table 4). Serious floor effects were, however, found to exist for the individual subscales of the vitality and role-limitation domains of the SF-6D and serious ceiling effects for the mental health and role-limitation domains (Table 4). In comparable domains, patients at the ceiling of the SF-6D mental health domain had a high median LupusQoL emotional health score of 92 (IQR 88, 100) and patients at the floor of the SF-6D vitality domain had a low median LupusQoL fatigue score of 28 (IQR 16, 41; Table 5). Role limitation has a broader definition covering physical and emotional limitations in daily activities (work or other activities). Those at the floor of the SF-6D role-limitation domain had low LupusQoL scores for intimate relationships (median 13; IQR 0, 50) and fatigue (median 38; IQR 19, 50); however, the remainder of LupusQoL domain scores were around the midpoint of the scale. In contrast, those at the ceiling of the SF-6D role-limitation domain had median scores ≥ 75 for 7 of the 8 subscales [only body image (median 58) was lower]. These results suggest that patients at the ceiling of the SF-6D mental health and role-limitation domains score correspondingly high on comparable LupusQoL domains, whereas patients at the floor of the SF-6D role-limitation domain do not score correspondingly low.
DISCUSSION
In our study of British white women with SLE, we measured health status using the SF-6D and examined its metric properties. In particular, we sought for the first time to evaluate the SF-6D alongside a disease-specific HRQOL measure (the LupusQoL) as recommended35. Our study confirms results from recent reports from the United States that the SF-6D is a valid measure for use in SLE. Aggarwal, et al found that the SF-6D was able to differentiate between patients of differing disease severity even though the measure demonstrated a weak correlation with disease activity and damage20. Similarly, results from the LUMINA cohort found that the SF-6D predicted future damage accrual (but not mortality)19, and was associated with levels of social support, age, poverty, disease activity and damage, fatigue, and helplessness21.
Our subjects, all white British women, had disease characteristics, including disease duration, therapy used, and overall levels of damage, that were similar to other SLE cohorts in the literature4. Within this cohort, we studied baseline characteristics likely to influence quality of life and found that the SF-6D was, as predicted, lower in patients who smoked as well as in those with self-reported chronic fatigue or depression. We previously noted that mental component scores (MCS) of the SF-36 were lower in patients with subclinical atherosclerosis, an association that seemed to be mediated by lower MCS scores in smokers26. Again in this analysis SF-6D was lower in those with carotid plaque and had a negative correlation with carotid intima-media thickness. Similarly, the SF-6D also showed correlations in the expected direction with age, disease duration, and length of time in education. Others have noted that education level and age are important determinants of HRQOL21 and our study is in agreement with this. In keeping with findings from Aggarwal, et al20, we also noted that SF-6D had a poor correlation with inflammatory disease activity, which, along with comparable data using the SF-36, confirms that these scales are assessing a different dimension of the disease4,36,37. The correlation of SF-6D with SDI damage was also weak (r = −0.227), again supporting the need to identify health status as a separate domain in SLE disease assessment23.
At the followup visit we were also able to assess how the SF-6D correlated with the LupusQoL, a disease-specific HRQOL measure13. The moderate to strong correlation of the SF-6D with the domains of the LupusQoL is important, as this instrument was developed from patient interviews that identified areas of HRQOL directly relevant to patients with SLE13. Of the 8 domains, 6 had strong correlations with the SF-6D and 2 had moderate correlations. The 2 with moderate correlations were intimate relationships and body image, which are novel domains identified in the derivation of the instrument13. This will have implications when evaluating treatments for SLE that would influence these aspects of disease. Importantly, this may include therapies aimed at minimizing steroid treatment, with associated weight gain, Cushingoid features, etc., or treatments that improve facial rashes, alopecia, and cutaneous scarring, etc. Therefore, although the SF-6D may be valid for use in SLE, it is important that a disease-specific measure be collected alongside generic preference-based instruments in clinical trials, as recommended by OMERACT35.
The finding of a lack of floor or ceiling effect for the SF-6D preference-based scores is encouraging and suggests the potential of the instrument to measure change over time. However, within domains of the SF-6D, serious floor and ceiling effects were detected. While patients scoring at the ceiling of the role-limitation and mental health domains of the SF-6D also tended to report very good health on comparable domains of the LupusQoL, the patients at the floor of the SF-6D reported higher scores on LupusQoL domains than might be expected, particularly for the role-limitation domain. The most severe level of the role-limitation domain of the SF-6D is described as “you are limited in the kind of work or other activities as a result of your physical health and accomplish less than you would like as a result of emotional problems”11. This appears to be less severe than descriptions of the most severe level in other preference-based measures (for example, the EQ-5D “Usual activities” domain has “I am unable to perform my usual activities” as the descriptor of the most severe level38). This general emphasis on the milder states is reflected in the range of the measure: the lowest score possible reflects one-third of full health. This finding suggests that, as in other conditions such as rheumatoid arthritis14, the SF-6D may be more useful in patients with mild rather than those with more severe SLE.
Our study has some limitations that must be considered. Our population included only white British women; however, other results from this population, including the prevalence of carotid plaque and overall SF-36 results, are comparable to other cohorts4,39. Nevertheless we will also need to examine this scale in a more diverse population and include male patients as well as subjects from other ethnic backgrounds to understand the scale more completely. We were also not able to examine responsiveness to change. Our population was limited to having 2 assessments 5 years apart, which is too long for anchor-based evaluations, e.g., the question of the SF-36 commonly used to assess self-report change in health is framed over a 1-year period. Our followup does not match this. Additional studies will be needed to assess this quality of the measure.
We have found that the SF-6D is a valid generic preference-based instrument for use in patients with SLE; the measure reflects a number of key outcomes of the disease and has good discriminant validity. It also correlates well with 6 key domains of the disease-specific LupusQoL, but there was a lower correlation with the aspects of intimate relationships and body image that represents a concern and that reinforces the need for collection of disease-specific measures of HRQOL alongside generic preference-based instruments.
Acknowledgment
We acknowledge the clinicians of North-West England who contributed patients to our study, including Dr. Jeff Marks, Dr. Mike Venning, Dr. Paul Sanders, Dr. Sue Knight, and Dr. Martin Pattrick.
Footnotes
- Supported by Arthritis Research UK, Lupus UK, and the Charitable Funds of Central Manchester Foundation Trust. Prof. Bruce is supported by the Manchester Academic Health Sciences Centre, the Manchester NIHR Biomedical Research Centre, and The Manchester Wellcome Trust Clinical Research Facility.
- Accepted for publication November 17, 2011.