Abstract

Objective. To assess the reliability, validity and sensitivity to change of a Chinese version of the 36-item Short-Form Health Survey (SF-36) in Chinese-speaking patients with rheumatoid arthritis (RA) in Singapore.

Methods. The psychometric properties of the Chinese Hong Kong standard version of the SF-36 were assessed in 401 RA patients. The construct validity of the Chinese SF-36 was assessed by comparison with the American College of Rheumatology (ACR) functional status, a validated Chinese Health Assessment Questionnaire (C-HAQ) and markers of RA activity and severity.

Results. The overall Cronbach's coefficient alpha was 0.921, reflecting excellent internal consistency. The instrument showed reasonable test–retest reliability except in the social functioning (SF) subscale. There was a significant ceiling effect in the role physical (RP), SF and role emotional (RE) subscales and a floor effect in the RP and RE subscales. Physical function (PF) and SF were strongly correlated with C-HAQ and patient's assessment of RA activity [Pearson's correlation coefficient (r) ranging from −0.41 to −0.53] and moderately correlated with ACR functional status (r = −0.35 and −0.3, respectively). Weak correlations were also found between the Chinese SF-36 and markers of RA activity, deformed joint count and radiographic damage. PF and SF were the subscales most responsive to change in quality of life (QOL).

Conclusion. The Chinese SF-36 showed reasonable reliability, criterion validity and responsiveness, with limitations in certain subscales. Overall, the physical domains, and PF in particular, may be the most suitable psychometric measures of QOL in RA.

Despite advances in therapy, rheumatoid arthritis (RA) remains a chronic disabling disease with significant morbidity. Traditional outcome measures of RA activity, disability and mortality using clinical, laboratory and radiological parameters do not correlate well with patients’ functional capacity and general well-being. Increasingly, health status instruments such as the quality of life (QOL) measurements that encompass physical, psychological and emotional aspects of the disease are used to assess the impact of chronic illnesses from the patient's perspective. Instruments such as the 36-item Short-Form Health Survey (SF-36) have been shown to be valid and reliable in measuring QOL in Caucasians but hitherto not in Chinese RA patients [1, 2]. The objective of this study was to assess the validity, reliability and responsiveness of the Chinese SF-36 in a Chinese-speaking RA population and to determine its correlation with RA disease measures, including a previously validated Chinese Health Assessment Questionnaire (C-HAQ). This work will enable the use of the SF-36 in Chinese-speaking RA patients and add to the body of evidence regarding the validity of the SF-36 in measuring QOL in RA.

Method

Altogether 401 Chinese-speaking patients from our clinic who fulfilled the revised 1987 American College of Rheumatology (ACR) criteria for RA [3] were recruited. Patients with cognitive impairment or disabling disease other than RA (such as significant coronary artery disease, terminal illness, malignancy, pulmonary or neurological disorders) were excluded. Common co-morbidities (e.g. hypertension and diabetes mellitus) and complications of RA (e.g. carpal tunnel syndrome and osteoarthritis) were not excluded. Disease duration was defined from onset of disease to study entry. Informed consent was obtained. This study was approved by the institutional review board.

All patients were briefed by the Rheumatology Nurse Educator before administration of the questionnaire. Assessment of RA activity, laboratory tests, ACR functional status [4] and a previously modified and validated C-HAQ [5] were recorded prospectively.

The SF-36 consists of eight subscales: physical functioning (PF), physical and emotional roles (RP and RE), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF) and mental health (MH) [6]. The score of each domain ranges from 0 to 100, with higher values representing better self-perceived QOL. The physical health domains are PF, RP and BP while the mental health domains are SF, RE and MH. VT and GH contain both physical and mental components [6].
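The 0 to 100 range described above is conventionally obtained by linearly rescaling each subscale's raw item sum. The following is a minimal sketch of that rescaling, not the scoring manual's full algorithm (item recoding and missing-data rules are omitted). The function name `transform_scale` and the example values are illustrative assumptions.

```python
def transform_scale(raw_score: float, lowest_raw: float, highest_raw: float) -> float:
    """Linearly rescale a raw subscale sum to the 0-100 range."""
    if highest_raw <= lowest_raw:
        raise ValueError("highest_raw must exceed lowest_raw")
    if not lowest_raw <= raw_score <= highest_raw:
        raise ValueError("raw_score lies outside the possible raw range")
    return (raw_score - lowest_raw) / (highest_raw - lowest_raw) * 100.0


# Illustrative only: a ten-item subscale with items scored 1-3,
# so the raw sum can range from 10 to 30 (hypothetical respondent).
pf_example = transform_scale(raw_score=23, lowest_raw=10, highest_raw=30)
print(pf_example)
```

A respondent at the lowest possible raw sum maps to 0 and one at the highest maps to 100, which is what makes floor and ceiling counts straightforward to read off the transformed scores.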

The Hong Kong (HK) Chinese standard version of the SF-36 (©QualityMetric Incorporated) has been cross-culturally validated using the original US English SF-36 version [7]. The HK Chinese SF-36 differs from the original SF-36 in that playing golf was replaced by Tai-Chi and distance was expressed in street lengths, metres and kilometres rather than in blocks and miles. The Chinese populations in Singapore and HK came mainly from Southern China and share similar cultural practices and a common language. The standard written Chinese of the SF-36 can be understood in both countries. There are two forms of written Chinese, using complete or simplified characters. We used the complete characters in the questionnaire. Whether the two forms of written Chinese perform equally well in SF-36 translations has never been studied [8]. The Chinese questionnaire was provided by Dr Cindy Lam, University of Hong Kong. National norms of the Singapore Chinese general population have been derived for this version [9].

Statistical analysis

We scored the SF-36 and calculated the physical and mental component summaries (PCS and MCS) according to the manuals [6, 10]. Test–retest reliability was assessed using the intraclass correlation coefficient (ICC) in stable patients who answered the questionnaire twice over a 14-day interval. Internal consistency was assessed using Cronbach's coefficient alpha. Patients were asked to rate ‘whether the Chinese SF-36 questionnaire was easily understood’ on a 10-point numerical scale. Factor structure was studied with principal component factor extraction followed by varimax rotation. A correlation matrix using Pearson's correlation coefficient was used to study the relationship between the SF-36 and the various clinical and laboratory measures. The disease activity and severity measures included tender, swollen and deformed joint counts; duration of morning stiffness; and 10 cm visual analogue scales (VAS) for the patient's assessments of pain and disease activity and the physician's assessment of disease activity. Hand radiographs were scored using the modified Larsen scoring method [11] by a radiologist (ITYY) blinded to the management of RA. We used four statistics to assess the responsiveness of the Chinese SF-36: Liang's relative efficacy [12], Liang's standardized response mean (SRM) [13], Kazis’ effect size (ES) [14] and Guyatt's coefficient (GC) [15]. We analysed the data with SPSS Professional Statistics 11.0. Differential item functioning (DIF) was assessed using Zumbo's technique [16]. Rasch model analysis was performed with BIGSTEPS, a program obtained from www.winsteps.com. Statistical significance was defined as P<0.05.
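As an illustration of the internal consistency statistic named above, the sketch below computes Cronbach's coefficient alpha from its textbook definition, alpha = k/(k − 1) × (1 − Σ item variances / variance of total score). It is not the SPSS routine used in the study; the function name `cronbach_alpha` and the toy responses are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """item_scores[i][j] is respondent i's score on item j."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four respondents to three items (not study data).
# The items rise and fall together, so alpha reaches its upper bound.
toy = [[1, 2, 2], [2, 3, 3], [3, 4, 4], [4, 5, 5]]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Alpha approaches 1 when items covary strongly relative to their individual variances, which is why it is read as a measure of how consistently the items of a subscale measure the same construct.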

Results

Patient characteristics

The study sample comprised 346 (86%) females, with a mean age (±s.d.) of 57±10.9 yr (range 26–88 yr). The mean disease duration was 11±13 yr (range 0–46 yr). All were Chinese. One hundred and twenty-one patients (30.2%) had no formal education, 133 (33.2%) received primary schooling, 120 (29.9%) secondary schooling and 25 (6.2%) tertiary education; the educational level of two (0.5%) was unknown. Two hundred and eighty (69.8%) were in full-time employment, 14 (3.5%) were in part-time employment, 18 (4.5%) were unable to work because of RA, 21 (5.2%) were unemployed and 68 (17.0%) were retired. At baseline, the mean C-HAQ score was 0.56±0.66 (range 0–3). The mean tender, swollen and deformed joint counts were 1.74±3.61 (range 0–25), 2.63±3.90 (0–25) and 2.84±4.48 (0–28), respectively. The mean DAS-28 was 3.66±1.40 (0–8.36).

The Chinese SF-36 was easily understood by most of our RA patients, who took an average of 9.7 min to complete it. There were no missing values. At study entry, the mean PF score was 64.49 (s.d.±28.13), RP 45.57 (±40.88), BP 61.08 (±27.52), GH 53.05 (±20.08), VT 52.94 (±17.69), SF 69.17 (±30.36), RE 59.35 (±43.49) and MH 66.65 (±17.21). The mean PCS was 35.54 (±14.08) and MCS 49.09 (±10.22).

Effect of co-morbid conditions on quality of life

In this study, only serious co-morbidities (such as terminal cancer or heart failure) that might interfere with QOL were excluded. The commonest co-morbidities were hypertension (143 patients), osteoporosis (78), diabetes mellitus (38), peptic ulcer disease (35), hyperlipidaemia (33), thyroid disease (25) and cataracts (23). There were 161 patients with no co-morbidities, 118 with one co-morbidity, 86 with two, 22 with three, 11 with four and three with five.

There are two approaches to determine if co-morbidities affect the QOL. First, we can establish if there is a relationship between the numbers of co-morbidities and the QOL using analysis of variance (ANOVA). Second, we can see if there are inherent differences in the way the items are regarded between patients with and without co-morbidities using DIF.

ANOVA suggested that items PF 3, PF 8 and subscales physical functioning and social functioning were correlated with the number of co-morbidities. However, post hoc analysis with Tukey's honestly significant difference test did not support this (data not shown). SF 1 (P = 0.01) and GH 4 (P = 0.02) showed DIF in RA patients with and without co-morbidities.

Internal consistency and factor structure

Overall, Cronbach's alpha was 0.921, while those for the subscales range from 0.711 (MH) to 0.915 (PF) (Table 1).

Table 1.

Internal consistency and test–retest reliability of the Chinese SF-36 (test–retest reliability was assessed in a group of 35 patients), with floor and ceiling effects

| SF-36 subscale | Internal consistency (Cronbach's alpha) | Test–retest reliability (intraclass correlation) | Number reporting floor values (%) | Number reporting ceiling values (%) |
| --- | --- | --- | --- | --- |
| Entire questionnaire | 0.92 | | | |
| Physical functioning | 0.92 | 0.88 | 7 (1.75) | 35 (8.73) |
| Role physical | 0.84 | 0.82 | 141 (35.16) | 107 (26.68) |
| Bodily pain | 0.91 | 0.82 | 9 (2.24) | 80 (19.95) |
| General health | 0.77 | 0.86 | 3 (0.75) | 2 (0.50) |
| Vitality | 0.72 | 0.83 | 0 (0.00) | 1 (0.25) |
| Social functioning | 0.86 | 0.67 | 13 (3.24) | 137 (34.16) |
| Role emotional | 0.87 | 0.80 | 116 (28.93) | 191 (47.63) |
| Mental health | 0.71 | 0.79 | 0 (0.00) | 14 (3.49) |

Six factors were extracted from the SF-36 (Table 2). The SF-36 is usually regarded as having two main factors, mental and physical [17], but the analysis suggests that these two may be further subdivided and that some items do not fall cleanly into either of them. The first factor, which explains 35.4% of the variance, consists of the items that interrogate physical activities. The PF subscale is faithful to its unidimensional design, and its constituent items consistently gathered in the same factor [17]. Two RP and two SF items also fall into this factor. Therefore, the first factor concerns physical functioning in RA. Factor 2 consists of a mixture of items from RP and RE, suggesting that our RA patients tend to view items from these subscales similarly. It relates to the disease interfering with work and emotional roles. Factors 3 and 5 are made up of items from differing domains. Factor 3 concerns questions about mood while factor 5 considers tiredness and depression. Factor 4 contains mainly physical items and factor 6 is a mixture. Factor 4 deals with pain and general health and factor 6 with the uncertainty of the course of RA and the fear of deteriorating health.

Table 2.

Factor analysis of the Chinese SF-36

| Item | Content | Factor loading(s) |
| --- | --- | --- |
| PF 1 | Vigorous activities | 0.61 |
| PF 2 | Moderate activities | 0.72 |
| PF 3 | Lift, carry groceries | 0.71 |
| PF 4 | Climb several flights | 0.76 |
| PF 5 | Climb one flight | 0.78 |
| PF 6 | Bend, kneel | 0.65 |
| PF 7 | Walk >1 km | 0.75 |
| PF 8 | Walk several blocks | 0.82 |
| PF 9 | Walk one block | 0.79 |
| PF 10 | Bathe, dress | 0.54 |
| RP 1 | Cut down time on work | 0.65 |
| RP 2 | Accomplished less | 0.67 |
| RP 3 | Limited in kind of work | 0.45, 0.53 |
| RP 4 | Difficulty performing work | 0.44, 0.48 |
| BP 1 | Intensity of bodily pain | 0.69 |
| BP 2 | Extent pain interfered with work | 0.61 |
| GH 1 | General health | 0.62 |
| GH 2 | Get sick easier | 0.66 |
| GH 3 | As healthy as anybody | 0.46 |
| GH 4 | Expect health to get worse | 0.43, 0.48 |
| GH 5 | Health is excellent | 0.62 |
| VT 1 | Full of life | 0.73 |
| VT 2 | Energy | 0.75 |
| VT 3 | Worn out | 0.74 |
| VT 4 | Tired | 0.77 |
| SF 1 | Extent of social activities interfered with | 0.57 |
| SF 2 | Frequency of social activities interfered with | 0.59 |
| RE 1 | Cut down time on work | 0.78 |
| RE 2 | Accomplished less | 0.80 |
| RE 3 | Did not do work as carefully | 0.72 |
| MH 1 | Nervous | 0.66 |
| MH 2 | Down in the dumps | 0.53 |
| MH 3 | Peaceful | 0.81 |
| MH 4 | Blue/sad | 0.63 |
| MH 5 | Happy | 0.80 |
| HT 1 | Health compared to 1 year ago | −0.63 |

| Factor | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Eigenvalue % | 35.46 | 9.13 | 5.78 | 4.64 | 3.79 | 3.26 |
| Cumulative % | 35.46 | 44.59 | 50.36 | 55.00 | 58.79 | 62.06 |

Kaiser–Meyer–Olkin measure of sampling adequacy: 0.923.

Bartlett's test of sphericity: significance 0.00.


Test–retest reliability

The test–retest reliability of 35 RA patients for the Chinese SF-36 was reasonable, with ICC ranging from 0.67 (SF) to 0.88 (PF) (Table 1).
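The form of ICC used is not stated in the text. As a hedged illustration only, the sketch below computes the one-way random-effects ICC for two administrations per patient from the one-way ANOVA mean squares; the choice of this form, the function name `icc_oneway` and the toy scores are assumptions, not the study's procedure.

```python
from statistics import mean


def icc_oneway(test: list[float], retest: list[float]) -> float:
    """One-way random-effects ICC for two administrations per patient."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need paired scores from at least two patients")
    n, k = len(test), 2
    pairs = list(zip(test, retest))
    grand_mean = mean(list(test) + list(retest))
    patient_means = [mean(pair) for pair in pairs]
    # Between-patient mean square from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    # Within-patient mean square (disagreement between the two administrations).
    ms_within = sum(
        (score - m) ** 2 for pair, m in zip(pairs, patient_means) for score in pair
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical subscale scores at test and retest (not study data).
first_visit = [50, 60, 70, 80]
second_visit = [55, 58, 72, 79]
icc_value = icc_oneway(first_visit, second_visit)
print(round(icc_value, 2))
```

The coefficient falls as within-patient disagreement grows relative to the spread between patients, which is the sense in which a lower ICC for SF indicates weaker test–retest reliability.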

Floor and ceiling effect

A ceiling or floor effect occurs when patients perceive that their condition has improved or deteriorated, respectively, beyond what a scale can measure. A ceiling effect was noted with RP, RE and SF, while a floor effect was noted with RP and RE (Table 1). This suggests that RP and RE are particularly unsuitable for assessing patients who score at either extreme of the scale.
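The floor and ceiling percentages in Table 1 are simple proportions of respondents at the lowest and highest possible scores. A minimal sketch of that tally follows; the function name `floor_ceiling` and the toy scores are hypothetical, and the 0 and 100 bounds follow the subscale range given in the Method section.

```python
def floor_ceiling(
    scores: list[float], lowest: float = 0.0, highest: float = 100.0
) -> dict[str, float]:
    """Percentage of respondents at the lowest and highest possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return {
        "floor_pct": 100.0 * sum(s == lowest for s in scores) / n,
        "ceiling_pct": 100.0 * sum(s == highest for s in scores) / n,
    }


# Hypothetical scores for ten respondents (not study data).
result = floor_ceiling([0, 0, 25, 50, 50, 75, 100, 100, 100, 100])
print(result)

# Consistency check on a reported figure: 141 of 401 patients at the RP floor.
rp_floor_pct = round(100 * 141 / 401, 2)
print(rp_floor_pct)
```

The second calculation reproduces the 35.16% reported for the RP floor in Table 1 from the count and the sample size of 401.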

Construct validity

Table 3 shows that patient global assessment of disease activity and C-HAQ were strongly correlated with PF and SF (r ranging from −0.41 to −0.53). Moderate correlations were found in the following: ACR functional status with PF and SF (r = −0.35 and −0.3, respectively); C-HAQ with RP and BP (r = −0.38 and −0.36, respectively); patient's assessment of global disease activity with RP, BP, GH and VT (r ranging from −0.31 to −0.35); physician's assessment of global disease activity with BP (r = −0.34); DAS-28 with RP, BP, SF and PCS (r ranging from −0.3 to −0.38); and VAS pain assessment with GH and SF (r = −0.30 and −0.34, respectively). The SF-36 was poorly correlated with morning stiffness, tender, swollen or deformed joint counts and radiographic scores. In general, RE and MH had poor correlations with functional status, disease activity and patient's VAS pain score. As expected, the Chinese SF-36 correlated better with the patients’ rather than physicians’ appraisals of the disease.
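For readers who want to reproduce this kind of correlation matrix entry, the sketch below computes Pearson's correlation coefficient from its definition. The function name `pearson_r` and the paired values are hypothetical; they only mimic the expected direction, in which a higher disability score accompanies a lower PF score.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical paired observations (not study data): disability score vs PF score.
disability = [0.0, 0.5, 1.0, 1.5, 2.0]
pf_score = [90, 80, 60, 55, 30]
r = pearson_r(disability, pf_score)
print(round(r, 2))
```

A negative coefficient here mirrors the sign convention in Table 3, where higher SF-36 scores (better QOL) go with lower disability and disease activity.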

Table 3.

Correlation matrix, showing the relationship of the subscales of the Chinese SF-36 with various measures of RA

| SF-36 subscale | Morning stiffness | Number tender joints | Number swollen joints | Number deformed joints | X-ray score | ACR functional status | Physician global | Patient global | Patient pain assessment (VAS) | HAQ | DAS-28 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PF | −0.14 | −0.17 | −0.17 | −0.19 | 0.05 | −0.35 | −0.21 | −0.41 | −0.28 | −0.53 | −0.29 |
| RP | −0.19 | −0.21 | −0.17 | −0.04 | 0.14 | −0.22 | −0.25 | −0.33 | −0.28 | −0.38 | −0.30 |
| BP | −0.26 | −0.29 | −0.28 | −0.04 | 0.10 | −0.23 | −0.34 | −0.35 | −0.48 | −0.36 | −0.38 |
| GH | −0.12 | −0.17 | −0.16 | −0.09 | 0.11 | −0.14 | −0.27 | −0.35 | −0.30 | −0.27 | −0.28 |
| VT | −0.15 | −0.17 | −0.16 | −0.06 | 0.07 | −0.15 | −0.20 | −0.31 | −0.25 | −0.23 | −0.24 |
| SF | −0.25 | −0.22 | −0.20 | −0.13 | 0.15 | −0.30 | −0.25 | −0.42 | −0.34 | −0.46 | −0.31 |
| RE | −0.14 | −0.15 | −0.07 | 0.00 | 0.16 | −0.18 | −0.16 | −0.27 | −0.27 | −0.25 | −0.22 |
| MH | −0.14 | −0.19 | −0.12 | −0.05 | 0.14 | −0.09 | −0.21 | −0.27 | −0.21 | −0.22 | −0.22 |
| PCS | −0.20 | −0.24 | −0.24 | −0.14 | −0.08 | −0.33 | −0.30 | −0.43 | 0.38 | −0.51 | −0.37 |
| MCS | −0.16 | −0.16 | −0.09 | 0.00 | 0.17 | −0.17 | −0.26 | −0.22 | −0.18 | −0.16 | −0.19 |

In general, the direction and strength of the correlations suggest that the SF-36 does measure the patient-perceived impairments and reflects some but not all objective parameters of RA activity and damage. The SF-36 constructs for pain and impairment of social and physical roles are supported by this analysis.

Rasch model analysis

Rasch analysis shows the difficulty level of the items, expressed as log of the odds ratio (logit), and displays these data on a linear and hierarchical scale. The three most difficult items for the patients are ‘Do you feel full of life?’, ‘Do you have lots of energy?’ and ‘Does your health limit you in vigorous activities?’ The three easiest statements are all from the PF subscale of the SF-36: ‘climbing one flight of stairs’, ‘walking one block’ and ‘bathing, dressing’.

PF 1 and PF 10 were designed to define the hardest and easiest items of the PF scale [17], and our finding that PF 1 is the third hardest item in the whole instrument while PF 10 is the easiest supports this assumption. We can rank the PF items in this order of difficulty for Chinese RA patients, from the hardest to the easiest: PF 1 (logit measure 0.24), PF 6 (0.08), PF 7 (0.04), PF 2 (−0.02), PF 4 (−0.06), PF 8 (−0.11), PF 3 (−0.11), PF 5 (−0.23), PF 9 (−0.29) and PF 10 (−0.39). Though the PF items span 0.63 logit units, they are not uniformly spaced with regard to difficulty level. For example, item PF 1 is much more difficult than the next item PF 7, which has a similar difficulty level to the next (PF 2), a peculiarity that has been reported [18]. Patients with arthritis tend to have a low score for positively worded items such as VT 1, VT 2, GH 3 and GH 5 [17].
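The logit measures listed above can be checked and interpreted with a few lines of code. The sketch below stores the reported PF difficulties, recomputes the 0.63 logit span and, as a simplifying assumption, uses the dichotomous Rasch model to show how difficulty translates into endorsement probability. SF-36 items have more than two response levels, so the polytomous model fitted in BIGSTEPS will differ; the function name `endorsement_probability` and the reference ability of 0 logits are illustrative.

```python
from math import exp

# PF item difficulties in logits, as listed in the text above.
pf_difficulty = {
    "PF 1": 0.24, "PF 6": 0.08, "PF 7": 0.04, "PF 2": -0.02, "PF 4": -0.06,
    "PF 8": -0.11, "PF 3": -0.11, "PF 5": -0.23, "PF 9": -0.29, "PF 10": -0.39,
}


def endorsement_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P = exp(a - d) / (1 + exp(a - d))."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


span = max(pf_difficulty.values()) - min(pf_difficulty.values())
hardest = max(pf_difficulty, key=pf_difficulty.get)
easiest = min(pf_difficulty, key=pf_difficulty.get)

# Hypothetical patient with ability 0 logits (an assumption for illustration).
p_hardest = endorsement_probability(0.0, pf_difficulty[hardest])
p_easiest = endorsement_probability(0.0, pf_difficulty[easiest])
print(round(span, 2), hardest, easiest)
```

Because the whole PF range is well under one logit, the difference in endorsement probability between the hardest and easiest item is modest for any given patient, which is consistent with the uneven and compressed spacing described above.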

The infit and outfit statistics tell us how closely the observed data fit the Rasch model. The infit and outfit mean squares should ideally range from 0.7 to 1.3 (low values indicate that the fit is too predictable while high values indicate excessive unpredictability). Questions such as status of health compared with a year ago (HT 1) and whether the respondent is happy (MH 5) have high infit and outfit values, suggesting that our patients tend to score them disproportionately more modestly compared with their replies to other questions.

Responsiveness to change in disease state

The test–retest portion of the analysis was used to define the stable disease state. We calculated the s.d. of the difference of the self-reported disease activity scores in these patients. We then defined ‘unchanged’ patients as those whose change fell within 2 s.d. of this [19]. This may be more logical than using the s.d. of the baseline score [20]. Accordingly, 82 patients felt that their disease had improved, 286 that it was unchanged and 33 that it had worsened. The mean interval between the two study visits was 16±4 months.
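The classification rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the band is centred on zero, a lower self-reported disease activity score is taken to mean less active disease, and the function name `classify_change` and the toy differences are hypothetical.

```python
from statistics import stdev


def classify_change(
    change: float, stable_differences: list[float], n_sd: float = 2.0
) -> str:
    """Label a change in self-reported disease activity relative to test-retest noise."""
    # Threshold: n_sd standard deviations of the differences seen in stable patients.
    threshold = n_sd * stdev(stable_differences)
    if abs(change) <= threshold:
        return "unchanged"
    # Assumption: a fall in the disease activity score means improvement.
    return "improved" if change < 0 else "worsened"


# Hypothetical retest-minus-test differences from stable patients (not study data).
stable = [-1.0, 0.5, 0.0, 1.0, -0.5]
labels = [classify_change(c, stable) for c in (-3.0, 0.5, 2.0)]
print(labels)
```

Anchoring the threshold to the variability seen in stable patients means that a patient is counted as changed only when the shift exceeds ordinary test–retest fluctuation.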

Table 4 shows the responsiveness of the Chinese SF-36. The ideal responsive subscale should register the largest responsiveness statistic for patients who have changed and the smallest for those who remained stable (for Liang's SRM, Kazis’ ES and Guyatt's coefficient). Liang's relative efficacy provides a comparison of the responsiveness of the subscales in relation to the most responsive one. As expected, the various responsiveness statistics provide slightly different rank ordering of the responsiveness of the subscales [21].
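Three of the responsiveness statistics share a numerator (the mean change) and differ only in the denominator. The sketch below uses their commonly cited definitions: SRM divides by the s.d. of the change scores, ES by the s.d. of the baseline scores, and Guyatt's coefficient by the s.d. of change in stable patients. Whether the study applied exactly these forms, and in which direction change was subtracted, is not stated in the text, so the function names, the follow-up-minus-baseline convention and the toy data are assumptions.

```python
from statistics import mean, stdev


def _changes(baseline: list[float], follow_up: list[float]) -> list[float]:
    # Assumed convention: follow-up minus baseline.
    return [after - before for before, after in zip(baseline, follow_up)]


def standardized_response_mean(baseline: list[float], follow_up: list[float]) -> float:
    """Mean change divided by the s.d. of the change scores."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(changes)


def effect_size(baseline: list[float], follow_up: list[float]) -> float:
    """Mean change divided by the s.d. of the baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def guyatt_coefficient(
    baseline: list[float], follow_up: list[float], stable_changes: list[float]
) -> float:
    """Mean change divided by the s.d. of change among stable patients."""
    return mean(_changes(baseline, follow_up)) / stdev(stable_changes)


# Hypothetical subscale scores for four changed patients (not study data).
before = [40, 50, 60, 70]
after = [50, 55, 75, 80]
# Hypothetical change scores in stable patients.
stable = [-2, 0, 2, 0]

srm = standardized_response_mean(before, after)
es = effect_size(before, after)
gc = guyatt_coefficient(before, after, stable)
print(round(srm, 2), round(es, 2), round(gc, 2))
```

Because the denominators capture different sources of variability, the three statistics can rank the same subscales differently, which is the behaviour noted in the paragraph above.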

Table 4.

Responsiveness of the Chinese SF-36 in rheumatoid arthritis

| Statistic | Status of patient (number) | PF | RP | BP | GH | VT | SF | RE | MH | PCS | MCS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Liang's standardized response mean | Improved (86) | −0.46 | −0.48 | −0.64 | −0.30 | −0.49 | −0.50 | −0.45 | −0.32 | −0.56 | −0.38 |
| | Unchanged (283) | −0.04 | −0.11 | −0.13 | −0.11 | −0.11 | −0.08 | −0.11 | −0.05 | −0.10 | −0.09 |
| | Worsened (33) | 0.51 | 0.50 | 0.35 | 0.68 | 0.53 | 0.39 | 0.33 | 0.36 | 0.60 | 0.30 |
| Kazis' effect size | Improved | −0.51 | −0.59 | −0.71 | −0.33 | −0.60 | −0.53 | −0.54 | −0.34 | −0.60 | −0.42 |
| | Unchanged | −0.05 | −0.13 | −0.14 | −0.13 | −0.13 | −0.10 | −0.13 | −0.06 | −0.12 | −0.11 |
| | Worsened | 0.57 | 0.56 | 0.50 | 0.74 | 0.60 | 0.55 | 0.41 | 0.43 | 0.68 | 0.45 |
| Guyatt's coefficient | Improved | −0.54 | −0.57 | −0.73 | −0.34 | −0.51 | −0.56 | −0.57 | −0.36 | −0.65 | −0.41 |
| | Unchanged | −0.05 | −0.13 | −0.14 | −0.13 | −0.13 | −0.10 | −0.13 | −0.06 | −0.12 | −0.11 |
| | Worsened | 0.50 | 0.53 | 0.46 | 0.64 | 0.43 | 0.46 | 0.37 | 0.34 | 0.63 | 0.30 |
| Liang's relative efficacy | Improved–unchanged | 0.67 | 0.51 | 1.00 | 0.12 | 0.48 | 0.61 | 0.44 | 0.28 | 0.80 | 0.29 |
| | Worsened–unchanged | 0.39 | 0.49 | 0.27 | 1.00 | 0.53 | 0.29 | 0.25 | 0.22 | 0.63 | 0.20 |

Nevertheless, the subscales of bodily pain (BP), physical functioning (PF), role physical (RP) and, to a lesser extent, SF are the best suited for assessing change of QOL in RA. PCS has a relatively large signal-to-noise ratio as it was not stable in patients who remained unchanged, even though it showed large magnitudes of change in its responsiveness statistics in patients who improved or deteriorated. GH also has a large signal-to-noise ratio even though it has the best Liang's relative efficacy for patients who have deteriorated.

Discussion

In the past decade, health status instruments have been increasingly used to monitor and assess clinical outcomes and burden of disease and to guide resource allocation. Several generic and RA-specific health status instruments have been used to assess QOL in RA patients but none is perfect. The SF-36 possessed reasonable validity, reliability and responsiveness in patients with varying severity of RA [1, 22, 23], though some subscales did not perform as well as others. A Norwegian study comparing the SF-36 with the Arthritis Impact Measurement Scales (AIMS2) and modified HAQ showed that the SF-36 functioned well but the PF subscale did not capture all aspects of physical health [23]. The first RA-specific QOL (RAQoL) instrument conducted in Europe [24] was also found to be reliable and could distinguish patients with different disease severity [25, 26].

The SF-36 was chosen for its ease of administration in a busy clinic and its facility for interdisease correlation. Our study population, with a wide spectrum of disease severity, is likely to be typical of Chinese-speaking RA patients in Singapore as they were recruited from the major rheumatology centre in the country. Despite the relatively low level of education of the respondents, they understood the items in the Chinese SF-36 and they completed it in a reasonable amount of time.

We showed that the Chinese SF-36 fulfils the criteria for test–retest reliability, internal consistency and construct validity in RA patients. Similar to previous reports, we showed that PF was significantly associated with HAQ and SF-36 correlated better with patient's than physician's assessment of RA activity [1, 22]. However, there was no significant correlation with ACR disease activity measures, probably due to the low tender and swollen joint counts in our study population. Rasch model analysis confirmed that the rank order of difficulty of the PF items in the Chinese SF-36 is consistent with the concepts inherent in the original design [17, 18]. The ranking of difficulty of the PF items by our Chinese patients is most similar to that of the Italian general population in the study of SF-36 in seven nations [18].

We showed that BP, PF, RP and SF were the best subscales for assessing change while GH was the least responsive. In a study of the responsiveness of the SF-36 in 60 RA patients who received infliximab, the RE and MH subscales were reported to be the least responsive to change [27]. By comparing the magnitude of change in the SF-36 against arbitrary categories of change in disease states, Kosinski and colleagues found that BP was the best subscale for discriminating levels of change in patient and physician global assessments [20].
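As an illustration of how responsiveness is commonly quantified, the sketch below computes two widely used indices: the effect size (mean change divided by the standard deviation of baseline scores) and the standardized response mean (mean change divided by the standard deviation of the change scores). This is a hedged example only; the function names and scores are hypothetical, and the indices actually applied in this study are those described in its Methods.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the individual change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical subscale scores (0-100) for three patients at two visits.
baseline = [40, 50, 60]
follow_up = [50, 55, 75]

es = effect_size(baseline, follow_up)                   # 10 / 10 = 1.0
srm = standardized_response_mean(baseline, follow_up)   # 10 / 5 = 2.0
```

Because the two indices use different denominators, a subscale can rank differently depending on which one is chosen, which is one reason responsiveness findings vary between studies.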

Ideally, assessment of QOL in RA should not be confounded by the presence of co-morbid conditions. Co-existing conditions are known to affect self-reported disabilities in the general population [28]. Careful patient selection, excluding those with ‘significant’ co-morbidities, is the only way to limit this problem. Here, our filtering method seems to have succeeded because ANOVA with post hoc analysis did not show an association between the number of co-morbidities and SF-36 scores. Notably, however, two items showed DIF: SF 1 (whether health or emotional problems interfered with social activities) and GH 4 (getting ill more easily than others). This suggests that patients with co-morbidities tend to regard these two questions differently from those without co-morbidities.

Future work should assess the performance of the Chinese SF-36 in clinical trials, compare it with the RAQoL, and determine how the Chinese SF-36 is affected by other factors such as co-morbidities, age, gender, educational level or socio-economic factors.

In conclusion, the HK Chinese SF-36 is a valid tool for assessing QOL in Chinese-speaking RA patients. The two subscales with the best psychometric properties are PF and BP. RP suffers from poor responsiveness and floor and ceiling effects, GH from poor responsiveness, VT from reduced internal consistency, SF from poor test–retest reliability, RE from floor and ceiling effects and MH from reduced internal consistency and test–retest reliability.
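Floor and ceiling effects of the kind summarized above are usually expressed as the percentage of respondents scoring at the lowest or highest possible value of a subscale. The following is a minimal sketch assuming subscales scored from 0 to 100; the function name and the scores are hypothetical, and no particular cut-off for a "significant" effect is implied.

```python
def floor_ceiling(scores, lowest=0.0, highest=100.0):
    """Return (floor %, ceiling %) for a list of subscale scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    # Share of respondents at the minimum and maximum possible score.
    floor_pct = 100.0 * sum(1 for s in scores if s == lowest) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == highest) / n
    return floor_pct, ceiling_pct


# Hypothetical scores for ten respondents on one subscale.
example_scores = [0, 0, 50, 100, 100, 100, 75, 25, 100, 0]
floor_pct, ceiling_pct = floor_ceiling(example_scores)  # (30.0, 40.0)
```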

*The TTSH RA Study group consists of K. O. Kong, B. Y. H. Thong, W. G. Law, T. Y. Lian, Y. K. Cheng, H. H. Chng, C. L. Teh, L. C. Chew, T. C. Lau, H. S. Howe and W. H. Yong, Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore.

This study was supported by a National Healthcare Group Cluster Research Grant, Singapore, and SYC was supported by a grant from the Biomedical Research Council of Singapore (01/1/28/18/016). We thank the Medical Outcomes Trust and Dr Cindy Lam for permission to use the SF-36™ Health Survey.

The authors have declared no conflicts of interest.

References

1. Ruta DA, Hurst NP, Kind P, Hunter M, Stubbings A. Measuring health status in British patients with rheumatoid arthritis: reliability, validity and responsiveness of the short form 36-item health survey (SF-36). Br J Rheumatol 1998;37:425–36.

2. Loge JH, Kaasa S, Hjermstad MJ, Kvien TK. Translation and performance of the Norwegian SF-36 health survey in patients with rheumatoid arthritis. I. Data quality, scaling assumptions, reliability and construct validity. J Clin Epidemiol 1998;51:1069–76.

3. Arnett FC, Edworthy SM, Bloch DA et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315–24.

4. Hochberg MC, Chang RW, Dwosh I, Lindsey S, Pincus T, Wolfe F. The American College of Rheumatology 1991 revised criteria for the classification of global functional status in rheumatoid arthritis. Arthritis Rheum 1992;35:498–502.

5. Koh ET, Seow A, Pong LY et al. Cross cultural adaptation and validation of the Chinese Health Assessment Questionnaire for use in rheumatoid arthritis. J Rheumatol 1998;25:1705–8.

6. Ware JE, Kosinski M, Gandek B. SF-36® Health Survey: manual and interpretation guide. Lincoln, RI: QualityMetric Incorporated, 2001.

7. Lam CL, Gandek B, Ren XS, Chan MS. Tests of scaling assumptions and construct validity of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol 1998;51:1139–47.

8. Ren XS, Amick B, Zhou L, Gandek B. Translation and psychometric evaluation of a Chinese version of the SF-36 Health Survey in the United States. J Clin Epidemiol 1998;51:1129–38.

9. Thumboo J, Chan SP, Machin D et al. Measuring health-related quality of life in Singapore: normal values for the English and Chinese SF-36 health survey. Ann Acad Med Singapore 2002;31:366–74.

10. Ware JE, Kosinski M. SF-36® Physical and Mental Health Summary Scales: a manual for users of version 1, 2nd edn. Lincoln, RI: QualityMetric Incorporated, 2001.

11. Rau R, Herborn G. A modified version of Larsen's scoring method to assess radiologic changes in rheumatoid arthritis. J Rheumatol 1995;22:1976–82.

12. Liang MH, Larson MG, Cullen KE, Schwarz JA. Comparative measurement efficiency and sensitivity of five health status instruments for arthritis research. Arthritis Rheum 1985;28:542–7.

13. Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopaedic evaluation. Med Care 1990;28:632–42.

14. Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989;27:S178–S189.

15. Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987;40:171–8.

16. Zumbo BD. A handbook on the theory and methods of differential item functioning (DIF): logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense, 1999.

17. Kosinski M, Keller SD, Hatoum HT, Kong SX, Ware JE, Jr. The SF-36 Health Survey as a generic outcome measure in clinical trials of patients with osteoarthritis and rheumatoid arthritis: tests of data quality, scaling assumptions and score reliability. Med Care 1999;37(Suppl 5):MS10–22.

18. Raczek AE, Ware JE, Bjorner JB et al. Comparison of Rasch and summated rating scales constructed from SF-36 Physical Functioning items in seven countries: results from the IQOLA project. J Clin Epidemiol 1998;51:1203–14.

19. Beaton DE, Bombardier C, Katz JN, Wright JG. A taxonomy for responsiveness. J Clin Epidemiol 2001;54:1204–17.

20. Kosinski M, Zhao SZ, Dedhiya S, Osterhaus T, Ware JE, Jr. Determining minimally important changes in generic and disease-specific health-related quality of life questionnaires in clinical trials of rheumatoid arthritis. Arthritis Rheum 2000;43:1478–87.

21. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997;50:239–46.

22. Talamo J, Frater A, Gallivan S, Young A. Use of the Short-form 36 (SF-36) for health status measurement in rheumatoid arthritis. Br J Rheumatol 1997;36:463–9.

23. Kvien TK, Kaasa S, Smedstad LM. Performance of the Norwegian SF-36 health survey in patients with rheumatoid arthritis. II. A comparison of the SF-36 with disease-specific measures. J Clin Epidemiol 1998;51:1077–86.

24. Whalley D, McKenna SP, De Jong Z, Van Der Heijde D. Quality of life in rheumatoid arthritis. Br J Rheumatol 1997;36:884–8.

25. De Jong Z, Van der Heijde D, McKenna SP, Whalley D. The reliability and construct validity of the RAQoL: a rheumatoid arthritis-specific quality of life instrument. Br J Rheumatol 1997;36:878–83.

26. Tijhuis GJ, de Jong Z, Zwinderman AH et al. The validity of the rheumatoid arthritis quality of life (RAQoL) questionnaire. Rheumatology 2001;40:1112–9.

27. Russell AS, Conner-Spady B, Mintz A, Maksymowych WP. The responsiveness of generic health status measures as assessed in patients with rheumatoid arthritis receiving infliximab. J Rheumatol 2003;30:941–7.

28. Krishnan E, Häkkinen A, Sokka T, Hannonen P. Impact of age and comorbidities on the criteria for remission and response in rheumatoid arthritis. Ann Rheum Dis 2005;64:1350–2.

Author notes

Departments of Rheumatology, Allergy and Immunology and 1Diagnostic Imaging, Tan Tock Seng Hospital and 2Department of Community, Occupational and Family Medicine, National University of Singapore, Singapore.
