Original Articles
The Sharp/van der Heijde method out-performed the Larsen/Scott method on the individual patient level in assessing radiographs in early rheumatoid arthritis
Introduction
Persistent joint inflammation in rheumatoid arthritis (RA), a chronic inflammatory disease, often leads to joint damage. This damage can be visualized on x-rays as erosions and joint space narrowing. Prevention of radiologic damage is regarded as an important goal of RA therapy, and is recognized by the U.S. Food and Drug Administration (FDA) as a separate claim. The most widely used radiologic scoring methods to quantify joint damage in clinical trials or cohort studies are the Sharp and the Larsen methods and their modifications [1], [2], [3], [4], [5], [6], [7]. They are used to detect changes over time within patients or groups of patients (as an evaluative instrument) as well as to detect differences between individual patients or groups of patients (as a discriminative instrument) [8]. The sensitivity to detect changes over time (i.e., responsiveness) and the ability to discriminate between patients are both part of the reliability of these scoring methods and have been evaluated several times over the last decades using various statistics [4], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. This variability in statistical methods does not allow an unequivocal comparative evaluation of radiologic scoring methods.
Reliability of a method is mostly expressed by intraclass correlation coefficients (ICCs). Besides the ICC, reliability analyses increasingly include statistics such as the standard error of measurement (SEM) [22], [23], [24], [25] or the smallest detectable difference (SDD) [26], [27], [28], which have the advantage that they express measurement error in the metric unit of the measurement. There is less consensus on which statistic to use to evaluate the responsiveness of a measure or method. Standardized effect sizes (SES), standardized response means (SRM), or Guyatt responsiveness ratios [29] are often calculated [30], [31], [32], but ICCs can also be used to express the sensitivity to detect changes [33]. The advantage of also using an ICC or related statistics to measure responsiveness is that both aspects of reliability of the measurement (the ability to discriminate between patients and the sensitivity to detect changes over time) are expressed in a similar manner and can be compared directly, especially when they are calculated on the same data.
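As a rough illustration of how these statistics relate, the sketch below derives an SEM and an SDD from an ICC using the commonly cited relations SEM = SD × sqrt(1 − ICC) and SDD = 1.96 × sqrt(2) × SEM. The function names and the example values (a between-patient SD of 10 score units and an ICC of 0.91) are hypothetical and are not taken from this study; other definitions of these statistics exist in the literature.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM in the unit of the score, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_difference(sem: float, z: float = 1.96) -> float:
    """SDD at the 95% level, assuming SDD = z * sqrt(2) * SEM.

    The sqrt(2) reflects that a difference involves two measurements,
    each carrying its own measurement error.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical example values, chosen only to show the arithmetic.
sem = standard_error_of_measurement(sd=10.0, icc=0.91)
sdd = smallest_detectable_difference(sem)
print(f"SEM = {sem:.2f} score units, SDD = {sdd:.2f} score units")
```

Because both results are in score units, they can be read directly against the scale of the scoring method, which is the advantage over the unitless ICC noted above.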
ICCs and related statistics are originally based on classical test theory, a measurement theory originating from the social sciences and developed to estimate the reliability of scores derived from measures [34]. Later on, this theory was extended by generalizability theory [35], [36], [37], [38]. This theory recognizes multiple sources of measurement error (i.e., noise) within the same measurement. Accordingly, multiple sources of measurement error can be estimated within the same ANOVA design, thereby allowing different applications of the instrument to be evaluated simultaneously. Reliability statistics based on classical test theory have been assessed before for several radiologic scoring methods for RA [39]. However, generalizability analyses have rarely been used [4], [20]. This is unfortunate because generalizability theory offers several advantages over standard measurement approaches, as we shall demonstrate.
The objective of this study was to compare two radiologic scoring methods, namely the van der Heijde modified Sharp method and the Scott modified Larsen method, in their ability (1) to discriminate between patients or groups of patients, and (2) to detect change over 1 year (1-year responsiveness), by applying generalizability theory.
Section snippets
Radiologic scoring methods
The Sharp/van der Heijde method [3] assesses erosions and joint space narrowing separately in the hands and feet, and has a range from 0 to 448. Thirty-two joints in the hands and 12 in the feet are scored for erosions, with a maximum score of five per joint in the hands and 10 per joint in the feet. Joint space narrowing is graded from 0 to 4 in 30 joints in the hands and in 12 joints in the feet. The principal score used in the analyses is the total score, which is the sum of the erosion
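The stated range of 0 to 448 can be checked from the joint counts and per-joint maxima given above. The short sketch below does this arithmetic; the variable names are illustrative, and it assumes that the total score is the sum of the erosion and joint space narrowing scores.

```python
# (number of joints, maximum score per joint), as described in the text above.
EROSION_SITES = {"hands": (32, 5), "feet": (12, 10)}
JSN_SITES = {"hands": (30, 4), "feet": (12, 4)}


def max_score(sites: dict) -> int:
    """Highest attainable score when every joint receives its maximum grade."""
    return sum(n_joints * per_joint for n_joints, per_joint in sites.values())


max_erosion = max_score(EROSION_SITES)  # 32*5 + 12*10
max_jsn = max_score(JSN_SITES)          # 30*4 + 12*4
max_total = max_erosion + max_jsn       # assumed: total = erosion + narrowing

print(max_erosion, max_jsn, max_total)
```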
G-study
Generalizability theory uses a factorial ANOVA model to evaluate multiple potential sources of measurement error by partitioning the total variance of the data into components originating from different independent sources and their interactions. In generalizability theory, sources of variation are called “facets” (analogous to the term “factors” in regular ANOVA analyses) and the levels of these facets are called “conditions.” In this study there are three main facets for each method: patient
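To make the idea of partitioning variance concrete, the sketch below estimates variance components for a deliberately simplified design with only two crossed facets (patient and rater) and one score per cell. This is not the design analyzed in this study; the function names and the toy data are hypothetical, and the expected-mean-square formulas are those of a standard two-way random-effects ANOVA without replication.

```python
def variance_components(scores):
    """Estimate variance components for a fully crossed patient x rater design.

    scores[p][r] is the score given to patient p by rater r.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    patient_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Sums of squares for each facet and the residual (interaction + error).
    ss_p = n_r * sum((m - grand) ** 2 for m in patient_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected mean squares; negative estimates are set to zero.
    return {
        "patient": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": max(ms_res, 0.0),
    }


def icc_absolute(components):
    """Share of total variance due to patients (single score, absolute agreement)."""
    return components["patient"] / sum(components.values())


# Toy data: 3 patients scored by 2 raters (hypothetical values).
demo = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
vc = variance_components(demo)
icc = icc_absolute(vc)
print(vc, round(icc, 3))
```

Adding a further facet such as time would introduce additional main-effect and interaction components, estimated from the corresponding mean squares in the same way.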
Results
Table 3 presents the variance components, their standard errors, and each facet's contribution to the total variance for the scores of the Sharp/van der Heijde and Larsen/Scott methods. The major proportion of variability of the status scores was accounted for by the between-patient variation: 61% for both the Sharp/van der Heijde and the Larsen/Scott scores. Other major contributors to the total variance were the facet time (17% and 13%, respectively) and its interaction with the facet patient
Discussion
The generalizability analyses in this study confirm the discriminative capacity and responsiveness of the two methods commonly used to score radiologic progression in RA, the Sharp/van der Heijde and Larsen/Scott methods. Furthermore, the study demonstrates the value of generalizability theory in the assessment of discriminative capacity and responsiveness of measures: it provides ICCs, SDDs, and SDCs for a variety of cross-sectional and longitudinal contexts. This is important
References (58)
- et al.
Measuring health status: what are the necessary measurement properties?
J Clin Epidemiol
(1992)
- et al.
Measuring change over time: assessing the usefulness of evaluation instruments
J Chronic Dis
(1987)
- et al.
Methods for assessing responsiveness: a critical review and recommendations
J Clin Epidemiol
(2000)
- et al.
Randomised comparison of combined step-down prednisolone, methotrexate and sulphasalazine with sulphasalazine alone in early rheumatoid arthritis
Lancet
(1997)
- et al.
A taxonomy for responsiveness
J Clin Epidemiol
(2001)
- et al.
How many joints in the hands and wrists should be included in a score of radiologic abnormalities used to assess rheumatoid arthritis?
Arthritis Rheum
(1985)
- et al.
A new method of scoring radiographic change in rheumatoid arthritis
J Rheumatol
(1998)
How to read radiographs according to the Sharp/van der Heijde method
J Rheumatol
(2000)
- et al.
Reliability and sensitivity to change of a simplification of the Sharp/van der Heijde radiological assessment in rheumatoid arthritis
Rheumatology (Oxford)
(1999)
- et al.
Radiographic evaluation of rheumatoid arthritis and related conditions by standard reference films
Acta Radiol
(1977)
How to apply the Larsen score in evaluating radiographs of rheumatoid arthritis in longterm studies
J Rheumatol
Proposed modification to Larsen's scoring methods for hand and wrist radiographs
Br J Rheumatol
Precision of the Larsen and the Sharp methods of assessing radiologic change in patients with rheumatoid arthritis
Arthritis Rheum
A comparison of three radiologic scoring systems for the long-term assessment of rheumatoid arthritis: findings of an ongoing prospective inception cohort study of 132 women followed up for a median of twelve years
Arthritis Rheum
Reliability of the three methods of radiologic assessment in patients with rheumatoid arthritis
Invest Radiol
Smallest detectable difference in radiological progression
J Rheumatol
Precision of Larsen grading of radiographs in assessing progression of rheumatoid arthritis in individual patients
Ann Rheum Dis
Comparison of the original and the modified Larsen methods and the Sharp method in scoring radiographic progression in early rheumatoid arthritis
J Rheumatol
Comparison of 3 quantitative measures of hand radiographs in patients with rheumatoid arthritis: Steinbrocker stage, Kaye Modified Sharp score, and the Larsen score
J Rheumatol
Measurement and prediction of radiological progression in early rheumatoid arthritis
J Rheumatol
Development and evaluation of a modified version of the Larsen method for evaluating roentgenologic changes in chronic polyarthritis
Z Rheumatol
Reproducibility of multiple-observer scoring of radiologic abnormalities in the hands and wrists of patients with rheumatoid arthritis
Arthritis Rheum
Radiologic assessment as an outcome measure in rheumatoid arthritis
Arthritis Rheum
Reading radiographs in chronological order, in pairs or as single films has important implications for the discriminative power of rheumatoid arthritis clinical trials
Rheumatology (Oxford)
Comparison of Larsen's and Sharp's method of scoring radiographs in rheumatoid arthritis
Arthritis Rheum
Use of the standard error as a reliability index of interest: an applied example using elbow flexor strength data
Phys Ther
Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life
Med Care
A semi-automatic technique for measurement of arterial wall from black blood MRI
Med Phys
Measurement of scapular asymmetry and assessment of shoulder dysfunction using the Lateral Scapular Slide Test: a reliability and validity study
Phys Ther
Cited by (28)
Correlation between clinical activity measured by DAS-28 and ultrasound in patients with rheumatoid arthritis
2016, Revista Colombiana de Reumatologia
Methodology for presenting the results of radiological progression in clinical trials
2009, Seminarios de la Fundacion Espanola de Reumatologia
Radiography
2009, Rheumatoid Arthritis
Imaging as a follow-up tool in clinical trials and clinical practice
2008, Best Practice and Research: Clinical Rheumatology
Citation Excerpt: The total score ranges from 0 to 160. Although the Sharp score may have an advantage over the Larsen score in early arthritis [37], there is still no universally accepted technique, and modifications to the existing schemes are often proposed. Considerable interobserver variation must be taken into account, especially when dealing with multicentre trials, when applying any of these methods [38].
Radiography
2008, Rheumatoid Arthritis
Plain X-ray and rheumatoid arthritis. Systematic reading of the radiographic progression
2005, Seminarios de la Fundacion Espanola de Reumatologia