Original Articles
The Sharp/van der Heijde method out-performed the Larsen/Scott method on the individual patient level in assessing radiographs in early rheumatoid arthritis

https://doi.org/10.1016/j.jclinepi.2003.10.014

Abstract

Objective

To test the reliability of two radiologic scoring methods in rheumatoid arthritis (RA)—the Sharp/van der Heijde (SvH) and the Larsen/Scott (LS)—with generalizability analyses.

Study design and setting

Films of 51 patients representing the spectrum of early RA were read by two raters for each method. Discriminative ability and responsiveness were expressed as intraclass correlation coefficients (ICCs), two types of smallest detectable difference (SDD), and two types of smallest detectable change (SDC), reflecting measurement error when discriminating between, or detecting changes within, (1) individuals or (2) groups. These statistics were calculated for the (average) scores of one to three raters.

Results

The discriminative capacity (0.85–0.97) and responsiveness (0.91–0.97) were good when expressed by ICC. On the group level, the SDDs and SDCs ranged from 0.6% to 3.3% of the maximum obtainable score. On the individual level, the scores showed better reliability when measured with the SvH (SDDs 2.0–3.4%) than with the LS (SDDs 5.3–9.2%). The SvH also assessed changes in scores in individuals with less measurement error (SDCs 1.3–2.2%) than the LS (SDCs 2.3–3.9%).

Conclusion

For early RA patients, the SvH seems preferable if analyses at the individual level are included.

Introduction

Persistent arthritis caused by the chronic inflammatory disease rheumatoid arthritis (RA) often leads to joint damage. This can be visualized on x-rays as erosions and joint space narrowing. Prevention of radiologic damage is regarded as an important goal of RA therapy, and is recognized by the US Food and Drug Administration (FDA) as a separate claim. The most widely used radiologic scoring methods to quantify joint damage in clinical trials or cohort studies are the Sharp and the Larsen methods and their modifications [1], [2], [3], [4], [5], [6], [7]. They are used to detect changes over time within patients or groups of patients (as an evaluative instrument) as well as to detect differences between individual patients or groups of patients (as a discriminative instrument) [8]. The sensitivity to detect changes over time (i.e., responsiveness) and the ability to discriminate between patients are both part of the reliability of these scoring methods and have been evaluated several times over recent decades using a variety of statistics [4], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21]. This variability in statistical methods does not allow an unequivocal comparative evaluation of radiologic scoring methods.

Reliability of a method is mostly expressed by intraclass correlation coefficients (ICCs). Besides the ICC, reliability analyses increasingly include statistics such as the standard error of measurement (SEM) [22], [23], [24], [25] or the smallest detectable difference (SDD) [26], [27], [28], which have the advantage that they express measurement error in the metric unit of the measurement. There is less consensus on which statistic to use to evaluate the responsiveness of a measure or method. Standardized effect sizes (SES), standardized response means (SRM), or Guyatt responsiveness ratios [29] are often calculated [30], [31], [32], but ICCs can also be used to express the sensitivity to detect changes [33]. Advantages of also using an ICC or related statistics to measure responsiveness are that both aspects of reliability (the ability to discriminate between patients and the sensitivity to detect changes over time) are expressed in a similar manner and can be compared directly, especially when they are calculated on the same data.
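
As a rough illustration of how these statistics relate, the block below gives the textbook single-error-term forms. The symbols $\sigma_p^2$ (between-patient variance) and $\sigma_e^2$ (error variance) are introduced here for illustration and are not taken from the source, and the 95% level is an assumption. The definitions used in the study itself, in particular for group-level statistics, may differ.

```latex
\begin{align}
\mathrm{ICC} &= \frac{\sigma_p^2}{\sigma_p^2 + \sigma_e^2}, \\
\mathrm{SEM} &= \sqrt{\sigma_e^2}, \\
\mathrm{SDD} &= 1.96 \times \sqrt{2} \times \mathrm{SEM} \approx 2.77 \times \mathrm{SEM}.
\end{align}
```

The factor $\sqrt{2}$ arises because a difference between two scores carries the measurement error of both. Unlike the ICC, the SEM and SDD are in the units of the score itself, which is why they can be reported as a percentage of the maximum obtainable score.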

ICCs and related statistics are originally based on the classic test theory, a measurement theory originating from the social sciences and developed to estimate the reliability of scores derived from measures [34]. This theory was later extended by generalizability theory [35], [36], [37], [38], which recognizes multiple sources of measurement error (i.e., noise) within the same measurement. Accordingly, multiple sources of measurement error can be estimated within the same ANOVA design, thereby allowing different applications of the instrument to be evaluated simultaneously. Reliability statistics based on the classic test theory have been assessed before for several radiologic scoring methods for RA [39]. However, generalizability analyses have rarely been used [4], [20]. This is unfortunate because generalizability theory offers several advantages over standard measurement approaches, as we shall demonstrate.

The object of this study was to compare the ability (1) to discriminate between patients or groups of patients, and (2) to detect change in 1 year (1-year responsiveness) of two radiologic scoring methods, namely the van der Heijde modified Sharp method and the Scott modified Larsen method, by applying generalizability theory.

Section snippets

Radiologic scoring methods

The Sharp/van der Heijde method [3] assesses erosions and joint space narrowing separately in the hands and feet, and has a range from 0 to 448. Thirty-two joints in the hands and 12 in the feet are scored for erosions, with a maximum score of five per joint in the hands and 10 per joint in the feet. Joint space narrowing is graded from 0 to 4 in 30 joints in the hands and in 12 joints in the feet. The principal score used in the analyses is the total score, which is the sum of the erosion
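
The stated range of 0 to 448 can be checked from the joint counts and per-joint maxima given above. The following is a minimal arithmetic sketch. The names `EROSION`, `NARROWING`, `max_score`, and `percent_of_max` are hypothetical and chosen for illustration only.

```python
# Joint counts and per-joint maximum grades, as stated in the text:
# (number of joints, maximum score per joint)
EROSION = {"hands": (32, 5), "feet": (12, 10)}
NARROWING = {"hands": (30, 4), "feet": (12, 4)}


def max_score(component):
    """Sum of (joint count x per-joint maximum) over hands and feet."""
    return sum(n_joints * per_joint for n_joints, per_joint in component.values())


erosion_max = max_score(EROSION)        # 32*5 + 12*10 = 280
narrowing_max = max_score(NARROWING)    # 30*4 + 12*4  = 168
total_max = erosion_max + narrowing_max  # 448, matching the stated range


def percent_of_max(score_units):
    """Express a value in score units as a percentage of the maximum total score."""
    return 100.0 * score_units / total_max


print(erosion_max, narrowing_max, total_max)  # 280 168 448
```

A helper such as `percent_of_max` shows how a measurement error in score units could be put on the percentage scale used in the abstract.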

G-study

Generalizability theory uses a factorial ANOVA model to evaluate multiple potential sources of measurement error by partitioning the total variance of the data into components originating from different independent sources and their interactions. In generalizability theory, sources of variation are called “facets” (analogous to the term “factors” in regular ANOVA analyses) and the levels of these facets are called “conditions.” In this study there are three main facets for each method: patient
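
The variance partitioning described above can be sketched for a deliberately simplified case: a fully crossed patient × rater design with one score per cell. This is not the design used in the study, which includes further facets. The function names, the toy scores, and the choice to count rater variance as error are all assumptions made for illustration.

```python
import math


def variance_components(scores):
    """Estimate variance components from ANOVA mean squares for a crossed
    patient x rater design (rows = patients, columns = raters).
    Hypothetical simplified design, not the study's full G-study."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[j] for row in scores) / n_p for j in range(n_r)]

    # Mean squares for the two facets and the residual (interaction + error).
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (scores[i][j] - p_means[i] - r_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; clamp negative estimates to 0.
    return {
        "patient": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": ms_res,
    }


def reliability(vc, k=1):
    """ICC and SDD for the average score of k raters, treating rater and
    residual variance as measurement error (an assumption of this sketch)."""
    error = (vc["rater"] + vc["residual"]) / k
    icc = vc["patient"] / (vc["patient"] + error)
    sdd = 1.96 * math.sqrt(2) * math.sqrt(error)
    return icc, sdd


# Invented toy data: 3 patients scored by 2 raters.
toy_scores = [[10, 12], [20, 22], [30, 32]]
vc = variance_components(toy_scores)
icc_1, sdd_1 = reliability(vc, k=1)
icc_2, sdd_2 = reliability(vc, k=2)
```

Averaging over more raters divides the error variance by `k`, so the ICC rises and the SDD falls as `k` increases, which is why statistics can be reported separately for one to three raters.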

Results

Table 3 presents the variance components, their standard errors, and each facet's contribution to the total variance for the scores of the Sharp/van der Heijde and Larsen/Scott methods. The major proportion of variability of the status scores was accounted for by the between-patient variation: 61% for both the Sharp/van der Heijde and the Larsen/Scott scores. Other major contributors to the total variance were the facet time (17% and 13%, respectively) and its interaction with the facet patient

Discussion

The generalizability analyses in this study confirm the discriminative capacity and responsiveness of the two methods commonly used to score radiologic progression in RA, the Sharp/van der Heijde, and Larsen/Scott methods. Furthermore, the study is a demonstration of the value of generalizability theory in the assessment of discriminatory capacity and responsiveness of measures: it provides ICCs, SDDs, and SDCs for a variety of cross-sectional and longitudinal contexts. This is important

References (58)

  • A Larsen

    How to apply the Larsen score in evaluating radiographs of rheumatoid arthritis in longterm studies

    J Rheumatol

    (1995)
  • D.L Scott et al.

    Proposed modification to Larsen's scoring methods for hand and wrist radiographs

    Br J Rheumatol

    (1995)
  • M Cuchacovich et al.

    Precision of the Larsen and the Sharp methods of assessing radiologic change in patients with rheumatoid arthritis

    Arthritis Rheum

    (1992)
  • K.W Drossaers-Bakker et al.

    A comparison of three radiologic scoring systems for the long-term assessment of rheumatoid arthritis: findings of an ongoing prospective inception cohort study of 132 women followed up for a median of twelve years

    Arthritis Rheum

    (2000)
  • A Guth et al.

    Reliability of the three methods of radiologic assessment in patients with rheumatoid arthritis

    Invest Radiol

    (1995)
  • M Lassere et al.

    Smallest detectable difference in radiological progression

    J Rheumatol

    (1999)
  • M.M O'Sullivan et al.

    Precision of Larsen grading of radiographs in assessing progression of rheumatoid arthritis in individual patients

    Ann Rheum Dis

    (1990)
  • L Paimela et al.

    Comparison of the original and the modified Larsen methods and the Sharp method in scoring radiographic progression in early rheumatoid arthritis

    J Rheumatol

    (1998)
  • T Pincus et al.

    Comparison of 3 quantitative measures of hand radiographs in patients with rheumatoid arthritis: Steinbrocker stage, Kaye Modified Sharp score, and the Larsen score

    J Rheumatol

    (1997)
  • M.P Plant et al.

    Measurement and prediction of radiological progression in early rheumatoid arthritis

    J Rheumatol

    (1994)
  • R Rau et al.

    Development and evaluation of a modified version of the Larsen method for evaluating roentgenologic changes in chronic polyarthritis

    Z Rheumatol

    (1997)
  • J.T Sharp et al.

    Reproducibility of multiple-observer scoring of radiologic abnormalities in the hands and wrists of patients with rheumatoid arthritis

    Arthritis Rheum

    (1985)
  • J.T Sharp

    Radiologic assessment as an outcome measure in rheumatoid arthritis

    Arthritis Rheum

    (1989)
  • D van der Heijde et al.

    Reading radiographs in chronological order, in pairs or as single films has important implications for the discriminative power of rheumatoid arthritis clinical trials

    Rheumatology (Oxford)

    (1999)
  • S Wassenberg et al.

    Comparison of Larsen's and Sharp's method of scoring radiographs in rheumatoid arthritis

    Arthritis Rheum

    (1994)
  • P.W Stratford et al.

    Use of the standard error as a reliability index of interest: an applied example using elbow flexor strength data

    Phys Ther

    (1997)
  • K.W Wyrwich et al.

    Linking clinical relevance and statistical significance in evaluating intra-individual changes in health-related quality of life

    Med Care

    (1999)
  • H.M Ladak et al.

    A semi-automatic technique for measurement of arterial wall from black blood MRI

    Med Phys

    (2001)
  • C.J Odom et al.

    Measurement of scapular asymmetry and assessment of shoulder dysfunction using the Lateral Scapular Slide Test: a reliability and validity study

    Phys Ther

    (2001)