Extended report
An OMERACT reliability exercise of inflammatory and structural abnormalities in patients with knee osteoarthritis using ultrasound assessment
  1. George AW Bruyn1,
  2. Esperanza Naredo2,
  3. Nemanja Damjanov3,
  4. Artur Bachta4,
  5. Paul Baudoin1,
  6. Hilde Berner Hammer5,
  7. Femke BG Lamers-Karnebeek6,
  8. Ingrid Moller Parera7,
  9. Bethan Richards8,
  10. Mihaela Taylor9,
  11. Ami Ben-Artzi9,
  12. Maria-Antonietta D'Agostino10,
  13. Jesus Garrido11,
  14. Annamaria Iagnocco12
  15. on behalf of the Ultrasound Task Force
    1. 1Department of Rheumatology, MC Groep Hospitals, Lelystad, The Netherlands
    2. 2Department of Rheumatology, Hospital GU Gregorio Marañón and Universidad Complutense, Madrid, Spain
    3. 3Institute of Rheumatology, University of Belgrade School of Medicine, Belgrade, Serbia
    4. 4Department of Internal Diseases and Rheumatology, Military Medical Hospital, Warsaw, Poland
    5. 5Department of Rheumatology, Diakonhjemmet Hospital, Oslo, Norway
    6. 6Bernhoven Ziekenhuis Uden and Radboudumc, Nijmegen, The Netherlands
    7. 7Instituto Poal de Reumatologia, Barcelona, Spain
    8. 8Institute of Rheumatology and Orthopaedics, Royal Prince Alfred Hospital, Sydney, Australia; Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
    9. 9Division of Rheumatology, UCLA, Santa Monica, California, USA
    10. 10Department of Rheumatology, Hôpital Ambroise Paré, APHP, Université Versailles Saint Quentin en Yvelines, Boulogne-Billancourt, France
    11. 11Social Psychology and Methodology Department, Faculty of Psychology, Autonomous University of Madrid, Madrid, Spain
    12. 12Department of Rheumatology, University Hospital, Sapienza Universita di Roma, Rome, Italy
    1. Correspondence to George AW Bruyn, Department of Rheumatology, MC Groep Hospitals, Lelystad, 8333 AA, The Netherlands; gawbruyn{at}wxs.nl

    Abstract

    Objective To assess whether ultrasonography (US) is reliable for the evaluation of inflammatory and structural abnormalities in patients with knee osteoarthritis (OA).

    Methods Thirteen patients with early knee OA were examined by 11 experienced sonographers over 2 days. Dichotomous and semiquantitative scoring was performed on synovitis characteristics in various aspects of the knee joint. Semiquantitative scoring was performed for osteophytes at the medial and lateral femorotibial joint space, for cartilage damage of the trochlea and for medial meniscal damage bilaterally. Intra- and interobserver reliability were computed by use of unweighted and weighted κ coefficients.

    Results Intra- and interobserver reliability scores were moderate to good for synovitis (mean κ 0.67 and 0.52, respectively) as well as moderate to good for the global synovitis (0.70 and 0.50, respectively). Mean intra- and interobserver reliability κ for cartilage damage, medial meniscal damage and osteophytes ranged from fair to good (0.55 and 0.34, 0.75 and 0.56, 0.73 and 0.60, respectively).

    Conclusions Using a standardised protocol, dichotomous and semiquantitative US scoring of pathological changes in knee OA can be reliable.

    • Ultrasonography
    • Synovitis
    • Knee Osteoarthritis

    Introduction

    Patients with knee osteoarthritis (OA) have pain, impaired muscle strength particularly of the quadriceps, limited joint mobility and joint instability.1 Moreover, end-stage knee joint OA is the foremost reason for knee replacement surgery. Although several underlying pathological mechanisms of knee OA have been recognised, its aetiology is complex and relies on a seemingly endless number of different factors.2 Various avenues of research including imaging techniques may fill the gaps in existing knowledge.

    Traditionally, conventional radiography (CR) is used to diagnose the changes of knee OA; however, early changes are difficult to assess and CR is not capable of visualising the inflammatory component. Ultrasonography (US) is an imaging technique that is able to visualise both structural changes of bone and inflammatory changes within a joint. Previous research has demonstrated that US is both highly sensitive compared with CR for detecting erosive changes in early rheumatoid arthritis (RA)3 and more sensitive in detecting knee effusion compared with clinical examination.4

    This study aimed to assess whether US is a reliable tool to diagnose abnormal findings at cartilage, bone and synovial membrane level in the knee of patients with OA. Specific purposes of this study were the following: (1) to reach consensus on US-detected elementary lesions of knee OA, (2) to generate agreed scoring systems for inflammatory and structural changes in knee OA, (3) to test the intra- and interobserver reliability of these scoring systems in patients with OA.

    Methods

    Study design

    The study comprised two consecutive phases: (1) consensus on US basic lesions and scoring system for knee OA, (2) patient-based exercise to assess the reliability of US in detecting and scoring basic lesions in knee OA.

    Consensus process

    This process consisted of three steps: (1) a Delphi consensus process among experts in musculoskeletal (MS) US on the identification, definitions and scoring system for US inflammation and structural changes in knee OA; (2) the collection by these experts of US images illustrating the basic lesions and scores agreed on in the previous step; (3) a further consensus on the scores assigned to the collected images, which were shown during a consensus meeting held before the reliability exercise on patients.

    Delphi consensus

    We conducted a three-round Delphi consensus process through three consecutive written questionnaires sent by email to 13 rheumatologists, expert in MS US, from nine countries. These experts were selected because of their declared interest in participating in the Outcome Measures in Rheumatology (OMERACT) US Task Force on knee OA. The participants were asked to rate their level of agreement or disagreement for each question/statement on a 1–5 Likert scale (1=strongly disagree to 5=strongly agree). Space for additional free comments was also included at the end of each question/statement. The participants were asked to respond to each questionnaire within 2 weeks; after 2 weeks, email reminders were sent to the non-respondents.

    The first questionnaire included 21 questions or statements divided into two sections as follows: (1) selection and definitions of US basic lesions in knee OA and (2) scoring system for US basic lesions in knee OA. The already published OMERACT definitions were not rated (ie, synovial hypertrophy, effusion, cartilage damage).5 ,6

    The second questionnaire comprised 10 questions or statements divided into the above two sections. The second questionnaire and the results from the first questionnaire were sent by email to the respondents of the first questionnaire. The content of the second questionnaire consisted of several questions or statements not previously agreed on and some new questions or statements generated from the comments supplied in the first questionnaire.

    The third questionnaire included five questions or statements divided into the above two sections. The third questionnaire and the results from the first and second questionnaires were sent by email to the respondents of both questionnaires. Again, the content of the third questionnaire consisted of some questions not previously agreed on and some new questions arising from the comments on the second questionnaire.

    Group agreement on acceptance of questions or statements was considered if ≥75% of respondents scored an item either 4 or 5. Group agreement on rejection of questions or statements was considered if ≥75% of respondents scored an item either 1 or 2.
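
    The acceptance and rejection rules above can be sketched in a few lines of Python. This is only an illustration of the stated ≥75% thresholds; the function name `delphi_outcome` and the label "no consensus" for items carried into a later round are assumptions made for the example, not terms used by the Task Force.

```python
from typing import Sequence


def delphi_outcome(ratings: Sequence[int], threshold: float = 0.75) -> str:
    """Classify one Delphi item from its 1-5 Likert ratings.

    Hypothetical helper: 'accepted' if >=75% of respondents scored 4 or 5,
    'rejected' if >=75% scored 1 or 2, otherwise 'no consensus'.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")

    n = len(ratings)
    share_agree = sum(1 for r in ratings if r >= 4) / n
    share_disagree = sum(1 for r in ratings if r <= 2) / n

    if share_agree >= threshold:
        return "accepted"
    if share_disagree >= threshold:
        return "rejected"
    return "no consensus"


# Invented example: 13 respondents, 11 of whom scored the item 4 or 5.
print(delphi_outcome([5, 4, 4, 5, 3, 4, 4, 5, 4, 2, 4, 5, 4]))
```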

    Collection of US images

    Participants were requested to send representative images of all normal/pathological findings included in the protocol. The images were sent by email to the investigator who coordinated this study phase (AI). Distinct examples were assembled and made available as an online atlas.

    Consensus meeting

    A meeting of the experts who participated in the patient-based reliability assessment was held the day before the exercise. During this meeting, the protocol and representative images of all findings were discussed. In addition, the images collected by the participants were shown, and the assigned scores were discussed and either agreed on immediately or after discussion by the group.

    US reliability assessment

    The second part of the study consisted of a reliability exercise on patients with knee OA conducted over 2 days in Amsterdam, the Netherlands. The exercise was 16 h in total duration divided into four sessions, a 4 h morning session and a 4 h afternoon session each day. This exercise included intra- and interobserver reliability assessment of US in detecting and scoring US structural changes (figure 1) and inflammation (figure 2) in knee OA.

    Figure 1

    Longitudinal scan of the medial knee joint space, showing a moderately protruded medial meniscus, grade 2 (***). In addition, osteophytes grade 2 (white arrows) are depicted.

    Figure 2

    Longitudinal scan of the suprapatellar recess showing synovitis grade 3. Both elements of synovitis are shown (ie, synovial hypertrophy and effusion).

    Patients

    Thirteen patients with symptomatic knee OA were recruited from the Outpatient Rheumatology Clinic of the MC Groep hospitals. The diagnosis was based on the American College of Rheumatology criteria for OA with radiographic confirmation.7 The following data were recorded for each patient: age, gender, body mass index, symptom duration and radiographic stage according to the Kellgren–Lawrence scale (see online supplementary table S1). Patients were not included if there was a history of knee arthroplasty or recent corticosteroid knee joint injections (<6 weeks).

    Each patient was randomly assigned to a scanner where they remained supine during both the morning and afternoon sessions. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Local Ethics Committee of MC Groep hospitals. Written informed consent was obtained from all patients before the study.

    US examination

    The ultrasonographers consisted of 11 rheumatologists with more than 6 years of experience in MS US who had participated in the full consensus process. Two of the participants in the Delphi process were not able to attend the reliability exercise.

    The US investigation was carried out using commercially available real-time scanners (five MyLab Twice and one MyLab 70 XVision; Esaote, Genoa, Italy) equipped with multi-frequency linear transducers (6–18 MHz). The B-mode and power Doppler (PD) settings of each type of machine were adjusted to optimise image resolution and sensitivity to detect flow, respectively, in the knee by an application specialist and the ultrasonographers before the reliability exercise. The ultrasonographers were not allowed to change these settings during the reliability exercise except for the position of the focus and the colour box size/location according to the depth and size of the examined structure, respectively.

    The 11 ultrasonographers independently and consecutively carried out a blinded greyscale and PD US examination of both knees of each patient in two rounds. To minimise recall bias, the two rounds were scheduled several hours apart, that is, a morning and an afternoon session; additionally, the ultrasonographers were assigned to the US machines in a different order during the morning and afternoon sessions. They were unaware of the clinical details. Each ultrasonographer was given a maximum of 8 min to scan each patient and fill out a standardised report sheet with the US findings. An application specialist from the US manufacturer was present in the room to solve technical adjustment problems; two student assistants were present in the room to collect the completed score sheets after each US examination.

    The suprapatellar, medial and lateral parapatellar recesses, as well as medial and lateral facets of the trochlear cartilage area, were scanned in each knee. Online supplementary table S2 lists the anatomical areas and the standard scanning method.8–12

    The following scanning planes were used for detection and scoring lesions: longitudinal plane for the suprapatellar recess, the femorotibial space (medially and laterally) and the medial horn of the medial meniscus, transverse plane to the patella for the parapatellar recesses and transverse plane for the trochlear articular cartilage (on maximally flexed knee joints).

    Synovitis and its components (ie, synovial hypertrophy and synovial fluid), articular changes and meniscal damage were scored according to the scoring systems agreed on in the consensus process. In addition, a global score for synovitis was assigned to each knee, which corresponded to the maximum score for synovitis obtained at the suprapatellar or parapatellar recesses.
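
    The global synovitis score described above is simply the maximum of the recess scores. A minimal sketch follows, assuming a 0–3 semiquantitative grade for each recess (the agreed scale itself is given in online supplementary table S3, so the range check here is an assumption) and a hypothetical function name.

```python
def global_synovitis_score(suprapatellar: int,
                           medial_parapatellar: int,
                           lateral_parapatellar: int) -> int:
    """Return the global synovitis score for one knee.

    Assumes each recess is graded 0-3; the global score is the maximum
    grade found at the suprapatellar or parapatellar recesses.
    """
    scores = (suprapatellar, medial_parapatellar, lateral_parapatellar)
    for s in scores:
        if s not in (0, 1, 2, 3):
            raise ValueError("each recess score must be an integer 0-3")
    return max(scores)


# Invented example: grade 1 suprapatellar, grade 3 medial, grade 0 lateral.
print(global_synovitis_score(1, 3, 0))
```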

    Statistical analysis

    Statistical analysis was performed using SPSS V.21 (SPSS, Chicago, Illinois, USA). Simple summary statistics were computed for the responses of the Delphi questionnaires. The results from the Delphi process are presented as the percentage of responders who scored a question/statement as either 4 or 5. Quantitative variables are presented as the mean±SD and qualitative variables as counts and percentages. Intraobserver reliability was assessed by unweighted Cohen's κ for dichotomous variables (synovial hypertrophy and effusion) and by weighted Cohen's κ for semiquantitative variables (synovitis, cartilage damage, meniscal damage and osteophytes). Intraobserver κ values were obtained for each observer and summarised as mean κ and 95% CI. Interobserver reliability was assessed by unweighted or weighted Light's κ for more than two raters. Light's κ extends Cohen's κ to multiple raters by averaging the κ coefficients from the n(n−1)/2 different pairs of observers. Light's κ statistics were computed independently for each round.
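
    The κ statistics named above can be illustrated with a short pure-Python sketch. It is not the SPSS procedure used in the study: the function names are hypothetical, and linear disagreement weights are assumed for the weighted κ because the weighting scheme is not stated in the text. With 11 observers, Light's κ averages over 11×10/2 = 55 pairs.

```python
from itertools import combinations
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int],
                categories: Sequence[int], weighted: bool = False) -> float:
    """Cohen's kappa for two raters (or two rounds of one rater).

    weighted=False gives the unweighted kappa; weighted=True uses linear
    disagreement weights |i - j| / (k - 1), an assumption of this sketch.
    """
    k = len(categories)
    if k < 2 or len(a) != len(b) or not a:
        raise ValueError("need >=2 categories and equal, non-empty ratings")
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions and their marginals.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[index[x]][index[y]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    if d_exp == 0.0:
        # Happens when both raters used one identical category throughout.
        raise ValueError("kappa is undefined without any rating variability")
    return 1.0 - d_obs / d_exp


def light_kappa(ratings_by_rater: Sequence[Sequence[int]],
                categories: Sequence[int], weighted: bool = False) -> float:
    """Light's kappa: mean Cohen's kappa over all n(n-1)/2 rater pairs."""
    pairs = list(combinations(ratings_by_rater, 2))
    if not pairs:
        raise ValueError("at least two raters are required")
    return sum(cohen_kappa(x, y, categories, weighted)
               for x, y in pairs) / len(pairs)


# Invented ratings from three raters on four knees (dichotomous finding).
raters = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]]
print(light_kappa(raters, categories=[0, 1]))
```

    The undefined case in `cohen_kappa` is one reason a κ cannot be computed for a finding that almost never occurs, as when all raters score nearly every knee as negative.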

    κ Values of 0–0.20 were considered poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good and 0.81–1 excellent.13 p Values <0.05 were considered significant.
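
    The interpretation bands above translate directly into a lookup. The sketch below uses a hypothetical function name and makes two assumptions not stated in the text: values falling between the published band edges (eg, 0.205) are assigned using the upper bound of each band as an inclusive cut-off, and negative κ values are treated as poor.

```python
def interpret_kappa(kappa: float) -> str:
    """Map a kappa value to the descriptive bands used in this report."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa <= 0.20:
        return "poor"       # includes negative values (assumption)
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Mean intra- and interobserver kappa values quoted in the abstract.
for value in (0.67, 0.52, 0.55, 0.34, 0.75, 0.56, 0.73, 0.60):
    print(value, interpret_kappa(value))
```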

    Results

    Delphi results

    After three rounds of Delphi survey concerning US elementary lesions and scoring system for knee OA inflammatory and structural changes, agreement was obtained for about 22 out of 25 questions. The group reached agreement on the following definitions:

    • Meniscal damage: protrusion of the external edge of the meniscus outside the joint space.

    • Osteophytes: a step-up of bony prominence at the end of the normal bone contour or at the margin of the joint seen in two perpendicular planes with or without acoustic shadow.

    Online supplementary table S3 lists the agreed scoring systems for the US findings.

    Thirteen patients were assessed in the reliability exercise (12 female, one male); mean age, 65.9±5.3 years. Nine of them had Kellgren–Lawrence scores 2 or 3 (see online supplementary table S1). The prevalence of US-detected osteoarthritic changes in our sample is reported in table 1 according to the pooled findings of the 11 observers. There were 12 missing data points for synovitis, eight for synovial hypertrophy and effusion, 13 for synovial Doppler signal, three for cartilage, five for meniscal damage and six for osteophytes. As the prevalence of synovial PD signal was very low, reliability could not be calculated for this finding.

    Table 1

    Prevalence of US-detected osteoarthritic changes in 26 explored knees

    Table 2 lists the mean κ values as well as the positive and negative agreement values for intraobserver reliability. Table 3 lists the interobserver κ values computed for each of the two rounds.

    Table 2

    Intraobserver κ values for agreement of US abnormalities

    Table 3

    Interobserver κ values for agreement of US abnormalities in knee OA
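
    The text does not spell out how the positive and negative agreement values in table 2 were calculated. One common definition, assumed here, is the proportion of specific agreement: 2a/(2a+b+c) for positive and 2d/(2d+b+c) for negative agreement, where a and d are the counts of concordant positive and negative ratings and b+c the discordant ones. The function name is hypothetical and the data are invented.

```python
from typing import Optional, Sequence, Tuple


def specific_agreement(
    first: Sequence[int], second: Sequence[int]
) -> Tuple[Optional[float], Optional[float]]:
    """Positive and negative specific agreement for dichotomous ratings.

    Ratings are 1 (finding present) or 0 (absent). Returns None for a
    component whose denominator is zero.
    """
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    both_pos = sum(1 for x, y in zip(first, second) if x == 1 and y == 1)
    both_neg = sum(1 for x, y in zip(first, second) if x == 0 and y == 0)
    discordant = len(first) - both_pos - both_neg

    pos_den = 2 * both_pos + discordant
    neg_den = 2 * both_neg + discordant
    positive = 2 * both_pos / pos_den if pos_den else None
    negative = 2 * both_neg / neg_den if neg_den else None
    return positive, negative


# Invented example: one observer's morning and afternoon ratings of five knees.
print(specific_agreement([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```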

    Discussion

    This study investigated the ability of US to evaluate pathology in patients with knee OA. The findings are clinically relevant because the study protocol captured multiple features of the osteoarthritic knee.

    We examined three factors of structural changes in the osteoarthritic knee. First, medial radial displacement (MRD) of the medial meniscus is considered a signature feature of knee OA. The cause of the protrusion is unknown, but MRD may play a hitherto unappreciated role in the pathogenesis of OA due to its close relationship with the medial collateral ligament.14 Bevers et al15 were the first to study the interobserver reliability reporting a moderate κ for two observers. Our findings indicate a good intraobserver reliability and a moderate interobserver reliability, confirming the previous findings of Bevers et al.

    Second, osteophytes were scored reliably. The reproducibility was independent of the osteophyte grading. This observation may be relevant for further research in early knee OA, as small osteophytes may be difficult to detect with CR.

    Third, trochlear cartilage defects were assessed the least reliably and our results compare poorly with those of Bevers et al.15 One reason may be that the investigation time for scoring multiple items was relatively short; another may be that the training session was insufficient in this respect. Also, some investigators noted that an image atlas would have been helpful. Cartilage thinning may be more difficult to assess in a standardised way than previously thought; further research into this area seems warranted.

    In addition, we studied four aspects of knee inflammation, that is, B-mode synovitis and its components, synovial hypertrophy and effusion, and the presence of synovial PD signal in three synovial recesses. We also tested the reliability for global knee synovitis, which may better reflect the overall degree of knee inflammation. The OMERACT US study group has extensive experience in conducting reliability studies of inflammatory aspects of rheumatic disease.16 The present results showed moderate to good intra- and interobserver reliability in assessing knee synovitis, in agreement with previous findings.15 We found a high prevalence of both synovial hypertrophy and effusion. The high prevalence of effusions confirms the observations in a population study.17 Regarding synovial PD signal, its low prevalence in our sample prevented calculation of reliability for this parameter. Nevertheless, synovial neovascularisation is not a prominent feature in OA.

    The main finding of this study is the moderate to good interobserver reliability for the majority of osteoarthritic changes. These findings support the generalisability of our standard protocol and its implementation across multiple other centres. A prerequisite is that the rheumatologists are rigorously trained and adhere strictly to the standard protocol. Some results may be improved by the development of an image atlas accompanying the US examination protocol. Furthermore, the strict time slot imposed by the study protocol would not apply in clinical practice.

    Our study has some limitations. First, the study included a small number of patients. However, the sample size of our study is not uncommon in reliability studies; it is primarily based on pragmatic considerations, not on statistics. Second, the X-rays were scored retrospectively and were not part of the study protocol. Thus, a real gold standard is lacking, hampering computation of the true prevalence of lesions. However, this was a reliability study, not a validation exercise.

    In conclusion, this study suggests that rheumatologists can reach substantial reliability in their performance of US assessment of both structural and inflammatory abnormalities of the osteoarthritic knee.

    Acknowledgments

    We are grateful to Marian de Waal, Ria de Kort, Valerie and Charlotte Bruyn for logistical support. We would like to thank Esaote Netherlands BV for providing the ultrasound (US) machines and Abbvie Netherlands BV for financial support.

    Footnotes

    • Handling editor Hans WJ Bijlsma

    • Correction notice This article has been corrected since it was published Online First. Figures 1 and 2 legends have been transposed.

    • Collaborators Members of the OMERACT Ultrasound Task Force are: Sibel Aydin, Marina Backhaus, Peter Balint, Isabelle Chary-Valckenaere, Jane Freeston, Frederique Gandjbakhch, Marwin Gutierrez, Emilio Filippucci, Petra Hanova, Kei Ikeda, Frederick Joshua, Sandrine Jousse-Joulin, David Kane, Gurjit Kaeley, Zunaid Karim, Eugene Kissin, Helen Keen, Juhani Koski, Damien Loeuille, Peter Mandl, Eugenio de Miguel, Carlos Pineda, Ralph Thiele, Marcin Skzudlarek, Lene Terslev, Wolfgang Schmidt and Richard Wakefield.

    • Contributors All authors participated in the execution of parts or whole of this reliability study. All authors approved the final version to be submitted for publication. The study comprised a Delphi exercise, an image reading and a patient exercise. JG performed the statistical analysis. GAWB, EN, ND and AI designed the study.

    • Funding Abbvie Pharmaceuticals Netherlands BV.

    • Competing interests None declared.

    • Patient consent Obtained.

    • Ethics approval MC Groep Hospitals.

    • Provenance and peer review Not commissioned; externally peer reviewed.