Abstract
Objective. The aim of this multireader exercise was to assess the reliability and change over time of erosion measurements in patients with rheumatoid arthritis (RA) using high-resolution peripheral quantitative computed tomography (HR-pQCT).
Methods. HR-pQCT scans of 23 patients with RA were assessed at baseline and 12 months. Four experienced readers examined the dorsal, palmar, radial, and ulnar surfaces of the metacarpal head (MH) and phalangeal base (PB) of the second and third digits, blinded to time order. In total, 368 surfaces (23 patients × 16 surfaces) were evaluated per timepoint to characterize cortical breaks as pathological (erosion) or physiological, and to quantify erosion width and depth. Reliability was evaluated by intraclass correlation coefficients (ICC), percentage agreement, and Light κ; change over time was defined by means ± SD of erosion numbers and dimensions.
Results. ICC for the mean measurements of width and depth of the pathological breaks ranged from 0.819 to 0.883 and from 0.771 to 0.907, respectively. Most physiological cortical breaks were found at the palmar PB, whereas most pathological cortical breaks were located at the radial MH. There was a significant increase in both the numbers and the dimensions of erosions between baseline and follow-up (P = 0.0001 for erosion numbers, width, and depth in axial plane; P = 0.001 for depth in perpendicular plane).
Conclusion. This exercise confirmed good reliability of HR-pQCT erosion measurements and their ability to detect change over time.
High-resolution peripheral quantitative computed tomography (HR-pQCT) provides accurate detection of periarticular bone changes, which is required for diagnosis and therapeutic monitoring in rheumatoid arthritis (RA)1. Previously, the Study grouP for xtrEme Computed Tomography in Rheumatoid Arthritis (SPECTRA) collaboration presented a consensus definition for bone erosion, and a common approach for measuring erosion size, with feasibility and preliminary reliability tested in a cross-sectional dataset of RA metacarpophalangeal (MCP) joints (RELEX-1)2. Good agreement was demonstrated regarding the presence and nature of cortical breaks; however, agreement for measuring erosion dimensions needed refinement. We therefore performed this multireader HR-pQCT exercise in order to assess the reliability of erosion measurements and to evaluate change over time in patients with RA.
MATERIALS AND METHODS
Images. Twenty-three seropositive RA patients underwent HR-pQCT imaging of the second and third digits of their dominant hand at baseline (0 months) and follow-up (12 months) at the University of Erlangen-Nuremberg, Université de Lyon, University of California San Francisco, and University of Calgary. Patients were selected according to the presence of bone erosions on radiographs as assessed by the Sharp/van der Heijde (SvdH) score, and the need to change therapy due to insufficient disease control. Local ethics approval and written informed consent were obtained prior to study entry (institutional review board numbers: Calgary REB15-0582; San Francisco 12-10418; Lyon CPP 13/083; Erlangen 3839). All participants were scanned using a first-generation HR-pQCT scanner (Scanco Medical AG) and standard acquisition settings were applied with an 82-μm isotropic voxel size resolution3. Image datasets were viewed using Osirix (version 5.8). Readers were blinded to clinical status and time sequence of images.
Prior to image evaluation, the 4 readers calibrated measurements using a test set of images. All the readers participating had at least 3 years’ experience in reading HR-pQCT datasets.
Joint image evaluation. The image evaluation algorithm involves assessing 8 surfaces at each of the second and third MCP joints, specifically the palmar, dorsal, radial, and ulnar surfaces of each of the proximal phalangeal base (PB) and the metacarpal head (MH)1. Only images of sufficient quality were evaluated4. Individual surfaces were analyzed for the presence of cortical breaks (present or absent) according to the SPECTRA definition: The cortical break should be present in 2 consecutive slices and 2 perpendicular planes, and should show a loss of underlying trabecular bone. The cortical break is characterized as being pathological (erosion) or physiological, with the former described as a nonlinear appearance typical of erosions, and the latter as a parallel/linear break typical of vessel channels3. Supplementary Figure 1 (available with the online version of this article) gives an example of typical pathological or physiological cortical breaks. For erosions, readers quantified the size of the break by measuring the maximum width and corresponding depth in both axial and perpendicular planes to each surface. The depth of the cortical break was recorded on the same slice where the maximal width was obtained. All measures were quantified in mm. Readers noted whether multiple cortical breaks were present on the same surface but only recorded measurements for the largest cortical break.
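The decision rule described above can be sketched in code. This is only an illustration: in the exercise the criteria were judged visually by the readers, and the record layout and field names below are hypothetical, not part of the SPECTRA definition.

```python
from dataclasses import dataclass


@dataclass
class SurfaceReading:
    """One reader's observations for one surface (hypothetical layout)."""
    consecutive_slices: int    # consecutive slices on which the break is seen
    perpendicular_planes: int  # perpendicular planes on which the break is seen
    trabecular_loss: bool      # loss of underlying trabecular bone
    linear_appearance: bool    # parallel/linear margins, typical of vessel channels


def classify(reading):
    """Return 'none', 'physiological', or 'pathological' for one surface."""
    is_break = (
        reading.consecutive_slices >= 2
        and reading.perpendicular_planes >= 2
        and reading.trabecular_loss
    )
    if not is_break:
        return "none"
    # A nonlinear appearance is taken as typical of an erosion.
    return "physiological" if reading.linear_appearance else "pathological"
```

Only surfaces classified as pathological would then go on to width and depth measurement.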
Statistical analysis. The interreader reliability of the detection of cortical breaks was evaluated using the percentage of agreement and Light κ for the chance-corrected agreement5,6. The intraclass correlation coefficient (ICC) was calculated as an indicator of variability in cortical break depth and width measurements between readers 1, 2, 3, and 4. A paired-sample t-test was used to evaluate the longitudinal changes between baseline and follow-up scans of each subject. Analyses were performed with SPSS (version 23; IBM Corp.).
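For readers unfamiliar with these agreement statistics, the following minimal sketch shows one common way to compute them. Light κ is taken here as the mean of Cohen κ over all reader pairs, and percentage agreement as the share of surfaces on which every reader gave the same rating. The analyses in this study were run in SPSS; the function names and the sample ratings are illustrative assumptions only.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen kappa for two readers rating the same surfaces."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    categories = set(a) | set(b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both readers used a single identical category throughout
    return (observed - expected) / (1 - expected)


def light_kappa(ratings):
    """Mean of Cohen kappa over all reader pairs (one list per reader)."""
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def percent_all_agree(ratings):
    """Percentage of surfaces on which every reader gave the same rating."""
    surfaces = list(zip(*ratings))
    return 100 * sum(len(set(s)) == 1 for s in surfaces) / len(surfaces)


# Hypothetical data: 4 readers, 6 surfaces, 1 = cortical break present.
readers = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0],
]
print(light_kappa(readers), percent_all_agree(readers))
```

Requiring all readers to agree is a stricter criterion than pairwise agreement, which is one reason percentage agreement and κ can diverge.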
RESULTS
Patient characteristics. Mean (SD) age was 46 (13) years, 60% of patients were women, mean (SD) disease duration was 2.3 (2.8) years, and mean (SD) Disease Activity Score in 28 joints at baseline was 3.51 (1.03). There was no significant change detectable over time in SvdH score. All patients received methotrexate; 18 patients were also treated with or started receiving a tumor necrosis factor-α inhibitor.
Images from 2 timepoints (baseline and follow-up) for 23 subjects were evaluated, resulting in 46 individual joints with 368 unique surfaces that were evaluated per timepoint. Thus, a total of 736 surfaces were evaluated.
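The surface counts follow directly from the evaluation scheme. The short calculation below spells out the arithmetic; the variable names are illustrative.

```python
patients = 23
joints_per_patient = 2  # second and third MCP joints
bones_per_joint = 2     # metacarpal head (MH) and phalangeal base (PB)
sides_per_bone = 4      # dorsal, palmar, radial, ulnar
timepoints = 2          # baseline and 12-month follow-up

surfaces_per_joint = bones_per_joint * sides_per_bone
joints = patients * joints_per_patient
surfaces_per_timepoint = joints * surfaces_per_joint
total_surfaces = surfaces_per_timepoint * timepoints

print(joints, surfaces_per_timepoint, total_surfaces)
```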
Evaluability of images. The percentage agreement for evaluability of all the images between all readers was 80% (589/736). The chance-corrected agreement was fair (Cohen κ 0.218; ranges for all individual reader pairs 0.005–0.519). Only the surfaces in which all 4 readers agreed that the image was evaluable were included beyond this step (n = 585). Evaluability was affected by the presence of motion artifacts and/or technical artifacts such as stack artifacts.
Presence of cortical breaks. The percentage agreement for the presence or absence of cortical breaks on all evaluable images between all readers was 57% (334/585). The chance-corrected agreement resulted in a moderate κ value of 0.493. Cohen κ for all individual reader pairs (reader 1 vs reader 2, etc.) ranged from 0.405 to 0.551.
Characterization of cortical breaks. In total, 99 cortical breaks were identified on the baseline and follow-up images. The percentage agreement for the appearance of a cortical break as pathological or physiological between all readers was 81% (80/99). The chance-corrected agreement resulted in a substantial κ value of 0.796. Cohen κ for all individual reader pairs ranged from 0.765 to 0.838.
Numbers and localizations of erosions and physiological cortical breaks. Table 1 shows the number of breaks (total and erosions) for the 8 individual surfaces in which all readers agreed on the presence of a cortical break. The distribution of cortical breaks confirmed findings from previous publications3,7,8.
Widths and depths of erosions. There were 41 cortical breaks detected as erosions by all readers. Table 1 shows the mean dimensions with SD measured by all readers of these 41 erosions for the respective surfaces the erosions were detected in.
Interreader agreement regarding measurements of the sizes of cortical breaks. Numbers and dimensions of cortical breaks were determined on surfaces where all readers agreed that an erosion was present (n = 41). ICC was calculated; for all measures the ICC was high: mean values ± SD and ICC for erosion numbers, axial width and depth, as well as perpendicular width and depth were 1.39 ± 0.62 (ICC 0.803); 2.31 ± 1.39 (ICC 0.883); 1.85 ± 0.86 (ICC 0.907); 1.99 ± 0.87 (ICC 0.819); and 1.89 ± 0.91 (ICC 0.771), respectively (see Table 2 for details; for further measures of precision, see Supplementary Table 1, available with the online version of this article).
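As an illustration of how an ICC of this kind can be obtained, the sketch below computes a two-way random-effects, absolute-agreement, single-measure ICC from the mean squares of a two-way analysis of variance. The ICC form used in the SPSS analysis is not stated in this article, so the choice of form here is an assumption, and the width values are invented.

```python
def icc_2_1(scores):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    `scores` holds one row per erosion and one column per reader.
    """
    n = len(scores)     # erosions (targets)
    k = len(scores[0])  # readers
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for erosions, readers, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical axial widths in mm: 5 erosions (rows) x 4 readers (columns).
widths = [
    [2.1, 2.3, 2.0, 2.2],
    [1.4, 1.5, 1.3, 1.6],
    [3.8, 3.6, 3.9, 3.7],
    [0.9, 1.1, 1.0, 0.8],
    [2.7, 2.9, 2.6, 2.8],
]
print(icc_2_1(widths))
```

An average-measure form would give higher values for the same data, so the form should be confirmed before comparing results across studies.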
Longitudinal change of cortical breaks over time. All pairs of measures (baseline and follow-up) were evaluated and the mean values were compared to test for significant differences over time. In total, the readings from all readers yielded 285 pairs. Mean values (± SD) of erosion numbers, widths, and depths are shown in Table 3. There was a significant increase in both the numbers of erosions and the dimensions of the cortical breaks between baseline and follow-up scans (all P < 0.01).
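The paired comparison can be illustrated with a minimal sketch of the paired-sample t statistic: the mean of the within-pair differences divided by its standard error. The measurements below are invented. Obtaining a P value additionally requires the t distribution with n − 1 degrees of freedom, which a statistics package would supply.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline, follow_up):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical erosion widths in mm for the same surfaces at 0 and 12 months.
width_baseline = [1.8, 2.4, 1.1, 3.0, 2.2]
width_follow_up = [2.1, 2.9, 1.3, 3.2, 2.8]
t_stat, df = paired_t(width_baseline, width_follow_up)
print(t_stat, df)
```

A positive statistic indicates larger values at follow-up than at baseline.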
DISCUSSION
In this multireader responsiveness exercise, we applied HR-pQCT imaging to assess reliability and change over time of erosion measurements in a dataset of patients with RA. We applied our consensus definition of bone erosion as well as a previously agreed evaluation algorithm3. The exercise yielded good reliability for HR-pQCT measurements (ICC ≥ 0.771) and a significant increase was observed in both number and dimensions of erosions between baseline and follow-up (P < 0.01). Further, most physiological cortical breaks were found at the palmar PB, whereas most erosions were located at the radial MH; the distribution of erosions and physiological cortical breaks confirmed the findings from earlier studies3,7,8.
Agreement (ICC) for erosion numbers and for the width and depth of cortical breaks was high, and κ for the appearance of cortical breaks was good. The reliability measures in this study were better than those in the RELEX-1 exercise3. For the current exercise, we used only 4 readers, not 11 as in the first exercise, and performed prestudy calibration3. It should be noted that we used 4 readers, unlike the 2 readers typically used in a clinical trial, and that images were read in unknown time order, which may also reduce responsiveness. A limitation of the study might be that only surfaces for which all readers agreed that the image quality was sufficient and that a cortical break was present were analyzed further, which reduced the number of analyzable surfaces. This emphasizes the need for adequate training before reading HR-pQCT images. On the other hand, the lack of training could be overcome by developing semiautomated algorithms allowing for volumetric assessment of pathological cortical breaks.
The analysis of change over time yielded highly significant differences in the mean ± SD number, width, and depth of cortical breaks. Our findings showed responsiveness over time despite the small sample size and the achievement of disease control.
Ongoing work has evaluated the nature of small cortical breaks. Boutroy, et al9 performed a contrast perfusion study on a cadaveric hand, confirming the location of vascular foramina and their comparative frequency in periarticular bone. This provides construct validity for the SPECTRA erosion definition. Scharmga, et al10 compared vascular foramina in histology and in HR-pQCT. Perhaps unsurprisingly, due to differences in spatial resolution, there was a substantially higher number of vessel channels found in histology than in HR-pQCT. It needs to be assessed further, however, whether uniquely identified HR-pQCT small cortical breaks are of added value in RA monitoring.
While the assessment of radiographic joint space width in HR-pQCT may be semiautomated11,12,13, the evaluation algorithm of cortical breaks still requires training and time14,15. Therefore, our collaboration is pursuing the investigation of a common technical algorithm for semi- or fully automated erosion detection and measurement allowing for volumetric erosion assessment16.
In conclusion, HR-pQCT evaluation using trained readers allows for highly reliable and precise detection of cortical breaks and facilitates differentiation of pathological from physiological cortical breaks. Reading by less experienced readers results in fair κ values with regard to evaluability and break detection. Moreover, our results suggest that HR-pQCT shows good responsiveness of erosion measures over time.
ACKNOWLEDGMENT
This paper was written on behalf of all SPECTRA collaboration members.
APPENDIX 1.
List of SPECTRA Collaboration members: Cheryl Barnabe, University of Calgary; Anne-Birgitte Blavnsfeldt, University of Aarhus; Stephanie Boutroy, Université de Lyon; Steven K. Boyd, University of Calgary; Andrew Burghardt, University of California San Francisco; Roland Chapurlat, Université de Lyon; Angela Cheung, University of Toronto; Ko Chiba, University of Nagasaki; Joost de Jong, Maastricht University Medical Centre; Klaus Engelke, University of Erlangen; Stephanie Finzel, University of Freiburg; Ursula Heilmeier, University of California San Francisco; Harry Genant, University of California, San Francisco; Piet Geusens, Maastricht University; Ellen-Margrethe Hauge, Aarhus University Hospital; Joost de Jong, Maastricht University; Denis Julien, Université de Lyon; Rashid Kapadia, Scanco, USA; Kresten Keller, Aarhus University Hospital; Roland Kocijan, University of Vienna; Sebastian Kraus, Kantonsspital Baden; Eric Lespessailles, University of Orleans; Xiaojuan Li, University of Cleveland; Sarah Manske, University of Calgary; Hubert Marotte, Saint-Etienne, Université de Lyon; Liam Martin, University of Calgary; Michiel Peters, Maastricht University; Valentina Pedoia, University of California San Francisco; Andrea Scharmga, Maastricht University; Georg Schett, University of Erlangen; Kathryn S. Stok, The University of Melbourne; Nikolay Tzaribachev, Bad Bramstedt; Joop van den Bergh, Maastricht University; Bert van Rietbergen, Eindhoven University of Technology; Tomohiro Shimuzu, University of California San Francisco; Lai-Shan Tam, University of Hong Kong; Karen Troy, University of Worchester; Mira van Veenendaal, University of Toronto; Nicolas Vilayphiou, Scanco, Switzerland; Paul Willems, Maastricht University Medical Centre; Rae Yeung, University of Toronto.
Footnotes
RELEX-2 was hosted by the University of San Francisco, California, USA. The meeting was sponsored in part by Scanco Medical AG. PGC is supported in part by the UK National Institute for Health Research (NIHR) Leeds Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.
The authors state that they have no conflicts of interest.
- Accepted for publication September 5, 2020.
- Copyright © 2021 by the Journal of Rheumatology
REFERENCES
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.