Abstract
Objective Musculoskeletal ultrasound (MSUS) is increasingly being used in the evaluation of pediatric musculoskeletal diseases. In order to provide objective assessments of arthritis, reliable MSUS scoring systems are needed. Recently, joint-specific scoring systems for arthritis of the pediatric elbow, wrist, and finger joints were proposed by the Childhood Arthritis and Rheumatology Research Alliance (CARRA) MSUS workgroup. This study aimed to assess the reliability of these scoring systems when used by sonographers with different levels of expertise.
Methods Members of the CARRA MSUS workgroup attended training sessions for scoring the elbow, wrist, and finger. Subsequently, scoring exercises of B mode and power Doppler (PD) mode still images for each joint were performed. Interreader reliability was determined using 2-way single-score intraclass correlation coefficients (ICCs) for synovitis and Cohen κ for tenosynovitis.
Results Seventeen pediatric rheumatologists with different levels of MSUS expertise (1-15 yrs) completed a 2-hour training session and calibration exercise for each joint. Excellent reliability (ICC > 0.75) was found after the first scoring exercise for all the finger and elbow views evaluated on B mode and PD mode, and for all of the wrist views on B mode. After a second training session and a scoring exercise, the wrist PD mode views reached excellent reliability as well.
Conclusion The preliminary CARRA MSUS scoring systems for assessing arthritis of the pediatric elbow, wrist, and finger joints demonstrate excellent reliability among pediatric MSUS sonographers with different levels of expertise. With further validation, this reliable joint-specific scoring system could serve as a clinical tool and scientific outcome measure.
Juvenile idiopathic arthritis (JIA) is a significant cause of morbidity worldwide.1 Persistent joint inflammation can lead to functional limitations and lower health-related quality of life.1 Clinical evaluation of JIA disease activity includes the assessment of active joint count, physician global assessment (PGA), parent/patient global assessment, presence and duration of morning stiffness, and biologic markers of inflammation.2 While these variables are included in validated composite outcome measures such as the Juvenile Arthritis Disease Activity Score (JADAS),3 recent guidelines acknowledge the need for further standardization of the patient/parent global assessment and PGA.2 The PGA can have poor interrater reliability among providers, particularly in patients with low disease activity or inactive disease.4 In addition, the reliability of active joint count is limited5 and cannot always adequately identify joints with synovitis.6
Musculoskeletal ultrasound (MSUS) is increasingly being used in children.7 It is well tolerated, readily available, and relatively inexpensive compared to other imaging modalities. Normal age-related findings and definitions of pediatric synovitis on MSUS have been developed.8-11 MSUS can provide point of care information including the identification of subclinical disease.6,12,13 In order to provide objective assessments of arthritis, reliable scoring systems are necessary. They exist for rheumatoid arthritis,14,15 but in light of the unique sonographic features of the pediatric joint, specific MSUS scoring systems for JIA are needed. Our group recently proposed a joint-specific MSUS scoring system for the assessment of arthritis of the pediatric elbow, wrist, and finger joints that demonstrated excellent reliability when used by experienced ultrasonographers (> 7 years of experience).16 As MSUS use increases in pediatric rheumatology, reliable scoring systems for different levels of experience are needed. The objective of this study was to assess the interreader reliability of a B mode and power Doppler (PD) mode scoring system for arthritis of the pediatric elbow, wrist, and finger16 among sonographers with different levels of experience.
METHODS
Seventeen pediatric rheumatology providers who are members of the Childhood Arthritis and Rheumatology Research Alliance (CARRA) MSUS workgroup participated. All providers had prior formal training in pediatric MSUS with 1 to 15 years of subsequent clinical experience in pediatric MSUS. For the analysis, participants were divided into 2 groups: an expert group, defined as participants with > 5 years of experience in MSUS and > 10 MSUS studies per week in children (n = 5); and a nonexpert group (n = 12). This study was approved by the Cincinnati Children’s Hospital Medical Center Institutional Review Board (approval no. 2018-7939). Written assent and consent to participate were obtained from all children whose images were used in the scoring exercises. Written informed consent for publication was obtained from all the study participants.
For each of the joints (elbow, wrist, and finger), participants received an initial 2-hour online virtual training session from an expert who had contributed to the preliminary CARRA scoring system (PVF, EO, JR).16 The training session included reviews of normal sonoanatomy, pathologic findings in JIA, the preliminary CARRA semiquantitative scoring system, and case-based examples of scoring including pitfalls. Participants then took part in a calibration exercise using still images. Through a subsequent debrief, any remaining questions were addressed.
The preliminary CARRA scoring system for the elbow, wrist, and finger joints consists of a semiquantitative grading from 0 to 3 (0 = normal or no pathology, 3 = severe pathology) for both B mode and PD mode images; tenosynovitis is assessed through a binary system with 0 (no pathology) or 1 (presence of pathology) in B mode and PD mode (Supplementary Table S1-S4, available with the online version of this article).16 Anonymized MSUS images of children aged 2 to 17 years were used, and the age of the patient was available to the participants.
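The grading rules described above can be sketched as a small validated record. This is an illustrative sketch only: the class name `ViewScore`, its field names, and the view labels are hypothetical and do not come from the CARRA scoring system or its supplementary tables. Only the allowed grade ranges (0 to 3 for synovitis, 0 or 1 for tenosynovitis), the two imaging modes, and the three joints are taken from the text.

```python
from dataclasses import dataclass

# Grade ranges as described in the text: semiquantitative 0-3 for synovitis,
# binary 0/1 for tenosynovitis. Everything else here is a hypothetical layout.
ALLOWED_GRADES = {
    "synovitis": {0, 1, 2, 3},
    "tenosynovitis": {0, 1},
}
ALLOWED_MODES = {"B", "PD"}
ALLOWED_JOINTS = {"elbow", "wrist", "finger"}


@dataclass(frozen=True)
class ViewScore:
    """One reader's score for one still image (hypothetical record layout)."""
    joint: str    # "elbow", "wrist", or "finger"
    view: str     # free-text view label, not validated here
    mode: str     # "B" (B mode) or "PD" (power Doppler)
    finding: str  # "synovitis" or "tenosynovitis"
    grade: int

    def __post_init__(self):
        if self.joint not in ALLOWED_JOINTS:
            raise ValueError(f"unknown joint: {self.joint!r}")
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.finding not in ALLOWED_GRADES:
            raise ValueError(f"unknown finding: {self.finding!r}")
        if self.grade not in ALLOWED_GRADES[self.finding]:
            raise ValueError(
                f"grade {self.grade} is not valid for {self.finding}"
            )


# A grade 2 synovitis score on a PD mode wrist image.
score = ViewScore("wrist", "midline radiocarpal", "PD", "synovitis", 2)
```

Rejecting out-of-range grades at entry keeps the two scales (ordinal and binary) from being mixed before any reliability statistic is computed.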
Scoring exercises of both B mode and PD mode were completed for each joint. Interreader reliability was estimated using 2-way single-score intraclass correlation coefficients (ICC), a validated statistical measure of interreader reliability when variables in a study are rated by multiple coders.17 An ICC was considered excellent for values of 0.75 to 1.00, good 0.60 to 0.74, fair 0.40 to 0.59, and poor < 0.40.18 As a nominal variable, agreement for the scoring system of the extensor tendons of the wrist in transverse view was assessed using the Cohen κ coefficient. Values of κ from 0.0 to 0.2 indicate slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.0 almost perfect or perfect agreement.17 For those views that did not attain excellent reliability (ICC) or moderate agreement (κ) for all participants for the lower end of the 95% CI, the participants underwent a subsequent round of calibration and scoring exercises using a different set of B mode and PD mode images. The statistical analysis software used was SAS version 9.4 (SAS Institute).
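The two statistics named above can be illustrated with a short sketch. It is not the authors' SAS analysis and rests on assumptions the text does not settle. The ICC is computed here in the Shrout and Fleiss ICC(2,1) form (2-way random effects, absolute agreement, single rater); the text says only "2-way single-score". Cohen's kappa is shown for a single pair of readers, since the text does not say how kappa was obtained across all 17 readers. The function names are hypothetical, and the label for negative kappa is an assumption because the quoted bands start at 0.0.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per image and one column per reader.

    Assumes a complete table (every reader scored every image).
    """
    n = len(ratings)          # number of images
    k = len(ratings[0])       # number of readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-image mean square
    msc = ss_cols / (k - 1)               # between-reader mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def icc_label(icc):
    """Bands quoted in the text for the ICC."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair"
    return "poor"


def cohen_kappa(a, b):
    """Cohen's kappa for two readers scoring the same images.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(a)
    categories = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)


def kappa_label(kappa):
    """Bands quoted in the text for kappa; the negative case is an assumption."""
    if kappa < 0:
        return "poorer than chance"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect or perfect"


# Toy data only: 4 images graded 0-3 by 3 readers in full agreement.
toy_grades = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
toy_icc = icc_2_1(toy_grades)
```

Because the absolute-agreement form penalizes systematic offsets between readers, it is the stricter of the common 2-way single-score variants; a consistency form would give equal or higher values on the same data.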
RESULTS
A total of 300 still images were used for the first round of calibration and scoring exercises. These images were obtained from children aged 2 to 18 years, were distributed equally across this age range, and included a broad selection of images for each of the grade 0 to 3 categories (Supplementary Table S5, available with the online version of this article). Interreader reliability results for the entire group of raters are shown in the Table. In general, experts and nonexperts combined demonstrated excellent interreader agreement for all B mode and PD mode views of the elbow and finger joints as well as the distal radioulnar and midline radiocarpal views of the wrist. For all participants, the view of the radiocarpal joint in ulnar probe position reached excellent reliability in B mode but only good reliability in PD mode for the lower limit of the CI (excellent for the ICC itself). Images of the extensor tendons of the wrists demonstrated moderate agreement in B mode (κ criteria; see Methods section) but only fair agreement in PD mode for all participants. Given the overall excellent reliability for synovitis and moderate agreement for tenosynovitis in B mode, instead of proposing a modification of the scoring system, the PD mode scoring was repeated for the ulnar view of the radiocarpal joint and the extensor tendons following a second training session with special focus on the distinction of physiologic and pathologic findings. Regardless of level of expertise, excellent interrater reliability (ICC 0.96, 95% CI 0.94-0.97) and moderate agreement (κ 0.72, 95% CI 0.61-0.83) were obtained, respectively, following the second training and scoring exercise. Separate results for the expert and nonexpert groups are shown in Supplementary Tables S6 and S7.
DISCUSSION
This study demonstrated excellent reliability of the preliminary semiquantitative CARRA MSUS scoring system for the pediatric elbow, wrist, and finger joints for providers with different levels of expertise. In addition, advanced MSUS concepts were successfully taught in a virtual format. By demonstrating the reliability of this semiquantitative measurement instrument, our study supports the potential use of MSUS scoring systems as an objective outcome measure at bedside and in further research studies.
Several MSUS scoring systems have been published in recent years.16,19-22 Only a few of these scoring systems have been evaluated with sonographers of variable experience. The pediatric MSUS scoring system of the knee proposed by the CARRA group was tested in pediatric rheumatology providers from the CARRA JIA ultrasound (US) working group (n = 16) with < 1 to 10 years of US experience. This exercise demonstrated good to excellent reliability for B mode and PD mode views.22 Most recently, Rossi-Semerano et al23 reported the reliability of the Outcome Measures in Rheumatology (OMERACT) pediatric US synovitis scoring system among 13 pediatric ultrasonographers of diverse subspecialty backgrounds: 9 rheumatologists, 2 pediatricians, and 2 radiologists with varying degrees of experience. This group used a total of 75 images to evaluate the reliability of the most representative view of the wrist, elbow, metacarpophalangeal (MCP) II, knee, and ankle joints. For the scoring systems of the MCP II, wrist, and elbow, they found fair to good reliability for B mode and excellent reliability for PD mode. However, the scoring system used was not joint-specific, which the authors acknowledged.23 Our study used a larger sample of normal and pathologic images (n = 300), included all views recommended in the evaluation of JIA, and involved 17 pediatric rheumatology sonographers with different levels of expertise. Our joint-specific scoring system is based on a 1-plane view per area, using the most representative view to capture pathology, with any abnormal findings confirmed in a second plane.
Since all views reached excellent reliability18 after the second training session, the differences in the ICC noted after the first scoring exercise may have resulted from variability in the experience level of the participants interpreting MSUS rather than intrinsic issues with the scoring system. The lower ICC levels and 95% CI ranges in the MCP and proximal interphalangeal (PIP) volar views in PD mode for the expert group were a function of the small number of positive ratings. Reliability coefficients assume an equal distribution of positive and negative findings.24 The cut-offs used for differentiating the various levels of agreement (poor to excellent) were based on Cicchetti.18 Other authors (Koo and Li25) have proposed slightly higher cut-offs. It is important to note that no universal agreement exists on how to define these levels. The ICC ranges between 0.00 and 1.00, with values closer to 1.00 representing stronger reliability. Given that most of our ICC values are 0.90 or above, with the lower end of the range being very close to it, the results suggest very good reliability independent of the specific cut-offs used. We also considered the lower end of the 95% CI when deciding whether agreement was good enough; we did not base this decision simply on the ICC value itself.
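The dependence of the verbal labels on the chosen cut-offs, and the decision rule based on the lower end of the 95% CI, can be illustrated as follows. This is a sketch, not the authors' procedure: the function names are hypothetical, the Cicchetti bands are those quoted in the Methods, and the Koo and Li bands (below 0.50 poor, 0.50 to 0.75 moderate, 0.75 to 0.90 good, above 0.90 excellent) are stated from that guideline as we understand it. Treating a value exactly on a boundary as belonging to the higher band is an assumption.

```python
# Each scheme is a list of (lower bound, label), highest band first.
CICCHETTI_BANDS = [(0.75, "excellent"), (0.60, "good"), (0.40, "fair")]
KOO_LI_BANDS = [(0.90, "excellent"), (0.75, "good"), (0.50, "moderate")]


def band_label(value, bands, floor_label="poor"):
    """Return the label of the first band whose lower bound the value reaches."""
    for lower, label in bands:
        if value >= lower:
            return label
    return floor_label


def needs_second_round(ci_lower, threshold=0.75):
    """Decision rule described in the text for ICC views: judge by the
    lower end of the 95% CI rather than by the point estimate alone."""
    return ci_lower < threshold


# Wrist PD mode result after the second training session, from the Results:
# ICC 0.96 (95% CI 0.94-0.97).
icc, ci_lower = 0.96, 0.94
labels = {
    "cicchetti": band_label(ci_lower, CICCHETTI_BANDS),
    "koo_li": band_label(ci_lower, KOO_LI_BANDS),
}
repeat = needs_second_round(ci_lower)
```

For this result, both schemes give the same label even when applied to the lower CI bound, which is the sense in which the conclusion does not hinge on the specific cut-offs.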
The value of training sessions, which include a review of the definitions of key MSUS findings in healthy children and in children with arthritis, normal sonoanatomy, the scoring system, sonographic images with pathology, and a calibration session, was demonstrated. Participant feedback following the first and second training exercises noted that careful review of the normal spectrum of sonographic findings related to the degree of skeletal maturation was most helpful. Given the in-person meeting limitations imposed by the coronavirus disease 2019 (COVID-19) pandemic, this project was conducted in an online virtual setting. Major benefits of the change in format included the opportunity to record the sessions and the possibility of reaching a larger group. The session recordings were used by participants unable to attend the virtual meetings. The excellent reliability reached in this project supports the use of an online virtual format as an effective method for pediatric MSUS training, including training for scoring exercises.
Additional exercises assessing the reliability of the proposed MSUS scoring systems on real-time US images across different patient age groups, for instance through patient-based exercises, may follow. Future studies will also need to assess the construct and predictive validity of this preliminary MSUS scoring system.
In conclusion, a novel MSUS scoring system for B mode and PD mode of the pediatric elbow, wrist, and finger showed excellent reliability among pediatric rheumatology ultrasonographers with varying levels of expertise. This was supported by an in-depth virtual training format. This joint-specific scoring system for pediatric arthritis could serve as a clinical and scientific outcome measure, following further refinement and validation.
Footnotes
The authors wish to acknowledge Childhood Arthritis and Rheumatology Research Alliance (CARRA) and the ongoing Arthritis Foundation financial support of CARRA. This project was funded by a CARRA–Arthritis Foundation Small Grant. PVF was supported by the Center for Clinical and Translational Science and Training at the University of Cincinnati, which is funded by the National Institutes of Health (NIH) Clinical and Translational Science Award program (grant 2UL1TR001425-05A1 and KL2 [2KL2TR001426-05A]); the National Institute of Arthritis and Musculoskeletal and Skin Diseases (award no. P30AR076316); and the Diversity and Health Disparities Award, which is funded by Cincinnati Children’s Hospital Medical Center. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
- Accepted for publication September 13, 2022.
- Copyright © 2023 by the Journal of Rheumatology
This is an Open Access article, which permits use, distribution, and reproduction, without modification, provided the original article is correctly cited and is not used for commercial purposes.