Abstract
Objective To assess the construct validity of the novel Outcome Measures in Rheumatology (OMERACT) ultrasound (US) semiquantitative scoring system for morphological lesions in major salivary glands by comparing it with magnetic resonance imaging (MRI) and unstimulated whole salivary flow rates (U-WSFRs) in patients with primary Sjögren syndrome (pSS).
Methods Nine sonographers applied the OMERACT 0-3 grayscale scoring system for parotid (PGs) and submandibular glands (SMGs) in 11 patients with pSS who also had MRIs performed. These were evaluated by 2 radiologists using a semiquantitative 0-3 scoring system for morphological lesions. The agreement between US and MRI and the association between U-WSFRs and imaging structural lesions were determined. A score ≥ 2 for both US and MRI was defined as gland pathology.
Results The prevalence of US morphological lesions in 11 patients with a score ≥ 2 was 58% for PGs and 76% for SMGs, and 46% and 41% for PGs and SMGs, respectively, for MRI. The agreement between OMERACT US scores and MRI scores was 73-91% (median 82%) in the right PG and 73-91% (median 91%) in the left PG, 55-91% (median 55%) in the right SMG and 55-82% (median 55%) in the left SMG. When relations between the presence of hyposalivation and an US score ≥ 2 were examined, agreement was 91-100% (median 83%) in both PGs and 55-91% (median 67%) in both SMGs.
Conclusion There is moderate to strong agreement between the OMERACT US and MRI scores for major salivary glands in patients with pSS. Similar agreement ratios were observed between the higher OMERACT US scores and presence of hyposalivation.
Primary Sjögren syndrome (pSS) is a chronic, autoimmune disease characterized by destruction of exocrine glands, predominantly salivary and lacrimal glands, culminating in keratoconjunctivitis sicca and xerostomia.1 To date, there is no single diagnostic test for pSS and the clinical diagnosis is established using a combination of tests.
Ultrasound (US) is a promising tool in the evaluation of the salivary glands (SGs) for parenchymal changes.2 These changes range from mild inhomogeneity of the glandular tissue to gross cystic, nodular, and fibrosing changes where no normal glandular tissue remains. Changes in the glands are strongly correlated between the right and left parotid gland (PG) and between the right and left submandibular gland (SMG), whereas changes in the PGs and the SMGs are less correlated.3
Since the first US score was introduced by De Vita et al in 1992,4 numerous US scores have been developed for evaluating structural gland lesions in patients with pSS.5-10 Although all these scores differ in number of items scored and number of glands assessed, data on their diagnostic and prognostic performance are largely lacking.11,12 With the aim of developing a widely accepted scoring system, the Outcome Measures in Rheumatology (OMERACT) US subgroup on Sjögren syndrome (SS) developed and validated a consensus-based semiquantitative scoring system for structural lesions of the major SGs in patients with pSS.13,14 The criterion and construct validity of the novel OMERACT US scoring system, however, have not been reported thus far.
Among noninvasive measures that can be employed for assessing the morphology of the major SGs, magnetic resonance imaging (MRI) is generally considered the reference imaging method.15 The majority of MRI studies in patients with pSS evaluated the PGs, with only a few studies assessing the SMGs.15-22 There is a paucity of data regarding the comparative performance of US vs MRI in diagnosing structural SG lesions in patients with pSS, and only a few studies have compared a US scoring system with an MRI scoring system.17,23
Saliva secretion is the main function of the major SGs. Since salivary flow rates (SFRs) can be reliably assessed and used as objective criteria, SFR is the preferred method to determine the functional status of SGs.24-26
The primary aim of this study was to assess the construct validity of the novel OMERACT US scoring system for major SG lesions against MRI and whole salivary flow rates (WSFRs) in patients with pSS. The secondary aim was to assess the interrater reliability of the US scoring system.
METHODS
Patients. Eleven consecutive patients with pSS from the rheumatology outpatient department of the Istanbul Marmara University Hospital, who gave informed consent to participate in the study, were included. All patients fulfilled the American College of Rheumatology/European Alliance of Associations for Rheumatology criteria 2016 for SS.27 The main demographic and clinical findings are given in Table 1. Approval for this study was obtained from the Marmara University Faculty of Medicine local ethics committee (09.2016.329). Demographic and disease characteristics data were recorded for each patient at study entry. None of the included patients had tested positive for chronic hepatitis C, HIV infection, or other concomitant systemic and rheumatic autoimmune diseases mimicking pSS (including IgG4-related disease).
Table 1. Demographic and clinical profile of patients with pSS (n = 11).
Investigators. Nine rheumatologists, all experts in musculoskeletal US with a mean experience of 5 years in scanning patients with SS and members of the OMERACT US subgroup on SS, participated in the study. The rheumatologists were from 9 different countries (Denmark, Spain, Italy, Turkey, France, the Netherlands, Czech Republic, Germany, and Slovenia). The 2 radiologists, each with at least 10 years of experience in head and neck MRI diagnostics, were from Turkey.
US assessment. The patients were examined using 9 General Electric-comparable high-resolution US machines (1 GE Logiq P9 R2.5, 2 Logiq P9 R3, 1 Logiq ER7 PRO, 3 Logiq P5, 1 Logiq S7, and 1 Logiq E BT12), equipped with 6-13 MHz, 8-18 MHz, and 10-22 MHz broadband linear array transducers. Grayscale settings were optimized for every machine before the SG US examinations. Sonographers were not allowed to modify these fixed settings, except for adjusting the depth as needed during SG US assessments.
The US examination was performed with the patient in supine position and included longitudinal and transverse multiplane scans of the PGs and multiplane longitudinal scans, parallel to the inferior border of the mandibular bone, of the SMGs. During the exercise, the patients were assigned to 1 machine and were not moved to another. All sonographers examined all patients.
Grayscale US images of each gland were assessed according to the OMERACT 4-grade semiquantitative US score,13 (ie, grade 0, normal parenchyma; grade 1, mild inhomogeneity without anechoic/hypoechoic areas; grade 2, moderate inhomogeneity with focal anechoic/hypoechoic areas; grade 3, diffuse inhomogeneity with anechoic/hypoechoic areas occupying the entire gland surface or fibrous gland). When longitudinal and transverse scan grading differed, it was agreed to record the highest grade (Figure 1). Gland pathology was defined as an OMERACT US score ≥ 2 and used for correlation analysis against MRI and WSFRs.28
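The recording rule described above (take the highest grade when scan planes differ, then apply the ≥ 2 cut-off) can be sketched in a few lines of Python. This is an illustration only; the function names `gland_us_grade` and `is_pathological` are hypothetical and not part of the OMERACT definition.

```python
VALID_GRADES = (0, 1, 2, 3)  # OMERACT 4-grade semiquantitative scale


def gland_us_grade(longitudinal: int, transverse: int) -> int:
    """Return the grade recorded for a gland: the higher of the two scan-plane grades."""
    for grade in (longitudinal, transverse):
        if grade not in VALID_GRADES:
            raise ValueError(f"grade must be 0-3, got {grade!r}")
    return max(longitudinal, transverse)


def is_pathological(grade: int, cutoff: int = 2) -> bool:
    """Gland pathology as defined in the text: score >= 2."""
    return grade >= cutoff
```

For example, a gland graded 1 on the longitudinal scan and 2 on the transverse scan would be recorded as grade 2 and counted as pathological.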
Figure 1. Grayscale US images of parotid and submandibular glands according to the OMERACT 4-grade semiquantitative US score. Grade 0: normal parenchyma; grade 1: mild inhomogeneity without anechoic/hypoechoic areas; grade 2: moderate inhomogeneity with focal anechoic/hypoechoic areas; grade 3: diffuse inhomogeneity with anechoic/hypoechoic areas occupying the entire gland surface or fibrous gland. OMERACT: Outcome Measures in Rheumatology; US: ultrasound.
MRI assessment. SG MRI was performed prior to the US examination meeting, with a time difference of < 2 weeks. MRI examinations of both PGs and SMGs were performed with a 3.0 Tesla MR imager (Achieva 3.0 T) equipped with surface coils (dStream Head Neck coil, Philips Healthcare). Spin-echo transverse and coronal T1-weighted images (repetition time [TR]/time to echo [TE], 480/10 ms) and fast spin-echo transverse and coronal fat-suppressed T2-weighted images (TR/TE, 4800/75 ms) focused on the PGs and SMGs were acquired with 3 mm section thickness. These were added to the standard neck imaging sequences consisting of T1-weighted fat-suppressed images in the axial and coronal planes and T2-weighted fat-suppressed images in the axial and coronal planes. The standard neck protocol used 5 mm section thickness with a 1 mm intersection gap, an acquisition matrix of 512 × 358, and a field of view of 210 mm. No contrast medium was used.
Two radiologists independently assessed the MRI images. The readers were blinded to the clinical data. Differences in opinion between radiologists were resolved by a consensus score in 2 cases.
For the comparison with the US scoring system, the MRI grading system as proposed by Kojima et al21 was slightly modified to 4 grades. The original version of signal intensity on T1- and fat-suppressed T2-weighted images in the PGs and SMGs consisted of 5 grades, which were defined as follows: grade 0 (definitely normal, homogeneous), grade 1 (probably normal, almost homogeneous), grade 2 (probably abnormal, slightly heterogeneous), grade 3 (clearly abnormal, moderately heterogeneous), and grade 4 (definitely abnormal, severely heterogeneous). Since grade 0 and grade 1 scores both show almost normal parenchyma, we decided to combine these. Therefore, in our semiquantitative 4-grade version, grade 0 represented definitely normal, grade 1 slightly heterogeneous, grade 2 clearly abnormal, and grade 3 severely heterogeneous destroyed parenchyma (Figure 2). Gland pathology was defined as an MRI score ≥ 2 and was used for correlation analysis.
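The collapse from the original 5-grade scale to the 4-grade version can be written as a simple lookup. The sketch below assumes the mapping implied by the description above (original grades 0 and 1 merged, the remaining grades shifted down by one); the names `KOJIMA_TO_FOUR_GRADE` and `collapse_mri_grade` are hypothetical.

```python
# Original 5-grade value -> modified 4-grade value.
# Assumption: grades 0 and 1 are merged into 0; grades 2-4 become 1-3.
KOJIMA_TO_FOUR_GRADE = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}


def collapse_mri_grade(original_grade: int) -> int:
    """Map a 5-grade signal-intensity score onto the 4-grade scale."""
    if original_grade not in KOJIMA_TO_FOUR_GRADE:
        raise ValueError(f"original grade must be 0-4, got {original_grade!r}")
    return KOJIMA_TO_FOUR_GRADE[original_grade]


def mri_pathology(four_grade_score: int) -> bool:
    """Gland pathology as defined in the text: modified MRI score >= 2."""
    return four_grade_score >= 2
```

Under this mapping, an original grade of 3 (clearly abnormal) becomes grade 2 and meets the pathology definition, whereas an original grade of 2 (slightly heterogeneous) becomes grade 1 and does not.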
Figure 2. MRI grading of heterogeneous signal-intensity distribution on T1- and fat-suppressed T2-weighted images in the parotid and submandibular glands. MRI: magnetic resonance imaging.
Collection of whole saliva and measurements of SFRs. Both stimulated WSFRs (S-WSFRs) and unstimulated WSFRs (U-WSFRs) were measured to assess the SG function. The time interval between SG imaging and the evaluation of SFRs was < 2 weeks. U-WSFRs of patients were measured in the morning (9:00-11:00 AM) for standardization. Patients were asked to refrain from eating, drinking, or smoking for a minimum of 2 hours before saliva collection. They were seated in a relaxed position and trained to avoid swallowing saliva. Patients were asked to lean forward and let the saliva flow spontaneously for 15 minutes into a graduated test tube for U-WSFRs.
After this procedure, paraffin chewing for saliva stimulation was used, and S-WSFR was measured. Saliva was collected in a tube shortly after a 5-minute chewing period. Both U-WSFRs and S-WSFRs were calculated as mL/min. Hyposalivation was defined as U-WSFR ≤ 0.1 mL/min.24-26,29
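As a worked illustration of the flow-rate calculation and the hyposalivation cut-off, the following Python sketch converts a collected volume into mL/min and applies the ≤ 0.1 mL/min threshold. The function names and the example volumes are hypothetical; only the 15-minute unstimulated collection period and the threshold come from the text.

```python
HYPOSALIVATION_CUTOFF_ML_PER_MIN = 0.1  # U-WSFR threshold stated in the text


def flow_rate(volume_ml: float, minutes: float) -> float:
    """Salivary flow rate in mL/min from collected volume and collection time."""
    if minutes <= 0:
        raise ValueError("collection time must be positive")
    return volume_ml / minutes


def has_hyposalivation(u_wsfr_ml_per_min: float) -> bool:
    """Hyposalivation as defined in the text: U-WSFR <= 0.1 mL/min."""
    return u_wsfr_ml_per_min <= HYPOSALIVATION_CUTOFF_ML_PER_MIN
```

With these definitions, a hypothetical 0.9 mL collected over 15 minutes gives 0.06 mL/min and is classified as hyposalivation, whereas 3.0 mL over 15 minutes gives 0.2 mL/min and is not.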
Statistical analysis. Statistical analysis was performed using SPSS version 28.0. Descriptive data and variables were presented as mean (SD) or percentages. MRI and OMERACT US scores were computed according to the presence of a gland score ≥ 2 and = 3 in PGs and SMGs, respectively.26 Regarding the US examination, a total of 198 PG (11 patients × 9 ultrasonographers × 2 PGs) and 198 SMG (11 patients × 9 ultrasonographers × 2 SMGs) OMERACT US scores from 9 rheumatologists for 11 patients were used in the analysis. Since the consensus score of 2 radiologists was reported in MRI examinations of 11 patients, a total of 22 PG (11 patients × 2 PGs) and 22 SMG (11 patients × 2 SMGs) MRI measurements from radiologists for 11 patients were used in the analysis.
In patients with hyposalivation (n = 6) defined by U-WSFR (≤ 0.1 mL/min), a total of 108 PG scores (6 patients × 9 rheumatologists × 2 PGs) and 108 SMG scores (6 patients × 9 rheumatologists × 2 SMGs) were used in the analysis. In addition, a total of 90 PG scores (5 patients × 9 rheumatologists × 2 PGs) and 90 SMG scores (5 patients × 9 rheumatologists × 2 SMGs) from the 5 patients without hyposalivation were included in the analysis. For the MRI examinations, a total of 12 PG (6 patients × 2 PGs) and 12 SMG (6 patients × 2 SMGs) measurements from the radiologists for the 6 patients with hyposalivation were used in the analysis. A total of 10 PG (5 × 2) and 10 SMG (5 × 2) MRI measurements from the 5 patients without hyposalivation were included in the analysis.
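The score counts quoted in this section all follow from the same product of patients, raters, and glands per patient. A short Python check, with the hypothetical helper `n_scores`, reproduces them:

```python
def n_scores(patients: int, raters: int, glands_per_patient: int = 2) -> int:
    """Number of gland scores for one gland type (PG or SMG)."""
    return patients * raters * glands_per_patient


us_all = n_scores(11, 9)           # all patients, 9 sonographers
us_hyposalivation = n_scores(6, 9)  # patients with hyposalivation
us_normal_flow = n_scores(5, 9)     # patients without hyposalivation
mri_all = n_scores(11, 1)           # one consensus MRI score per gland
```

The US counts for the two subgroups sum to the overall count for each gland type (108 + 90 = 198).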
The prevalence of MRI and OMERACT US scores was assessed against different cut-off points (< 2 points vs ≥ 2 points for both and = 3 points vs < 3 points for both). In addition, we calculated the percent range of agreement between new OMERACT US scores vs MRI for each rater. Further, the percent range of agreement for each rater between new OMERACT US scores vs presence of hyposalivation (U-WSFR ≤ 0.1 mL/min vs > 0.1 mL/min) was determined. The percentage range of agreement between MRI scores and the presence of hyposalivation for each rater was also analyzed.
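The per-rater percent agreement and its range and median can be sketched as follows. This is a minimal Python illustration, not the software used in the study; the function names and the small data set (3 raters, 4 glands) are invented for demonstration.

```python
from statistics import median


def dichotomize(scores, cutoff=2):
    """Convert 0-3 scores to pathology flags (True when score >= cutoff)."""
    return [s >= cutoff for s in scores]


def percent_agreement(flags_a, flags_b):
    """Percentage of glands on which two dichotomized assessments agree."""
    if len(flags_a) != len(flags_b) or not flags_a:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(flags_a, flags_b))
    return 100.0 * matches / len(flags_a)


def agreement_summary(us_scores_by_rater, reference_flags):
    """Return (minimum, median, maximum) percent agreement across raters."""
    values = [
        percent_agreement(dichotomize(scores), reference_flags)
        for scores in us_scores_by_rater.values()
    ]
    return min(values), median(values), max(values)


# Hypothetical data: consensus MRI scores and US scores for the same 4 glands.
mri_flags = dichotomize([0, 2, 3, 1])
us_scores = {
    "rater_a": [0, 2, 3, 1],
    "rater_b": [1, 2, 2, 2],
    "rater_c": [2, 1, 3, 0],
}
summary = agreement_summary(us_scores, mri_flags)
```

The same functions apply to the comparison with hyposalivation if the reference flags are replaced by one hyposalivation flag per patient.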
Interrater reliability of the ultrasonographers was analyzed using Light κ.30,31 Since the assessments were on an ordinal scale, weighted κ calculations (Fleiss-Cohen weights) were used for calculating Light κ. Weighted κ coefficients were calculated for each pair of assessments among 9 raters (9 pairwise κ), and Light κ was calculated as the arithmetic mean of these calculations. Minimum and maximum κ values were reported along with 95% CI of Light κ. The bootstrap percentile method was used to calculate 95% CIs. Not the subsamples, but all pairwise weighted κ coefficients (a total of 9 pairwise κ) were calculated for each pair of assessments among 9 raters, and these values were used for CI calculations.
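The pairwise weighted κ and its mean (Light κ) can be sketched in plain Python as below. This is an illustrative reimplementation under stated assumptions, not the SPSS procedure used in the study: quadratic (Fleiss-Cohen) weights are expressed as disagreement weights (i − j)² / (k − 1)², which gives the same coefficient as the agreement-weight form, and every unordered pair of raters is averaged. The function names and the example ratings are hypothetical.

```python
from itertools import combinations


def weighted_kappa(a, b, n_categories=4):
    """Cohen's kappa with quadratic (Fleiss-Cohen) weights for two raters."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n, k = len(a), n_categories

    def disagreement(i, j):
        return (i - j) ** 2 / (k - 1) ** 2

    # Marginal proportion of each category for each rater.
    p_a = [a.count(c) / n for c in range(k)]
    p_b = [b.count(c) / n for c in range(k)]
    observed = sum(disagreement(x, y) for x, y in zip(a, b)) / n
    expected = sum(
        disagreement(i, j) * p_a[i] * p_b[j] for i in range(k) for j in range(k)
    )
    if expected == 0:
        raise ValueError("kappa undefined: both raters used one identical category")
    return 1 - observed / expected


def light_kappa(ratings_by_rater, n_categories=4):
    """Return (mean of pairwise weighted kappas, list of pairwise kappas)."""
    pairwise = [
        weighted_kappa(a, b, n_categories)
        for a, b in combinations(ratings_by_rater, 2)
    ]
    return sum(pairwise) / len(pairwise), pairwise


# Hypothetical ratings of 4 glands by 3 raters (0-3 scale).
ratings = [[0, 1, 2, 3], [0, 1, 2, 3], [3, 2, 1, 0]]
mean_kappa, pairwise_kappas = light_kappa(ratings)
```

In this toy example the first two raters agree perfectly (κ = 1) and the third reverses the scale (κ = −1 against each of the others), so the mean of the 3 pairwise values is −1/3.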
Agreement on individual categories of gradings was presented along with overall agreement. κ values between 0.00 and 0.20 were considered poor, 0.21 and 0.40 as fair, 0.41 and 0.60 as moderate, 0.61 and 0.80 as good, and 0.81 and 1.00 as excellent.31
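The interpretation bands above translate directly into a lookup function. In the Python sketch below, the name `interpret_kappa` is hypothetical, and 2 choices are assumptions not stated in the text: negative values are labeled poor, and values falling between 2 printed bands (for example 0.205) are assigned to the lower band.

```python
def interpret_kappa(kappa: float) -> str:
    """Label a kappa value using the bands given in the text."""
    if kappa > 1:
        raise ValueError("kappa cannot exceed 1")
    if kappa <= 0.20:  # assumption: negative values are also labeled poor
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"
```

Under this scheme a value of 0.60 is labeled moderate and a value of 0.81 is labeled excellent.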
IBM SPSS Statistics was used for the calculation of Fleiss multirater analysis, and Microsoft Excel (Microsoft Inc.) was used for calculating bootstrapped percentiles of CIs of Light κ values.
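A percentile bootstrap over the pairwise κ values can be sketched as follows. The study performed this step in Microsoft Excel; the Python version below is a substitute illustration. The function name, the number of resamples, the fixed seed, and the example κ values are all assumptions.

```python
import random
from statistics import mean


def bootstrap_percentile_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of the pairwise kappa values.

    The pairwise kappas themselves are resampled with replacement,
    not the underlying patients or glands.
    """
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    resampled_means = sorted(
        mean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return resampled_means[lower_index], resampled_means[upper_index]


# Hypothetical pairwise weighted kappa values.
pairwise_kappas = [0.35, 0.52, 0.60, 0.71, 0.80]
ci_lower, ci_upper = bootstrap_percentile_ci(pairwise_kappas)
```

Because only the pairwise coefficients are resampled, the resulting interval reflects the spread between rater pairs rather than sampling variability across patients.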
RESULTS
Patient characteristics. The main demographic and clinical profiles of the 11 patients enrolled are presented in Table 1. The study used 198 PG (11 patients × 9 ultrasonographers × 2 PGs) and 198 SMG (11 patients × 9 ultrasonographers × 2 SMGs) OMERACT US scores from 9 rheumatologists for 11 patients, and 22 PG (11 patients × 2 PGs) and 22 SMG (11 patients × 2 SMGs) MRI measurements from radiologists for 11 patients.
Prevalence of US abnormalities. Using the OMERACT US score, the prevalence of a US score ≥ 2 was 58% for PGs and 76% for SMGs, whereas the prevalence of grade 3 lesions for PGs and SMGs was 26% and 18%, respectively (Table 2).
Table 2. Prevalence of various scores by US and MRI found in PGs and SMGs.
Prevalence of MRI abnormalities. MRI assessed all glands as abnormal (≥ 1). With regard to the MRI grading, the prevalence of MRI score ≥ 2 for PGs and SMGs was 46% and 41%, respectively. Prevalence of MRI grade 3 was 27% for PGs and 36% for SMGs (Table 2).
WSFRs. The ranges of WSFR were 0-0.66 mL/min for U-WSFRs and 0-2.2 mL/min for S-WSFRs in the whole group. The mean SFRs were 0.11 (SD 0.19) mL/min for U-WSFR and 0.54 (SD 0.64) mL/min for S-WSFR (Table 1). The definition of hyposalivation was fulfilled in 6/11 patients (54.5%). In patients with hyposalivation, the mean U-WSFR was 0.01 (SD 0.01) mL/min, and the mean S-WSFR was 0.14 (SD 0.21) mL/min. In patients without hyposalivation, the mean U-WSFR and S-WSFR were 0.23 (SD 0.22) mL/min and 1.03 (SD 0.59) mL/min, respectively.
In the 6 patients with hyposalivation, the prevalence of structural lesions for OMERACT US scores ≥ 2 was 97% in PGs and 93% in SMGs, whereas the prevalence for OMERACT US score = 3 was 47% in PGs and 32% in SMGs. In addition, the prevalence of structural lesions for MRI scores ≥ 2 was 83% in PGs and 67% in SMGs, whereas the prevalence for MRI scores = 3 was 50% in PGs and 67% in SMGs in patients with hyposalivation (Table 3).
Table 3. OMERACT US and MRI scores in relationship to hyposalivation defined by unstimulated whole salivary flow rate.
Agreement percentages between OMERACT novel US scores and MRI scores and U-WSFRs. In the right and left PGs, the agreement between OMERACT US scores and MRI scores was 73-91% (median 82%) and 73-91% (median 91%), respectively. In the right SMG, the agreement was 55-91% (median 55%), and in the left SMG, the agreement was 55-82% (median 55%; Table 4). When the relationship between hyposalivation and US scores was examined, the agreement was 91-100% in the right PG (median 83%) and 91-100% in the left PG (median 83%), along with 55-91% in the right SMG (median 67%) and 55-91% in the left SMG (median 67%; Table 5). Additionally, agreement between hyposalivation and MRI results was 91% for both PGs, 73% for the right SMG, and 82% for the left SMG (Supplementary Table S1, available with the online version of this article).
Table 4. Agreement percentages between new OMERACT US scores and MRI scores in pSS.
Table 5. Agreement percentages between new OMERACT US scores and presence of hyposalivation.
US reliability. The overall mean interrater reliability of US for all SGs using the proposed scoring system was moderate (weighted Light κ 0.60, 95% CI −0.04 to 0.84). The level of mean interreader κ for bilateral PGs (weighted Light κ 0.60, 95% CI 0.34-0.84) and bilateral SMGs (weighted Light κ 0.60, 95% CI 0.10-0.81) was also moderate. Because of time constraints, intrarater reliability was not tested.
DISCUSSION
To the best of our knowledge, this is the first study to assess the construct validity of the OMERACT US grayscale scoring system for pSS by comparing it with MRI and WSFRs. The results of our study showed a moderate to strong agreement between the novel OMERACT US scoring system and MRI scores (median 55-91%). Further, we compared both the OMERACT US and MRI scores with the presence of hyposalivation, and a similarly moderate to strong agreement was observed between the 2 imaging modalities and hyposalivation (median 67-83%). The agreement between US and MRI scores for morphological lesions was stronger in PGs than in SMGs. Similarly, PGs had higher agreement between hyposalivation and both imaging scores than SMGs did. The interrater reliability observed in the current study was similar to that previously reported.14
Previously, in 2000, a very high agreement between US and MRI based on parenchymal inhomogeneity was reported by Makula et al,17 without providing any metrics to support either one of the imaging methods. Notwithstanding the lack of data, their results in terms of the detection of more severe morphological gland lesions seem to support our findings. Contrary to our study, Makula et al assessed only PGs. Takagi et al compared SG MRI and ultrasonography findings in patients with SS across a wide age range in a previous study.32 Correlations were found in both PGs and SMGs between the presence of fat areas on MRI and the presence of hyperechoic bands on US (OR 13.82 and 5.23, respectively, both P < 0.001).
S-WSFR is associated with the severity of PG destruction and fat accumulation in patients with pSS, as detected by MRI and computed tomography.20 Previous studies have also shown that the increase in US scores is associated with a decrease in both U-WSFRs and S-WSFRs.33,34 We previously found that US sum scores of 4 glands as well as unilateral PGs and SMGs were sufficient to predict hyposalivation in patients with pSS assessed by the Hocevar and Milic scoring systems.35
Previously, the relationship between MRI and WSFR in patients with pSS was evaluated by Kojima et al,21 showing that the MRI signal intensity is significantly associated with both U-WSFRs and S-WSFRs in both PGs and SMGs of patients with pSS. They also found that the best cut-off values for diagnostic accuracy were a signal intensity grade of 3 for the PGs and grade 2 for the SMGs. Our findings are in agreement with those of Kojima et al. In our study, higher US scores were significantly more frequent in patients with hyposalivation. In addition, we determined a moderate to strong agreement between the 2 imaging modalities and hyposalivation (median 67-83%).
This study has several strengths. First, we evaluated all major SGs that can be reliably visualized, and second, all imaging assessments in this cross-sectional study were performed within a time frame of 2 weeks, thereby limiting the variations of assessments because of clinical variability. Further, we applied the OMERACT consensus-based and validated scoring system for pSS. Finally, the interrater reliability in the current study was similar to that previously reported.
A limitation of our study is that different US machines were used, and although US settings were standardized and optimized prior to the study, this may have influenced the evaluation of the US examination. It is well known that grayscale settings are less prone to variability between machines than Doppler US.36,37 However, the use of different US machines better reflects the real-life scenario. Further, we did not assess intrareader reliability because of time constraints, but all the sonographers are experienced in SG ultrasonography, and intrareader reliability for the same sonographers has been established in a previous study showing Light κ values of 0.81 in patients with pSS.13 Another limitation inherent to the participation of rheumatology experts in SG US is that the results may not be immediately applicable to the general population of rheumatologists who may be less experienced in US. A previously published atlas of the scoring system for each grade may facilitate this.28 In addition, the sample size is relatively small, but not markedly different from that of other validity and reliability studies. Another limitation could be the evaluation of multiple measures (left and right glands) in each patient. As each gland was scored individually and included in the analyses separately, we involved 22 PGs (11 × 2) and 22 SMGs (11 × 2) in 11 patients with SS in this study.
In conclusion, we have demonstrated that the novel OMERACT US consensus-based scoring system has construct validity with moderate to strong agreement with MRI and the presence of hyposalivation. Further investigations in larger cohorts of patients with pSS may help to gain a better understanding of the implementation of US in the management of patients with pSS.
ACKNOWLEDGMENT
We would like to thank Esaote and AbbVie Pharma for providing logistic support for the patients and for providing the ultrasound machines.
Footnotes
The authors declare no conflicts of interest relevant to this article.
- Accepted for publication October 23, 2023.
- Copyright © 2024 by the Journal of Rheumatology