Abstract
Objective. To test the reliability of Outcome Measures in Rheumatology Clinical Trials (OMERACT) consensus-based ultrasound definitions for normal and vasculitic temporal and axillary arteries in patients with giant cell arteritis (GCA) and in controls.
Methods. A preliminary 1-day meeting and a full 3-day meeting fulfilling OMERACT Ultrasound Group guidelines were held. Temporal and axillary arteries were examined at 2 timepoints by 12 sonographers on 4 patients with GCA and 2 controls. The aim was to test inter- and intrareader reliability for normal findings, halo sign, and compression sign. In both meetings, patients had established GCA. Pathology was more recent in the full meeting, which was preceded by 6 h of training. Scanning time was 15–20 min in the full meeting compared with 10–13 min in the preliminary meeting.
Results. In the preliminary exercise, interreader reliabilities were fair to moderate for the overall diagnosis of GCA (Light κ 0.29–0.51), and poor to fair for identifying vasculitis in the respective anatomical segments (Light κ 0.02–0.46). Intrareader reliabilities were moderate (Cohen κ 0.32–0.64). In the main exercise, interreader reliability was good to excellent (Light κ 0.76–0.86) for the overall diagnosis of GCA, and moderate to good (Light κ 0.46–0.71) for identifying vasculitis in the respective anatomical segments. Intrareader reliability was excellent for diagnosis of GCA (Cohen κ 0.91) and good (Cohen κ 0.71–0.80) for the anatomical segments.
Conclusion. OMERACT-derived definitions of halo and compression signs of temporal and axillary arteries are reliable in recent-onset GCA if experienced sonographers (> 300 examinations) have 15–20 min for a standardized examination with prior training and apply > 15 MHz probes.
- GIANT CELL ARTERITIS
- ULTRASOUND
- RELIABILITY
- DIAGNOSIS
- VASCULITIS
Early and accurate diagnosis of giant cell arteritis (GCA) is imperative. Failure to accurately diagnose and expeditiously treat GCA may lead to vision loss and other severe ischemic complications, whereas misdiagnosis of non-GCA pathology as GCA leads to inappropriate glucocorticoid use and toxicity. Temporal artery biopsy (TAB) has been the diagnostic test of choice. However, TAB is invasive, and results are not immediately available. Hence it is increasingly being replaced by imaging, which includes ultrasound (US), magnetic resonance imaging (MRI), computed tomography (CT), and 18F-fluorodeoxyglucose positron emission tomography (FDG-PET)1. FDG-PET and CT facilitate the examination of extracranial arteries to confirm the diagnosis of large-vessel GCA and exclude alternative serious pathology. MRI and particularly US can additionally visualize temporal arteries and other superficial cranial arteries.
US is widely available in rheumatology practice. It is patient-friendly, reproducible, and repeatable. Modern US transducers achieve image resolution of 0.1 mm for superficial arteries, which is higher than that of other imaging techniques1. US displays a noncompressible, hypoechoic, most commonly concentric arterial wall thickening (“halo sign”) in acute GCA2,3. Alongside medical history and clinical examination, it can be used in fast-track clinics offering appointments for patients within 24 h, to rapidly confirm or exclude the diagnosis of suspected GCA. Two studies have shown a decrease in permanent vision loss after the introduction of fast-track clinics4,5.
US in all patients with suspected GCA is cost-effective compared to biopsy plus clinical judgment without imaging6. It has a higher sensitivity than TAB regarding the clinical diagnosis, particularly in patients with large-vessel GCA6,7. Several studies have investigated the accuracy, and construct and criterion validity of US in the diagnosis of GCA, including 3 metaanalyses8,9,10. There is a trend to higher sensitivities in newer studies because of better technology and increasing experience. A new metaanalysis including studies until February 2017 revealed a pooled sensitivity of 77% and a pooled specificity of 96% with a positive likelihood ratio of 19 and a negative likelihood ratio of 0.2 for the halo sign in temporal arteries compared to the clinical diagnosis of GCA11.
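As a rough consistency check on the pooled figures quoted above, likelihood ratios can be derived from sensitivity and specificity. The Python sketch below is illustrative only: the function name `likelihood_ratios` is hypothetical, and the metaanalysis may have pooled the ratios with its own statistical model rather than computing them from the pooled sensitivity and specificity.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary diagnostic test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1], specificity in (0, 1)")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled values for the halo sign in temporal arteries, as quoted in the text.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.77, specificity=0.96)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

Under these assumptions, the derived values round to the reported positive likelihood ratio of 19 and negative likelihood ratio of 0.2.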
Nevertheless, issues have been raised regarding the diagnostic performance and reliability of US, thus challenging its overall usefulness in GCA. A recent phase III trial enrolled 37% of its patients on the basis of cross-sectional imaging, although US was not included12. Another phase III trial (ClinicalTrials.gov identifier: NCT02531633) included US as an eligible diagnostic modality. This trial was however prematurely terminated in October 2017 by the sponsor, based on the decision to discontinue development of sirukumab in autoimmune diseases. Recently published European League Against Rheumatism recommendations on imaging in large-vessel vasculitis suggest US as the first imaging modality particularly in patients with suspected predominantly cranial GCA13.
An Outcome Measures in Rheumatology Clinical Trials (OMERACT) US subgroup on large-vessel vasculitis was formed. A Delphi survey based on a systematic literature search arrived at US definitions for normal temporal and axillary arteries, the halo sign, and the “compression sign.” These definitions were tested in a Web-based exercise on still images and videos of normal and vasculitic temporal and axillary arteries. The reliability was excellent, with interreader agreements of 91–99% and mean κ values of 0.83–0.98 for both interreader and intrareader reliability14.
The focus of our study, described herein, is the OMERACT validation process, which tested the inter- and intrareader reliability of the aforementioned definitions for both normal and vasculitic arteries. The real-time patient-based exercises required simultaneous data acquisition and interpretation.
MATERIALS AND METHODS
Study design and setting
A preliminary 1-day meeting was held following the International Symposium on Giant Cell Arteritis, Polymyalgia Rheumatica and Large Vessel Vasculitis in Southend, UK, in March 2016 to test the feasibility and study setting for a patient-based exercise. Lessons learned were implemented in a definitive 3-day exercise in Berlin, Germany, in February 2017, modeled on previous OMERACT Ultrasound Working Group studies for testing patient-based reliability of US in rheumatic diseases15,16,17. The methodology and reporting of the Berlin OMERACT reliability exercise adhered to the recommendations from the Enhancing the Quality and Transparency of Health Research network18 using the Guidelines for Reporting Reliability and Agreement Studies Statement19.
US examination
At each meeting, 12 sonographers individually examined 6 study subjects. All sonographers were previously involved in the development of the consensus-based US definitions. Each sonographer performed bilateral examinations of the common superficial temporal artery, its frontal and parietal branches and of the axillary arteries (i.e., 8 artery segments per patient) in longitudinal and transverse scans applying a binary score for vasculitis US lesions as defined by OMERACT14. The subject was lying on an examination couch in supine position. The head was rotated slightly toward the examiner for examining the left temporal artery and away from the examiner for examining the right temporal artery. The probe was placed in the axilla for examining the axillary artery. After a predetermined time, sonographers rotated to the next station until every sonographer examined all patients/controls. The data were collected immediately after each examination to exclude communication between sonographers. Sonographers were blinded to the study subjects’ diagnosis. They were not allowed to communicate with the patients about signs or symptoms of the disease. None of the examined patients had visibly swollen temporal arteries. An identical examination sequence was repeated later the same day to assess intrareader reliability.
US equipment and settings
Esaote MyLab Twice/Class systems equipped with 6–18 MHz linear array transducers were used in the exercises. In the Berlin meeting, 2 additional Esaote MyLab 8 machines were used. The following settings were applied for the examination of the temporal arteries (axillary arteries): B-mode frequency 18 MHz (14 MHz), image depth 1.5 cm (3 cm), 1 focus point at 0.5 cm (1.5 cm) below skin surface, color Doppler frequency 9 MHz (6 MHz), and pulse repetition frequency 2.5 kHz (3.5 kHz). Sonographers were advised not to change these predefined settings except for adjusting image depth and focus point position for the examination of the axillary arteries, if necessary.
Preliminary meeting
The sonographers received no training on US machines and settings before the exercise. Thirteen minutes were allocated for scanning and scoring the findings for the first round and 10 min for the second round. These limits were set after a discussion of daily clinical practice conditions, for which these time frames seemed adequate and realistic.
The examined study subjects were chosen by the convenors (BD, WAS), who did not participate in the reliability exercise. WAS, being unblinded to the history and diagnosis of all study subjects and having performed > 5000 scans in suspected GCA over 23 years, examined all study subjects in addition to the other sonographers (independent sonographer) to decide whether arterial segments were exhibiting clear or ambivalent pathology and to store reference images and videos. The examined study subjects were 63–76 years old (mean age 68 yrs). Four of them were female. Four study subjects had GCA consistent with the revised inclusion criteria of the SIRRESTA trial (NCT02531633). These criteria require age ≥ 50 years, erythrocyte sedimentation rate ≥ 50 mm/h and/or C-reactive protein ≥ 2.45 mg/dl (24.5 mg/l), unequivocal cranial symptoms of GCA and/or polymyalgia rheumatica, and evidence of large-vessel vasculitis by cross-sectional imaging including US if diagnosis is not confirmed histologically. Further, the diagnosis had remained unchanged until the exercise. By the time of the exercise, patients had been receiving glucocorticoids for 5 weeks, 2 years, 2 years, and 6 years. One of the 2 controls had an uncommon finding of arteriosclerosis of both axillary arteries.
All sonographers were rheumatologists except one who was in his last year of rheumatology training. Prior to the exercise, 7 sonographers had performed > 300 scans of temporal and axillary arteries, 2 had performed 101–300 scans, 2 had performed 51–100 scans, and 1 had performed < 20 scans. Five sonographers used US machine types in their institutions similar in manufacturer and price level to the ones used in the exercise.
Full meeting
The meeting included 6 h of practical US training on healthy individuals and patients with GCA, different from those who participated in the exercise, using the machines and settings used in the exercise. In the exercise, 20 min were allocated for scanning and scoring the findings for the first round and 15 min for the second round.
The examined study subjects were chosen and also examined by the convenor (WAS), who did not participate in the reliability exercise. Subjects’ age ranged from 56 to 80 years (mean 68 yrs). Four of them were female. Four study subjects had GCA fulfilling the above-mentioned inclusion criteria. Three of them had been receiving glucocorticoids for 4, 7, and 8 months, respectively. The fourth patient had a persistent halo sign of temporal arteries for 4 years despite discontinuation of glucocorticoids. Two controls never had any signs or symptoms of GCA.
Eight of the 12 sonographers had participated in the preliminary exercise. All sonographers were rheumatologists. Eleven sonographers had performed > 300 scans of temporal and axillary arteries before. Two of them had indicated an experience of 101–300 scans at the time of the preliminary meeting. One sonographer had performed 51–100 scans at the time of each meeting. Six sonographers used US machines in their institutions similar in manufacturer and price level to the ones used in the exercise.
Ethics committee approval was obtained from the Berlin Medical Association (Berliner Ärztekammer, Eth-04-17). All patients provided written informed consent prior to participation in our study.
Definitions
The definitions obtained by the Delphi exercise and applied at the Web-based reliability exercise14 were also used in the patient-based reliability exercises:
Normal temporal artery: Pulsating, compressible artery with anechoic lumen surrounded by mid- to hyperechoic tissue. Using US equipment with high resolution, the intima-media complex (IMC), presenting as a homogeneous, hypoechoic, or anechoic echo structure delineated by 2 parallel hyperechoic margins (double-line pattern), may be visible.
Normal axillary artery: Pulsating, hardly compressible artery with anechoic lumen; the IMC presents as a homogeneous, hypoechoic, or anechoic echo structure delineated by 2 parallel hyperechoic margins (double-line pattern), which is surrounded by mid- to hyperechoic tissue.
Halo sign: Homogeneous, hypoechoic wall thickening, well delineated toward the luminal side, visible both in longitudinal and transverse planes, most commonly concentric in transverse scans.
Compression sign: The thickened arterial wall remains visible upon compression; the hypoechogenic vasculitic vessel wall thickening contrasts with the mid-echogenic to hyperechogenic surrounding tissue.
Figures explaining these definitions can be found in the article describing the Delphi process in more detail14 and in another review article20.
Statistical analysis
All sonographers (1–12) evaluated all study subjects (n = 6) in 2 rounds, in a total of 8 anatomical positions (common temporal artery, parietal branch, frontal branch, and axillary artery), taking both sides of the body (right, left) into account. Intra- and interreader reliabilities were calculated using the kappa coefficient (κ). Intrareader reliability was assessed by Cohen κ. Interreader reliability was studied by calculating the mean κ on all pairs (i.e., Light κ)21. Kappa coefficients and the corresponding 95% CI were interpreted according to Landis and Koch: κ values of 0–0.2 were considered poor, 0.2–0.4 fair, 0.4–0.6 moderate, 0.6–0.8 good, and 0.8–1 excellent22. The percentage of observed agreement (i.e., percentage of observations that obtained the same score), prevalence of the observed lesions, and prevalence-adjusted bias-adjusted κ (PABAK) were also calculated23,24. Analyses were performed using R statistical software (R Foundation for Statistical Computing).
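The statistics named above can be illustrated with a minimal Python sketch for binary scores (1 = vasculitis present, 0 = absent). This is not the authors' R code: the function names, the toy ratings, and the handling of boundary values in `interpret_kappa` (upper bounds inclusive, negative values labeled poor) are assumptions made for illustration, and CI are not computed.

```python
from itertools import combinations


def observed_agreement(a, b):
    """Proportion of observations on which two binary score vectors agree."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score vectors must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen kappa for two binary score vectors (NaN if chance agreement is 1)."""
    po = observed_agreement(a, b)
    pa, pb = sum(a) / len(a), sum(b) / len(b)   # proportion scored positive
    pe = pa * pb + (1 - pa) * (1 - pb)          # agreement expected by chance
    if pe == 1.0:
        return float("nan")                     # both readers used one category only
    return (po - pe) / (1 - pe)


def light_kappa(readers):
    """Light kappa: mean Cohen kappa over all pairs of readers."""
    pairs = list(combinations(readers, 2))
    if not pairs:
        raise ValueError("at least 2 readers are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def pabak(a, b):
    """Prevalence-adjusted bias-adjusted kappa for binary scores: 2 * Po - 1."""
    return 2 * observed_agreement(a, b) - 1


def interpret_kappa(kappa):
    """Label a kappa value using the categories stated in the text."""
    for upper, label in ((0.2, "poor"), (0.4, "fair"), (0.6, "moderate"), (0.8, "good")):
        if kappa <= upper:
            return label
    return "excellent"


# Toy data (hypothetical): 3 readers scoring the same 8 arterial segments.
reader_a = [1, 1, 1, 1, 0, 0, 0, 0]
reader_b = [1, 1, 1, 0, 0, 0, 0, 0]
reader_c = [1, 1, 1, 1, 0, 0, 0, 1]

k = light_kappa([reader_a, reader_b, reader_c])
print(f"Light kappa = {k:.2f} ({interpret_kappa(k)})")
print(f"PABAK (a vs b) = {pabak(reader_a, reader_b):.2f}")
```

For intrareader reliability, `cohen_kappa` would be applied to the round 1 and round 2 scores of the same sonographer. In the sketch, a pair of readers who both use only 1 category yields an undefined κ (NaN), which would need separate handling before averaging.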
RESULTS
Preliminary meeting
The mean interrater agreement for the overall diagnosis of GCA was 0.73 in round 1 and 0.83 in round 2. It was 0.79 in round 1 and 0.77 in round 2 for identifying vasculitis in the respective anatomical segments. The mean intrarater agreements were 0.82 (range 0.50–1) for the overall diagnosis of GCA and 0.84 (range 0.58–1) for identifying vasculitis in the respective anatomical segments.
The mean interrater reliabilities were fair to moderate for the overall diagnosis of GCA (Light κ 0.29–0.51) and poor to fair for identifying vasculitis in the respective anatomical segments (Light κ 0.02–0.46). Mean intrarater reliabilities were moderate (Cohen κ 0.32–0.64).
The independent sonographer rated 21 of 36 temporal artery segments (58%) as ambivalent because of minor pathology, such as a very small halo size of < 0.5 mm and incomplete compressibility in some subsegments because of chronic changes in longstanding disease (Figure 1). He considered 4 of 12 axillary arteries (33%) ambivalent, including both axillary arteries of 1 control with unusually pronounced arteriosclerosis showing heterogeneous and in part hyperechoic, irregularly delineated, eccentric vessel wall alteration with a diameter of up to 1.7 mm. Only 3 experienced sonographers (> 300 scans) considered the findings in this control non-GCA in both rounds. Sixty-five percent of sonographers felt that unfamiliarity with the equipment might have contributed to false-positive or false-negative diagnoses and might have impaired intrareader reliability.
Full meeting
The mean interrater agreement for the overall diagnosis of GCA was 0.88 in round 1 and 0.93 in round 2. It was 0.78 (range 0.75–0.83) in round 1 and 0.82 (range 0.79–0.86) in round 2 for identifying vasculitis in the respective anatomical segments. The mean intrarater agreements were 0.96 (range 0.83–1) for the overall diagnosis of GCA and 0.89 (range 0.58–1) for identifying vasculitis in the respective anatomical segments.
The interrater reliability was good to excellent. The mean Light κ was 0.76 in round 1 and 0.86 in round 2 for the overall diagnosis of GCA. The mean PABAK was 0.77 and 0.86 in rounds 1 and 2, respectively. For identifying vasculitis in the respective anatomical segments, the reliability for the temporal arteries was moderate in round 1 (mean κ 0.46–0.53, mean PABAK 0.49–0.66) and moderate to good in round 2 (mean κ 0.50–0.71, mean PABAK 0.58–0.72), and the reliability for the axillary arteries was good in both rounds (mean κ 0.64–0.66). The intrareader reliability was excellent for the diagnosis of GCA (Cohen κ 0.91, PABAK 0.92) and good (Cohen κ 0.71–0.80, PABAK 0.73–0.81) for the respective anatomical segments.
The independent sonographer rated 14 of 36 temporal artery segments (39%) and none of the 12 axillary arteries as ambivalent due to minor pathology because of chronic changes in longstanding disease. All sonographers agreed in both rounds that the controls had no GCA. Agreement was also 100% in both rounds for the diagnosis of GCA in 3 patients with GCA. Disagreement occurred only when 5/12 and 3/12 sonographers missed the diagnosis of GCA in rounds 1 and 2, respectively, in 1 obese patient with bilateral axillary artery vasculitis, very small residual artery lumen, pronounced collateral flow, and normal temporal arteries (Figure 2).
In both exercises, reliabilities did not significantly differ whether halo sign or compression sign was evaluated. The detailed results are shown in Table 1, Table 2, and Table 3.
DISCUSSION
The inter- and intrarater reliabilities for performing US of temporal and axillary arteries in patients with GCA and controls were good to excellent for the diagnosis of GCA with experienced sonographers who were familiar with the US equipment.
Better reliabilities attained in the full exercise compared to the preliminary exercise could be explained by the following:
(1) Lack of sonographer training on the US equipment and its settings in the preliminary exercise. Only 42% of sonographers in the preliminary exercise and 50% in the full exercise were using similar equipment in their institutions. Even if a sonographer is familiar with a certain type of machine, experience with the settings is important as these may considerably influence the appearance of the US images.
(2) Only 58% of sonographers in the preliminary exercise had performed > 300 examinations in suspected GCA compared to 92% in the full exercise. The European Federation of Societies for Ultrasound in Medicine and Biology minimum training requirements for rheumatologists performing musculoskeletal US demand a minimum of 300 US examinations for achieving level I competency25. Our current study suggests that this requirement may also apply for temporal and axillary artery US in suspected GCA.
(3) More time was provided for each examination in the full exercise because 67% of sonographers of the preliminary exercise said they felt that time restrictions had hampered the results. An examination time of 15–20 min appears to be optimal for examining temporal and axillary arteries in suspected GCA.
(4) The time frame when performing US is important for image interpretation. In patients with untreated GCA, the pathology is much more pronounced than in patients with longstanding, treated disease. The real-time patient-based reliability exercises, according to an OMERACT algorithm, are faced with this shortcoming, because it is impossible to obtain patients with untreated GCA for these exercises. The disease was more longstanding and pathologies were subtler in the preliminary exercise, with 52% of examined anatomical segments showing ambivalent findings compared to 29% in the full exercise. The sensitivity of temporal artery US decreases rapidly with glucocorticoid treatment. In 1 study, the sensitivity compared to the final clinical diagnosis dropped from 88% in patients who had been untreated or who had received glucocorticoids for not longer than 1 day, to 50% in patients who had been treated for 2 days or longer26. Another study, however, found that a residual halo sign may persist for 8 weeks in half of the patients27. In axillary arteries, US pathology may remain longer, for months and years, but it also decreases over time7. Nevertheless, as halo size decreases and halo echogenicity increases with treatment, it is more difficult to differentiate normal from abnormal findings in treated established GCA. This is probably also the case for histology because giant cells do not persist longer than 6 months28. Arteriosclerosis may be a potential confounder in the mainly elderly GCA population. It is, however, far less common in the temporal and axillary arteries than in the carotid and femoral arteries.
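The 52% and 29% figures in point (4) can be reproduced from the segment counts reported in the Results (36 temporal artery segments and 12 axillary arteries rated by the independent sonographer in each exercise). The short Python sketch below shows the arithmetic; the function name `ambivalent_share` is hypothetical.

```python
def ambivalent_share(temporal_ambivalent, axillary_ambivalent,
                     temporal_total=36, axillary_total=12):
    """Fraction of all rated arterial segments judged ambivalent."""
    ambivalent = temporal_ambivalent + axillary_ambivalent
    total = temporal_total + axillary_total
    return ambivalent / total


# Preliminary exercise: 21 of 36 temporal segments and 4 of 12 axillary arteries.
preliminary = ambivalent_share(21, 4)
# Full exercise: 14 of 36 temporal segments and none of the 12 axillary arteries.
full = ambivalent_share(14, 0)

print(f"preliminary: {preliminary:.0%}, full: {full:.0%}")
```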
Few studies have so far assessed real-time patient-based reliabilities for US in suspected GCA. As for other indications and other imaging methods, reliability was higher when investigated for only 2 sonographers from the same institution. Agreement of 2 sonographers examining temporal arteries for halo sign, stenosis, and occlusions was 95% for the diagnosis of GCA in 1 study2. In another study, 2 sonographers evaluating the compression sign of temporal arteries disagreed in only 1 of 60 patients29. A single study with multiple sonographers from Spain found excellent reliability with a κ value of 0.85 for interreader reliability and of 0.95 for intrareader reliability after a training workshop30. The somewhat lower reliability in our study is probably due to a tighter protocol.
Our study has limitations. The reliability may depend on the severity of the pathologic findings. Because all patients were receiving glucocorticoid treatment, reliability may have been impaired by ambivalent pathology. The repetition of the examination sequence on the same day may have led to overestimation of intrareader reliability. Although similar US equipment was used, even machines of the same type may exhibit different image features. Our study was performed with modern high-quality 6–18 MHz probes. Probes for examining temporal arteries should provide frequencies of ≥ 15 MHz31. Probes with frequencies > 20 MHz will further increase resolution and allow reliable measurement of the IMC of temporal arteries32. Very few of the sonographers participating in our study are using these probes. Further, IMC measurement of axillary arteries could have a role in future US protocols in suspected GCA.
These exercises following the OMERACT Ultrasound Group guidelines show that the OMERACT-derived definitions of halo and compression signs of temporal and axillary arteries are applicable in recent-onset GCA with excellent inter- and intrarater reliabilities for the diagnosis of GCA if sonographers are experienced, are given sufficient time for the examination, and are familiar with the US equipment (including high-frequency probes > 15 MHz) and its settings.
Acknowledgment
The authors are indebted to the patients for their participation in the study. We also thank Katerina Achilleos and Kenny Schlüter, who helped organize the workshops in Southend and Berlin, respectively.
Footnotes
The preliminary exercise was funded in connection with the International Symposium on Giant Cell Arteritis, Polymyalgia Rheumatica and Large Vessel Vasculitis in Southend, United Kingdom. The full exercise was funded by a grant from Roche Pharma Germany. Ultrasound equipment was provided for both exercises by Esaote SpA, Genoa, Italy.
- Accepted for publication March 29, 2018.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.