Abstract
Objective. Use of temporal artery ultrasonography (TA-US) for diagnostic investigation of giant cell arteritis (GCA) has been proposed but remains a matter of debate because of the heterogeneous findings. We retrospectively evaluated the operating characteristics of TA-US in a single teaching hospital.
Methods. All subjects with suspected GCA had been seen between 2002 and 2008 and had undergone TA-US with continuous-wave Doppler (until May 2004) or color duplex ultrasonography (from June 2004), followed within 30 days by a temporal artery biopsy (TAB). TA-US findings were compared with TAB-proven GCA and clinically diagnosed GCA. Results are expressed as sensitivities, specificities, and positive (LR+) and negative likelihood ratios (LR−) of stenoses, occlusions, and the halo sign; for the latter, only color duplex TA-US was considered.
Results. Seventy-seven patients fulfilled the selection criteria; 13 had TAB-proven and 19 had clinically defined GCA. Stenoses/occlusions were seen on 45.5% of TA-US and the halo sign was seen only once (3.2%) in 31 duplex TA-US. Respective sensitivities, specificities, LR+, and LR− for GCA diagnosis (using TAB-proven/clinically defined GCA as reference standards) were 69%/53%, 59%/57%, 1.7/1.2, and 0.5/0.8 for stenoses and/or occlusions, and 17%/10%, 100%/100%, infinite/infinite, and 0.8/0.9 for the halo sign.
Conclusion. The halo sign showed 100% specificity for GCA but only 10%–17% sensitivity. Stenoses/occlusions were of low diagnostic value. These observations suggest that TA-US is neither an effective substitute for TAB nor a reliable screening test to decide which patients can be safely spared TAB.
Ultrasonography of the superficial temporal arteries (TA-US) in the diagnostic investigation for giant cell arteritis (GCA) has generated much interest for almost 3 decades. Early investigators applying Doppler US reported hemodynamic flow abnormalities with stenoses and occlusions of the temporal artery1,2. Later studies using color duplex ultrasonography, which also provided information on vessel anatomy, showed edematous thickening of the arterial wall, called the “halo” sign3. Compared to temporal artery biopsy (TAB), the traditional gold standard, TA-US appeared to be a promising, noninvasive test for diagnosis of GCA4.
Despite extensive investigation, the accuracy of TA-US as a diagnostic or screening test for GCA remains uncertain. Studies have given this test a wide range of performance characteristics including 100% sensitivity5,6,7 or 100% specificity3,8,9. Moreover, it is unclear whether the halo sign alone or any abnormal imaging feature is of diagnostic value. The notable inconsistencies of findings crystallized in a comprehensive metaanalysis of 23 studies, which found significant between-study variability for almost all pooled estimates of the operating characteristics of TA-US for GCA10, thereby questioning the generalizability of its results.
Taking advantage of longstanding access to TA-US in our institution, we evaluated its validity for diagnosis of GCA to further clarify this matter.
MATERIALS AND METHODS
Setting and selection of subjects
Our retrospective study was conducted in a teaching hospital, where all vascular ultrasonography examinations are performed by specialized physicians affiliated with a vascular investigation unit. Routine TA-US for patients with suspected GCA was made available in 1981 (ref. 2). Standardized reports for all ultrasonography examinations have been stored in a computerized database since 2002.
For the period January 2002–September 2008, we crosslinked the databases of the vascular investigation unit and the local pathology department to identify all patients who underwent TA-US and/or TAB. Only patients who had both tests were considered eligible for our study’s primary objective. These selection criteria were chosen to enable comparison of TA-US features against TAB findings for all patients and relied on the supposition that only patients who underwent TAB had high clinical suspicion of GCA. Moreover, only patients whose TAB was obtained within 30 days after TA-US were included in the primary analyses.
Routine TA-US protocol
Until May 2004, TA-US relied on continuous-wave Doppler ultrasonography using the pencil probe (5 MHz) of an ATL Apogee 800 sonograph. Starting from June 2004, color-coded, pulsed-wave duplex ultrasonography was introduced with the linear array probe (7.5 MHz) of a Toshiba Aplio sonograph.
Routine TA-US protocol included bilateral insonation of the common superficial temporal artery and its frontal branches, followed as far as possible. The following abnormalities were sought: occlusion (defined as a nonexistent Doppler signal), stenosis (localized acceleration with a ≥ 2-fold peak systolic velocity increase or a dampened velocity curve with a ≥ 2-fold peak systolic velocity decrease, compared with either the upstream segment or contralateral artery), and, once the duplex sonograph was used, halo sign (defined as an anechoic ring ≥ 0.3 mm thick, separating surrounding tissue from the colored arterial lumen, seen in both transverse and longitudinal planes).
Data collection
TA-US and TAB findings and medical charts were reviewed for all patients selected for the primary analyses. To guarantee high-quality data retrieval, TA-US reports were independently reviewed by 1 clinical investigator and 1 physician of the vascular investigation unit. Similarly, the original TAB reports were assessed by 1 clinical investigator and compared with an ad hoc reading of all TAB slides by a pathologist, who remained blinded to the original TAB report findings. Between-investigator discrepancies concerning information extracted from the TA-US reports and between the initial and second TAB readings were resolved by consensus discussion. The medical charts of all selected patients were assessed by 1 investigator to control fulfillment of American College of Rheumatology (ACR) criteria11 and to record clinical manifestations, inflammatory markers (erythrocyte sedimentation rate, C-reactive protein) and glucocorticoid therapy at first consultation and during followup, and final diagnoses.
In addition, 2 independent investigators abstracted, and reached consensus on, the TA-US or TAB reports’ principal findings for the patients not eligible for the primary analyses and who underwent TA-US without subsequent TAB or TAB without prior TA-US.
Reference standards definitions
“TAB-proven GCA” was defined based on the presence of a mononuclear cell infiltrate at the intima-media junction or in the media; additional features, e.g., giant cells or disruption of the internal elastic lamina, were considered discretionary. “Clinically defined GCA” referred to the combination of ACR criteria for GCA11, the waning of clinical and laboratory signs under high-dose prednisone, and the absence of other diagnoses over a followup period of ≥ 6 months.
Statistical analyses
Using each of the 2 reference standards, sensitivities, specificities, and positive (LR+) and negative likelihood ratios (LR−) were computed for stenoses, occlusions, and the halo sign; for the halo sign, calculations concerned only the subset of duplex TA-US examinations. With sensitivity and specificity expressed as percentages, LR+/LR− values were calculated based on the respective formulas LR+ = sensitivity/(100 – specificity) and LR− = (100 – sensitivity)/specificity. LR+ values > 10 and LR− values < 0.1 are generally considered to allow a conclusive increase or decrease, respectively, of the posttest likelihood of the presence of the disease under investigation12. The 95% confidence intervals (95% CI) for sensitivities/specificities were calculated by applying the Wilson score method without continuity correction13; for LR+/LR−, 95% CI were computed as described14.
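As an illustration of the calculations described above, the following minimal Python sketch computes sensitivity, specificity, LR+, LR−, and a Wilson score interval from a 2 × 2 table. The function names are hypothetical, and the cell counts (9 true positives, 26 false positives, 4 false negatives, 38 true negatives) are an assumption: they are inferred from the percentages reported for stenoses and/or occlusions against TAB-proven GCA, not taken from a published table. The method used for the 95% CI of LR+/LR− is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval (no continuity correction) for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def operating_characteristics(tp, fp, fn, tn):
    """Return sensitivity, specificity (as proportions), LR+ and LR-."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # LR+ is infinite when specificity is 100% (no false positives).
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return sens, spec, lr_pos, lr_neg


# ASSUMED counts, inferred from the reported 69% sensitivity (9/13),
# 59% specificity (38/64), and 35 abnormal TA-US among 77 subjects.
tp, fp, fn, tn = 9, 26, 4, 38
sens, spec, lr_pos, lr_neg = operating_characteristics(tp, fp, fn, tn)
sens_ci = wilson_interval(tp, tp + fn)

print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.1f}")
print(f"95% CI for sensitivity: {sens_ci[0]:.0%} to {sens_ci[1]:.0%}")
```

Under these assumed counts, the Wilson interval for sensitivity spans roughly 42% to 87%, which shows how wide the uncertainty is when only 13 subjects have the disease.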
To assess the potential influence of workup bias, we applied the method described by Diamond15, which provides sensitivity and specificity estimates that incorporate information from the patients who underwent the test under evaluation but not the gold standard test. The formula for corrected sensitivity was: 100 × PPV/[PPV + (100 – NPV) × (100 – PA)/PA], where PPV is the positive predictive value, NPV the negative predictive value, and PA the proportion of abnormal TA-US test results in the entire population, all expressed as percentages. The formula for corrected specificity was: 100 × NPV/[NPV + (100 – PPV) × PA/(100 – PA)]15.
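The two correction formulas can be read as a direct application of Bayes' theorem. The short derivation below is a sketch under one stated assumption: that PPV and NPV, estimated in the biopsied subset, also hold for all patients who underwent TA-US. The symbols T+ and T− (abnormal and normal test) and D+ and D− (disease present and absent) are introduced here for illustration, and all quantities are written as proportions rather than percentages.

```latex
\begin{align*}
\text{Se}_{\text{corr}} = P(T^{+} \mid D^{+})
  &= \frac{P(D^{+} \mid T^{+})\,P(T^{+})}
          {P(D^{+} \mid T^{+})\,P(T^{+}) + P(D^{+} \mid T^{-})\,P(T^{-})} \\
  &= \frac{\text{PPV}\cdot\text{PA}}
          {\text{PPV}\cdot\text{PA} + (1-\text{NPV})(1-\text{PA})}
   = \frac{\text{PPV}}
          {\text{PPV} + (1-\text{NPV})\,\dfrac{1-\text{PA}}{\text{PA}}} \\[1ex]
\text{Sp}_{\text{corr}} = P(T^{-} \mid D^{-})
  &= \frac{\text{NPV}\,(1-\text{PA})}
          {\text{NPV}\,(1-\text{PA}) + (1-\text{PPV})\,\text{PA}}
   = \frac{\text{NPV}}
          {\text{NPV} + (1-\text{PPV})\,\dfrac{\text{PA}}{1-\text{PA}}}
\end{align*}
```

Multiplying each result by 100 gives the percentage form used in the text.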
RESULTS
Patient selection and characteristics
Figure 1 shows patient selection; 77 subjects (mean age at TA-US 72.2 ± 9.1 yrs; 64% women), all referred for suspected GCA, were retained for the primary analyses. For the 2 patients who had sequential bilateral TAB within the selected 30-day window, the second examination findings were used for the analyses. The mean TA-US–TAB interval was 4.8 ± 4.7 days (range 0–21). A total of 13 (16.9%) had a positive TAB and 19 (24.7%) fulfilled the clinical criteria for GCA diagnosis; 1 TAB-positive patient failed to meet ACR criteria. For these 20 patients, the mean time from TA-US to initiation of therapy with high-dose glucocorticoids was 0.4 ± 5.3 days (range −10 to 12).
Computed TA-US operating characteristics
The 77 TA-US subjects included 46 who had Doppler ultrasonography and 31 who had duplex ultrasonography. The same operator (MC) performed 62 (80.5%) studies, and 4 other staff physicians, the remaining 15 (19.5%). Flow abnormalities were identified in 35/77 (45.5%), while an additional (bilateral) halo sign was seen in 1 (3.2%) of the 31 duplex ultrasonographies.
Figure 2 and Table 1 show the breakdown of TA-US findings and the computed TA-US operating characteristics for the study sample for both reference standards. Table 1 also reports the results of a sensitivity analysis, assessing the diagnostic accuracy of bilateral TA-US abnormalities. Sensitivity analyses addressing the operating characteristics of stenoses and/or occlusions evaluated in continuous-wave versus duplex TA-US or stratified by TA-US operators (the most experienced assessor vs the 4 others) did not yield substantially different findings (data not given).
Additional estimates to correct workup bias were calculated for the operating characteristics of stenoses and/or occlusions. These corrected estimates used a 23.8% proportion of abnormal TA-US, i.e., showing stenoses and/or occlusions, in the combined samples of the 77 subjects used for the primary analyses and the 204 subjects who had TA-US without TAB (Figure 1). When assessed against TAB-proven GCA, the “crude” sensitivity and specificity of stenoses and/or occlusions were 69% and 59%, respectively (Table 1). After accounting for workup bias, sensitivity decreased to 46% and specificity increased to 80%. However, the corresponding “corrected” LR+ and LR− values changed little, with adjusted values of 2.24 and 0.68, respectively. Likewise, when stenoses and/or occlusions were assessed against clinically defined GCA, the LR+ and LR− values corrected for workup bias changed little, at 1.33 and 0.91, respectively.
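The corrected estimates reported above can be approximately reproduced with a few lines of Python. This is a minimal sketch, not the authors' own computation: the function name is hypothetical, and the 2 × 2 cell counts are an assumption, inferred from the reported crude percentages (69%/59% against TAB-proven GCA and 53%/57% against clinically defined GCA, with 35 abnormal TA-US among 77 subjects).

```python
def diamond_corrected(ppv, npv, pa):
    """Workup-bias-corrected sensitivity and specificity (all proportions).

    ppv, npv: predictive values estimated in the biopsied subset.
    pa: proportion of abnormal tests among ALL tested patients.
    """
    sens = ppv * pa / (ppv * pa + (1 - npv) * (1 - pa))
    spec = npv * (1 - pa) / (npv * (1 - pa) + (1 - ppv) * pa)
    return sens, spec


def corrected_summary(tp, fp, fn, tn, pa):
    """Corrected sensitivity, specificity, LR+ and LR- from a 2 x 2 table."""
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    sens, spec = diamond_corrected(ppv, npv, pa)
    return sens, spec, sens / (1 - spec), (1 - sens) / spec


PA = 0.238  # reported proportion of abnormal TA-US in the combined samples

# ASSUMED counts (tp, fp, fn, tn), inferred from the reported percentages.
tab = corrected_summary(9, 26, 4, 38, PA)        # vs TAB-proven GCA
clinical = corrected_summary(10, 25, 9, 33, PA)  # vs clinically defined GCA

for label, (sens, spec, lr_pos, lr_neg) in (("TAB", tab), ("clinical", clinical)):
    print(f"{label}: sens {sens:.0%}, spec {spec:.0%}, "
          f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

With these assumed counts, the sketch gives values in line with the reported 46% sensitivity, 80% specificity, and LR+/LR− of 2.24/0.68 against TAB-proven GCA, and 1.33/0.91 against clinically defined GCA.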
TA-US and TAB findings in patients not eligible for the primary analyses
Among the 204 patients with TA-US but no TAB (210 TA-US: 104 Doppler ultrasonography and 106 duplex ultrasonography), 33 (16.2%) patients had abnormal TA-US findings. Those abnormalities were stenoses and/or occlusions in 32 patients and an isolated (unilateral) halo in 1. Compared to the 77 subjects considered for the primary analyses, the lower abnormal TA-US rate in these patients indicated the possibility of workup bias. In contrast, review of the TAB reports of the 196 patients with no prior TA-US yielded a 16.3% TAB-positivity rate very similar to that found for the 77 subjects used for the primary analyses (Figure 1).
DISCUSSION
Our analysis of 77 subjects with suspected GCA who underwent Doppler or duplex TA-US before TAB credited the halo sign with 100% specificity (with a corresponding infinite LR+) but with very low sensitivity. Also, stenoses and occlusions detected by TA-US appeared to be of little clinical significance, as indicated by their modest LR+ and LR− values. These findings corroborate that, in our institution and in everyday clinical settings, TA-US unfortunately contributes little to diagnosis of GCA.
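To make concrete what modest LR values mean at the bedside, the sketch below converts a pretest probability into a posttest probability through odds. It is an illustration only: the function name is hypothetical, the pretest probability is taken as the 13/77 (16.9%) TAB-positivity rate in the study sample, and the LR values of 1.7 and 0.5 are the rounded estimates reported for stenoses and/or occlusions against TAB-proven GCA.

```python
def posttest_probability(pretest, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    if likelihood_ratio == float("inf"):
        return 1.0  # an infinite LR+ makes the disease certain
    pretest_odds = pretest / (1 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


pretest = 13 / 77  # TAB-positive prevalence in the study sample (16.9%)

after_abnormal = posttest_probability(pretest, 1.7)  # stenosis/occlusion seen
after_normal = posttest_probability(pretest, 0.5)    # no flow abnormality

print(f"pretest {pretest:.1%}")
print(f"after abnormal TA-US: {after_abnormal:.1%}")
print(f"after normal TA-US: {after_normal:.1%}")
```

Under these inputs, an abnormal flow finding raises the probability of TAB-proven GCA to only about 26%, and a normal finding lowers it to about 9%, so neither result would settle the diagnosis.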
Surprisingly, a halo sign was seen in only 1 GCA case among the 31 subjects examined by duplex ultrasonography. This resulted in low 17%/10% sensitivities for TAB-proven/clinically defined GCA; however, these estimates must be interpreted with their broad 95% CI in mind. Previous studies reported 9%–100% sensitivity for the halo sign10, and these variations probably mirror interoperator variability of stringency for defining this feature and/or differences in patient selection. In light of the previously described strong concordance between physical examination abnormalities and halo sign16, higher observed halo rates might be explained by study samples skewed toward patients with suspected GCA and clinically prominent temporal artery involvement. Our observed 16.9% prevalence of TAB-positive GCA in a population of biopsied patients agrees with previously reported observations17,18,19,20 and seems to indicate that the selected study sample was representative of the actual target population for which TAB is usually obtained.
In our study, TA-US findings of stenoses and/or occlusions yielded moderate 69%/53% sensitivities and 59%/57% specificities for TAB-proven/clinically defined GCA. Those sensitivities are roughly in keeping with the metaanalysis estimate of 68%/66% sensitivities for stenoses and/or occlusions10. In contrast, the metaanalysis showed that visualization of any flow abnormality, when assessed against clinical GCA criteria, reached 95% specificity, but was based on the pooling of only 4 studies10. The limited specificity of stenoses/occlusions found in our study likely reflects that intimal proliferation and fibrosis can also occur in degenerative temporal artery disease21.
A workup bias may have occurred because our primary analyses used only a subset of all the patients undergoing TA-US in our institution over the 7-year study period (Figure 1). This bias, which is common in evaluations of diagnostic tests for which an invasive procedure is the reference standard, originates from the preferential application of the reference standard (here, TAB) to patients with a positive result of the diagnostic test under evaluation (here, TA-US)15,22. As a result, workup bias makes the observed sensitivity higher, and the observed specificity lower, than the test's true values15,22. Because the likelihood ratios integrate both sensitivity and specificity, and these 2 distortions act in opposite directions, we think that our LR+/LR− estimates were not substantially distorted by a potential workup bias. Adjusting for workup bias shifted uncorrected-to-corrected sensitivity and specificity for stenoses and/or occlusions (against TAB) from 69% to 46% and from 59% to 80%, respectively, but LR+/LR− values were only minimally affected by these changes.
Our observations challenge the usefulness of TA-US as a diagnostic or screening tool for GCA. The failure of studies to consistently demonstrate either high sensitivity or high specificity (or more accurately, high LR+ or low LR−) continues to prevent use of TA-US findings to conclusively rule a diagnosis of GCA in or out. Even if we were to accept our finding and that of other studies3,8,9, suggesting that the halo sign be considered pathognomonic of GCA, in our setting, adopting a diagnostic algorithm based on TA-US would have eventually spared only 1 of 31 TAB. In general, the apparent difficulties in reproducing TA-US performance characteristics from one setting to another probably highlight its high level of operator dependency and entail a considerable drawback to use of this test for diagnosis of GCA.
Acknowledgment
The authors are indebted to Peter Villiger, MD, PhD, for valuable discussions, and to Janet Jacobson for editorial assistance.
Accepted for publication June 1, 2010.