Abstract
Objective. To provide an overview of the role of lung ultrasound (LUS) in the assessment of interstitial lung disease (ILD) in systemic sclerosis (SSc) and to discuss the state of validation supporting its clinical relevance and application in daily clinical practice.
Methods. Original articles published between January 1997 and October 2017 were included. To identify all available studies, a detailed search pertaining to the topic of review was conducted according to guidelines of the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA). A systematic search was performed in PubMed and EMBASE. The quality assessment of retrieved articles was performed according to the Oxford Center for Evidence-based Medicine. The methodological quality of the studies was assessed using the Cochrane Handbook for Systematic Reviews and the Quality Assessment of Diagnostic Accuracy Studies–2 tool.
Results. From 300 papers identified, 12 were included for the analysis. LUS passed the filter of face, content validity, and feasibility. However, there is insufficient evidence to support criterion validity, reliability, and sensitivity to change.
Conclusion. Despite a great deal of work supporting the potential role of LUS for the assessment of ILD-SSc, much remains to be done before validating its use as an outcome measure in ILD-SSc.
Interstitial lung disease (ILD) is a clinical manifestation affecting more than half of patients with systemic sclerosis (SSc)1,2. It may be established within the first 4 years of the disease and is frequently subclinical3,4. Although the severity of ILD varies considerably, it represents the leading cause of death in SSc5,6. Thus, an increased awareness of this complication is a real need, because it may affect prognosis, quality of life, and response to treatment. In particular, a sensitive and accurate method is desirable to detect ILD in its early stages. Early detection of ILD in SSc may improve prognosis and lead to better treatment-related outcomes.
In addition to clinical evaluation, several tools are available to evaluate the presence of ILD in patients with SSc, including pulmonary function tests (PFT) and imaging methods.
Clinical manifestations have been found to be absent in the initial stages of ILD. Moreover, PFT may be nonspecific despite established ILD7. In this context, imaging may play a key role in the accurate detection of ILD.
Chest radiography has been widely used as a first imaging approach to assess ILD, but its very low sensitivity in early stages limits its current use as an assessment tool for early changes. High-resolution computed tomography (HRCT) is sensitive, and is the most common imaging technique used in the assessment of ILD. It has demonstrated utility for diagnosis, disease activity, and therapy monitoring of ILD8,9. Further, it can detect both early pulmonary changes and subclinical lung involvement8. However, it has limited routine use because of high costs and ionizing radiation, in spite of new-generation HRCT machines that have considerably reduced the radiation dose.
It has been proposed that lung ultrasound (LUS) may have a role in the assessment of ILD in patients with autoimmune rheumatic diseases10,11,12,13,14. The LUS assessment of ILD is based on the detection and quantification of B-lines, which consist of “comet tail” artifacts fanning out from the lung surface. They are generated by the reflection of the US beam from thickened subpleural interlobular septa and are detectable through the intercostal spaces.
Despite the growing body of evidence supporting the utility of LUS in ILD, its validity, reliability, feasibility, and a standardized approach have not been thoroughly established. Several authors have developed and published different LUS methods to assess for ILD-SSc, but they are limited to the local clinical settings10,11,12,13.
To validate the use of LUS as an outcome measurement instrument in the evaluation of patients with ILD in rheumatic diseases, an OMERACT (Outcome Measures in Rheumatology) LUS Subtask Force was formed.
The purpose of this paper from this task force is to provide an overview of the potential role of LUS in the assessment of ILD-SSc based on a systematic literature review and to discuss the current evidence and state of validation supporting its clinical relevance and application in daily clinical practice.
MATERIALS AND METHODS
Literature review criteria and search strategy
All relevant literature in the field of LUS for detection of ILD in SSc in the last 20 years has been reviewed. We included original articles about studies in humans published between January 1997 and October 2017. To identify all available studies, a detailed search pertaining to the topic of review was conducted according to PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) guidelines15.
A systematic search was performed in the electronic databases (PubMed and EMBASE), using the following search terms in all possible combinations: ultrasound, sonography, ultrasonography, interstitial lung disease, interstitial fibrosis, interstitial pulmonary fibrosis, pulmonary fibrosis, systemic sclerosis, and scleroderma. In addition, the reference lists of all retrieved articles were manually reviewed. In cases of missing data, study authors were contacted by e-mail to try to retrieve original data. Two independent authors analyzed each article and performed the data extraction independently. In case of disagreement, a third investigator was consulted.
Discrepancies were resolved by consensus. Titles, abstracts, and complete reports of the included articles were systematically evaluated.
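As an illustration of the search strategy described above, the listed terms can be combined programmatically. The following is a minimal sketch only: the grouping of the terms into three concepts (imaging modality, lung involvement, underlying disease), the phrase quoting, and the function names are assumptions made for this example and do not reproduce the exact query strings submitted to PubMed or EMBASE.

```python
from itertools import product

# Search terms as listed in the text. Grouping them into three concepts
# is an assumption made for this illustration.
MODALITY = ["ultrasound", "sonography", "ultrasonography"]
LUNG = [
    "interstitial lung disease",
    "interstitial fibrosis",
    "interstitial pulmonary fibrosis",
    "pulmonary fibrosis",
]
DISEASE = ["systemic sclerosis", "scleroderma"]


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def combination_queries() -> list[str]:
    """Build one AND-query per combination of one term from each concept group."""
    return [
        " AND ".join(quote(term) for term in combo)
        for combo in product(MODALITY, LUNG, DISEASE)
    ]


def combined_query() -> str:
    """Build a single Boolean query: OR within a group, AND across groups."""
    groups = (MODALITY, LUNG, DISEASE)
    return " AND ".join(
        "(" + " OR ".join(quote(term) for term in group) + ")"
        for group in groups
    )


if __name__ == "__main__":
    queries = combination_queries()
    print(len(queries), "individual queries")
    print(queries[0])
    print(combined_query())
```

Under this assumed grouping, the 3 modality terms, 4 lung terms, and 2 disease terms yield 24 individual queries. The single combined query retrieves the union of those results in one search.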
Inclusion and exclusion criteria
Studies that have been performed using LUS in ILD-SSc were included in the present review. We excluded from this review the following nonanalytic types of publications: review articles, articles not published in English, case reports, letters to the editor, comments, editorials, non-human studies, or abstracts from scientific meetings. Retrieved papers were screened to avoid duplicates. Titles, abstracts, and full reports of articles identified were systematically screened regarding inclusion and exclusion criteria.
The quality assessment of retrieved articles was performed according to the Oxford Center for Evidence-based Medicine16. The methodological quality of the studies was assessed using the Cochrane Handbook for Systematic Reviews17 and the QUADAS-2 tool (Quality Assessment of Diagnostic Accuracy Studies)18.
Data extraction
The following data were extracted using a template designed for this study: type and design of the study, number of patients, number of controls, comparative diagnostic methods, and aspects focused on the LUS measurements and technique, outcome domains, measures, content, criterion and construct validity, discrimination, and reliability.
RESULTS
About 300 publications were identified in PubMed and EMBASE databases between January 1997 and October 2017. From the 300 articles identified, after excluding the mentioned nonanalytic types of publications, 12 were finally included for further analysis (Figure 1).
Figure 1. Flowchart of the review.
Table 1 reports included studies, type of study, number of patients enrolled, methods of comparison, and variables analyzed (including LUS scoring systems used).
Table 1. Included studies, type of study, number of patients enrolled, methods of comparison, and scoring systems used.
General characteristics of included studies
All 12 papers included were observational, cross-sectional, and/or descriptive studies10,11,12,13,14,19–25.
No randomized controlled clinical trials or studies including a cohort followed prospectively or longitudinally to evaluate the progression of ILD were found. Three studies were performed using a control group and 11 studies (92%) used the HRCT as an imaging method comparator (Table 1). A total of 635 patients with SSc were recorded, with a median number of 36.5 patients per study (range 31.5–54.7). There were more women than men (82% vs 18%), with a median of 5.3 years of disease duration. The majority of the patients were white and in the sixth decade of life. In most of the studies, the subtype of SSc and the results of the respiratory tests were not mentioned. More details on the clinical characteristics of the patients included in the review are reported in Appendix 1.
The primary aim of all the studies was to determine the correlation between LUS and HRCT findings in detecting pulmonary fibrosis. In all 12 included studies, the LUS examination was performed in B-mode. No study reported assessment by the power Doppler technique.
Most of the articles (92%) included the B-lines as the main LUS finding for ILD, whereas a smaller number reported on pleural irregularities (Table 1). Several US B-lines scoring systems were reported: some were dichotomous (34%), others quantitative (16%) or semiquantitative (50%).
The US scanning protocol adopted by all the studies was based on the evaluation of intercostal spaces. The patient position was also similar in all studies. The patient was in the supine position for anterior and lateral scan and in a sitting position for posterior or dorsal scan (Table 2).
Table 2. Technical aspects and characteristics of ultrasound machines.
There was great variability in selecting the transducer for the US lung examination. Linear, convex, and cardiac transducers were used. A frequency of 3.5–5 MHz was generally used for the convex transducer, whereas the frequency varied from 8–11 MHz when the transducer was linear. Finally, only 3 studies reported that the sonographer was blinded to the patient’s clinical data (Table 2).
Quality assessment of retrieved articles
All studies were classified as 2b level of evidence, according to the Oxford Center for Evidence-based Medicine levels of evidence16.
Ninety-two percent of the studies5,7,8,9,12–18 showed a low risk of bias; only one6 was judged as having a high risk of bias in the patient selection domain (Figure 2A). Regarding applicability, all the studies raised low concerns (Figure 2B).
Figure 2. Quality assessment of papers. A. Global risk of bias and applicability concerns. B. Risk of bias and applicability concerns for each paper.
Criterion validity/construct validity
Because LUS was never tested against the external gold standard (lung histology) in any previous human study in SSc, it does not meet this aspect of validation. As an alternative, correlations with other validated measures were searched, to estimate the concurrent and convergent validity as surrogates for criterion validity and as indicators of overall construct validity.
A total of 11 studies (92%) applied HRCT as the gold standard; in 7 studies (58% of all included studies) the Warrick score was the HRCT score adopted for the correlation with LUS findings26. Four of the 12 studies (33.3%) also included PFT in addition to HRCT as a surrogate gold standard. Accuracy (sensitivity and specificity) data are reported in Table 3.
Table 3. Validity of the studies included.
All studies demonstrated a positive correlation between LUS B-lines and HRCT in the assessment of ILD. However, these results were not confirmed by a multivariate analysis.
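The accuracy measures referred to in this section follow directly from a 2×2 table of LUS results against the reference standard (HRCT in most of the included studies). The sketch below shows the arithmetic only. The class name and the counts in the usage example are hypothetical and are not data from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of an index test (e.g., LUS) against a reference standard (e.g., HRCT)."""

    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        if denominator == 0:
            raise ValueError("measure undefined: denominator is zero")
        return numerator / denominator

    def sensitivity(self) -> float:
        """Proportion of reference-positive patients detected by the index test."""
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        """Proportion of reference-negative patients with a negative index test."""
        return self._ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value."""
        return self._ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value."""
        return self._ratio(self.tn, self.tn + self.fn)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
    print(f"sensitivity = {table.sensitivity():.2f}")
    print(f"specificity = {table.specificity():.2f}")
    print(f"PPV = {table.ppv():.2f}, NPV = {table.npv():.2f}")
```

Unlike sensitivity and specificity, the predictive values depend on the prevalence of ILD in the sample, so they are less comparable across studies with different case mixes.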
Discrimination
Insufficient data were provided in the analyzed studies to assess the reliability and reproducibility of LUS for ILD in patients with SSc. Only 3 studies (25%) assessed intra- or interobserver reliability, including the κ coefficient. However, because these few tests indicated reproducibility, reliability was rated as partially validated. None of the studies evaluated sensitivity to change.
Moreover, no studies aimed to demonstrate the predictive validity regarding prognostic value (Table 4).
Table 4. Reliability, feasibility, and sensitivity to change.
Feasibility
We found that 2 studies reported the time used to examine the lung by LUS, which may range from 6 to 31 min according to the severity of the ILD or the type of scanning technique (Table 4).
We could not find specific data on the day-to-day issues of feasibility, accessibility, or cost-effectiveness. Currently the number of intercostal spaces examined in the studies is highly variable, ranging from 10 to 72 per patient11,12,13,14. Nevertheless, we found good evidence that LUS was available in medical centers, and the patient/physician acceptability was good.
DISCUSSION
This is the first systematic review addressing validity of LUS as an outcome measure in ILD-SSc, to our knowledge. Current evidence suggests that LUS passed the filter regarding face and content validity and feasibility. However, no validated or robust data allow full confirmation of criterion validity, reliability, and sensitivity to change (Table 3 and Table 4).
There have been interesting initiatives to promote new applications of US in rheumatology27,28,29. Because of the increased competency and experience of the sonographers, and the availability of high-end equipment, preliminary data regarding the applications of US in lung disease are also available.
Overall, the literature search showed encouraging results. However, some crucial points should be addressed before using LUS as a validated instrument for the assessment of ILD-SSc. First, no consensual definitions were used for the elementary lesions to evaluate during the examination. Second, we found a lack of information on the LUS procedures of image acquisition. There is a crucial need to standardize the scanning technique and the approach for the LUS assessment of the lung as well as how many areas should be scanned (i.e., how many intercostal spaces should be evaluated). Currently the number of intercostal spaces reported in the studies is highly variable, ranging from 10 to 72 per patient11,12,13,14. Third, there is no consensus on how to quantify ILD by LUS, whether by a dichotomous approach or by quantitative or semiquantitative scoring systems. The problem is that there are different LUS B-line scoring systems with different cutoffs to interpret the degree of ILD. Fourth, there was no agreement on the measurement to use (i.e., scoring systems), or on the cutoff of normality. Fifth, there is no consensus regarding the optimal US transducer to use in the assessment of the lung. Although small surface probes with frequencies ranging between 3 and 3.5 MHz seemed suitable for this specific purpose, transducers with large surfaces and frequencies between 5 and 7.5 MHz were also used30. Sixth, there were no studies including a cohort in which all patients with ILD newly diagnosed by LUS were followed prospectively or longitudinally to observe the longterm development. Finally, in general, the studies offered minimal to no information regarding how well LUS performs in the detection of early ILD. Only 1 study20 was performed in very early SSc patients with mean of disease duration ± SD of 1.9 ± 3.2. The authors reported a sensitivity of 100% for the screening of ILD by LUS.
These results may represent the basis for exploring the potential of LUS as a screening tool for the early detection of ILD-SSc. In this light, we recently conducted a study with the aim of determining the diagnostic value of LUS in detecting subclinical ILD in 133 patients with SSc. We found that 40.6% of patients with SSc showed LUS signs of subclinical ILD, compared with 4.8% of healthy controls (p = 0.0001). Sensitivity and specificity of US in detecting ILD were 91.2% and 88.6%, respectively31.
This literature review revealed several aspects of LUS that need further validation (criterion/construct validity, reliability, and sensitivity to change), revealing a clear research agenda that needs to be addressed in the near future.
Definitive validation of the criterion validity of LUS requires lung histology as a gold standard. To date there are no human studies using histology as the gold standard. However, previous studies performed in animal models, which showed a good correlation between the number of B-lines and water level in pulmonary edema, suggested that LUS could be a noninvasive and simple method to detect and quantify ILD in rheumatic disorders32.
Validation of reliability of the LUS in ILD-SSc requires comparisons of repeated LUS assessments performed within a short time by the same investigator (intraobserver variability) and by 2 independent investigators (interobserver variability) at the same time in patients with well-defined ILD-SSc.
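Agreement between two such assessments is commonly summarized with Cohen's κ, which corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch, assuming each patient receives one categorical LUS rating per reading (here dichotomous: ILD present = 1, absent = 0). The function name and the ratings in the usage example are hypothetical and are not taken from the reviewed studies.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters (or two readings) scoring the same patients."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of patients")

    n = len(rater_a)

    # Observed agreement: proportion of patients given the same category twice.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the marginal proportions, summed over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical dichotomous LUS ratings for 10 patients, for illustration only.
    observer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    observer_2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
    print(f"kappa = {cohen_kappa(observer_1, observer_2):.2f}")
```

The same function applies to intraobserver reliability when the two sequences are repeated readings by the same investigator. For ordinal semiquantitative scores, a weighted κ or an intraclass correlation coefficient may be more appropriate than this unweighted form.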
To obtain more accurate and reliable information on the sensitivity and specificity, as well as the reproducibility, of lung US, additional studies are needed, which ideally should include a higher number of patients covering the full clinical spectrum of ILD-SSc. Additionally, assessing the sensitivity to change of lung US requires longitudinal studies that include patients with ILD-SSc with and without treatment and parallel lung US and HRCT evaluations at different timepoints.
We are aware of limitations associated with the present review: the small number of articles found, and that the results described are based only on published studies in peer-reviewed journals and published in English. Another important limitation of our study is that many of the articles included had a small number of samples (n < 40), which decreases the external validity of the articles included. Finally, studies of LUS assessing other forms of ILD were not included, and including them would have extended the number of suitable papers and provided much information regarding the utility of LUS in other types of ILD.
Despite a great deal of work supporting the potential role of LUS for the assessment of ILD-SSc, much remains to be done to validate its use as an outcome measure in ILD-SSc. In particular, future research should focus on the validity of LUS in detecting ILD in the early stages, its accuracy in assessing the eventual response to therapy, the correct timing of LUS for diagnosis and followup, and its potential in monitoring the progression of ILD-SSc. Additionally, the research agenda should focus on promoting the development of consensus on definitions of elementary LUS lesions for ILD and on protocols of image acquisition as well as quantification of LUS findings for ILD.
APPENDIX 1. Clinical characteristics of the study populations included in the review.
- Accepted for publication June 18, 2019.