Abstract
Objective. Forced vital capacity (FVC) and DLCO are used for screening of systemic sclerosis–associated interstitial lung disease (SSc-ILD). The study purpose was to determine the sensitivity, specificity, and negative predictive value (NPV; the proportion of negative screening tests that are true negatives) of FVC and DLCO thresholds for SSc-ILD on chest high-resolution computed tomography (HRCT) scans.
Methods. Patients fulfilling American College of Rheumatology 2013 SSc criteria with a chest HRCT scan and pulmonary function tests (PFT) were studied. A thoracic radiologist quantified radiographic ILD. Optimal FVC and DLCO % predicted thresholds for ILD were identified using receiver-operating characteristic curves. The FVC and DLCO combinations with greatest sensitivity and specificity were also determined. Subanalysis was performed in patients with positive Scl-70 autoantibodies.
Results. The study included 265 patients. Of 188 (71%) with radiographic ILD, 59 (31%) had “normal” FVC (≥ 80% predicted), and 65 out of 151 (43%) had “normal” DLCO (≥ 60% predicted). FVC < 80% (sensitivity 0.69, specificity 0.73), and DLCO < 62% (sensitivity 0.60, specificity 0.70) were optimal thresholds for radiographic SSc-ILD. All FVC and DLCO threshold combinations evaluated had NPV < 0.70. The NPV for radiographic ILD for FVC < 80% was lower in patients with positive Scl-70 autoantibody (NPV = 0.05) compared to negative Scl-70 autoantibody (NPV = 0.57).
Conclusion. Radiographic ILD is prevalent in SSc despite “normal” PFT. No % predicted FVC or DLCO threshold combinations yielded high NPV for SSc-ILD screening. “Normal” FVC and DLCO in patients with SSc, especially those with positive Scl-70 autoantibodies, should not obviate consideration of HRCT for ILD evaluation.
Interstitial lung disease (ILD) is a leading cause of death in patients with systemic sclerosis (SSc)1, and chest high-resolution computed tomography (HRCT) is the gold standard diagnostic test2. Pulmonary function tests (PFT) including forced vital capacity (FVC) and DLCO % predicted are frequently used to screen for ILD to avoid HRCT-associated radiation. However, evidence shows that a significant number of SSc patients with FVC values within the normal range (≥ 80% predicted) demonstrate radiographic ILD3,4. We hypothesized that optimal individual FVC and DLCO % predicted thresholds, as well as optimal FVC and DLCO combinations, could be identified with acceptably high negative predictive value (NPV) for associated radiographic SSc-ILD. The prevalence of a disease in a population affects the NPV, and radiographic ILD is common in patients with SSc1,3,4,5. One HRCT study of 215 consecutive patients with SSc found limited ILD (< 10% of lung involvement on HRCT) in almost 50% of patients5. However, Suliman, et al3 evaluated 102 consecutive patients with SSc at the University Hospital Zurich, and found that 40 out of 75 patients (53%) with an FVC > 80% predicted had significant radiographic ILD (> 20% lung involvement according to Goh, et al5). Using data from patients with SSc at our center, our aims were to (1) identify optimal individual FVC and DLCO % predicted cutpoints for associated ILD on HRCT, and (2) determine the sensitivity, specificity, and NPV of varying individual and combined FVC and DLCO % predicted thresholds for associated ILD on HRCT.
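The dependence of NPV on prevalence noted above follows from Bayes' rule. As an illustrative sketch (the symbols Se, Sp, and p for sensitivity, specificity, and prevalence are notation introduced here, not taken from the study), the relationship can be written as:

```latex
\[
\mathrm{NPV} \;=\; \frac{Sp\,(1-p)}{Sp\,(1-p) \;+\; (1-Se)\,p}
\]
```

For a fixed sensitivity and specificity, NPV falls as prevalence rises. For example, plugging in the values reported in the Abstract for FVC < 80% predicted (Se = 0.69, Sp = 0.73) with a radiographic ILD prevalence of p = 0.71 gives 0.2117 / (0.2117 + 0.2201), or approximately 0.49. This is a back-of-envelope calculation for illustration, not a value quoted from the study tables.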
MATERIALS AND METHODS
Our present study was approved by the Northwestern University Institutional Review Board (STU00066807). A waiver of informed consent was obtained for this retrospective study because all patients had given consent for the Northwestern Scleroderma Patient Registry, and permission to review electronic health records is included in that consent (STU00002669).
Patients had sine, diffuse, or limited cutaneous SSc, fulfilled American College of Rheumatology 2013 SSc criteria, and had a PFT within 12 months of a chest HRCT scan6. Patients who had pulmonary procedures or diagnoses that could potentially affect PFT results including lung transplant, lobectomy, or lung malignancy were excluded. Patients with comorbid pulmonary hypertension (PH)-ILD were included in the analyses.
Clinical data [SSc subtype, smoking history (current, previous, or never), alcohol use (present or absent), and modified Rodnan skin score] were collected from rheumatology clinic notes within 1 year of HRCT date. SSc disease duration was defined as duration between first non-Raynaud SSc symptom and HRCT scan date. Mean ± SD or n (%) was reported for baseline characteristics.
Chest HRCT is defined by the use of images at 1–2 mm slice thickness reconstructed using specialized computer algorithms to increase the sharpness of lung parenchyma. Ground glass opacities (GGO) and fine reticulation on HRCT are characteristic of SSc-ILD7. Although lung biopsies are seldom performed for diagnosis and survival is independent of histology, nonspecific interstitial pneumonia (NSIP) followed by usual interstitial pneumonia (UIP) are the most common histological patterns in patients with SSc1. An experienced thoracic radiologist (RA), blinded to clinical data, quantified ILD on HRCT according to the method published by Kazerooni, et al and manually reviewed each HRCT to confirm lung changes consistent with ILD as described8. Briefly, each lobe was scored for GGO and fibrosis involvement using a modified Likert scale (0 = no disease; 1 = < 5% involvement of the lobe; 2 = 6–25% involvement; 3 = 26–50% involvement; 4 = 51–75% involvement; and 5 = 76–100% involvement). The total lung score was the sum of the GGO and fibrosis scores for each lobe. ILD was defined as present if the morphologic features of the disease matched a recognized pattern of disease (i.e., NSIP, UIP, etc.).
PFT data obtained at our center or at an outside institution were analyzed. All available FVC values were studied, and correlation with total lung capacity (TLC) was determined using Pearson’s correlation coefficient. To optimize DLCO data quality, only PFT in which the inspiratory vital capacity (IVC) was at least 85% of the FVC were included in the DLCO analyses9; patients with poor-quality DLCO data, as indicated by an IVC:FVC ratio < 0.85, were excluded from the DLCO analyses only. Percent predicted DLCO was corrected for hemoglobin values obtained from a complete blood count (CBC) performed within 6 months of the PFT date. National Health and Nutrition Examination Survey III (NHANES III) reference values were used10.
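The DLCO quality rules described above can be expressed as a simple record filter. The following Python sketch is illustrative only: the `PftRecord` type, its field names, and the 183-day approximation of "6 months" are assumptions made here and are not taken from the study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MIN_IVC_FVC_RATIO = 0.85   # quality threshold stated in the Methods
MAX_HGB_GAP_DAYS = 183     # assumption: "within 6 months" taken as 183 days


@dataclass
class PftRecord:
    """Hypothetical record layout for one pulmonary function test."""
    ivc_l: float                  # inspiratory vital capacity on the DLCO maneuver (L)
    fvc_l: float                  # forced vital capacity (L)
    pft_date: date
    hgb_date: Optional[date]      # date of the nearest CBC, or None if unavailable


def dlco_is_usable(rec: PftRecord) -> bool:
    """Return True if the DLCO value passes both quality rules."""
    ratio_ok = rec.ivc_l / rec.fvc_l >= MIN_IVC_FVC_RATIO
    hgb_ok = (
        rec.hgb_date is not None
        and abs((rec.pft_date - rec.hgb_date).days) <= MAX_HGB_GAP_DAYS
    )
    return ratio_ok and hgb_ok


# Invented example records, for illustration only.
good = PftRecord(3.0, 3.2, date(2017, 1, 10), date(2017, 3, 1))
low_ivc = PftRecord(2.0, 3.0, date(2017, 1, 10), date(2017, 3, 1))
no_cbc = PftRecord(3.0, 3.2, date(2017, 1, 10), None)
```

Note that a record failing this filter would be dropped from DLCO analyses only; its FVC value would still be used, as described in the text.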
Two independent receiver-operating characteristic (ROC) curves were generated for radiographic SSc-ILD at varying FVC and DLCO % predicted values. For all analyses, the ROCR package in R (version 3.4.0) was used to determine the optimal cutpoints, defined as those with the greatest combined sensitivity and specificity, for radiographic SSc-ILD. Various established PFT thresholds were also evaluated. Subanalysis was performed according to Scl-70 autoantibody profile, because SSc-ILD is highly prevalent in patients with antitopoisomerase I (Scl-70) autoantibody positivity1.
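Choosing the cutpoint with the greatest combined sensitivity and specificity is equivalent to maximizing Youden's J statistic (sensitivity + specificity − 1). The study used the ROCR package in R; the Python sketch below is not that code but a minimal, dependency-free illustration of the same criterion. The function names and the toy data are invented for this example.

```python
def sens_spec(values, has_ild, cutoff):
    """Sensitivity and specificity when 'value < cutoff' counts as a positive screen."""
    tp = sum(1 for v, d in zip(values, has_ild) if d and v < cutoff)
    fn = sum(1 for v, d in zip(values, has_ild) if d and v >= cutoff)
    tn = sum(1 for v, d in zip(values, has_ild) if not d and v >= cutoff)
    fp = sum(1 for v, d in zip(values, has_ild) if not d and v < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def optimal_cutoff(values, has_ild):
    """Observed value that maximizes sensitivity + specificity (first one on ties)."""
    candidates = sorted(set(values))
    return max(candidates, key=lambda c: sum(sens_spec(values, has_ild, c)))


# Toy % predicted FVC values (invented, not study data).
fvc = [55, 62, 70, 78, 85, 92, 76, 88, 95, 101, 110]
ild = [True] * 6 + [False] * 5

best = optimal_cutoff(fvc, ild)
sensitivity, specificity = sens_spec(fvc, ild, best)
print(best, round(sensitivity, 2), round(specificity, 2))
```

Only observed values are tried as candidate cutoffs, which is sufficient because sensitivity and specificity change only at those points.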
RESULTS
Of 729 patients enrolled in the Northwestern Scleroderma Program, 404 had interpretable HRCT images for manual review. One hundred thirty-nine patients were excluded: 30 with an overlap autoimmune diagnosis or a confounding pulmonary procedure, and 109 with PFT > 1 year from HRCT date. The remaining 265 patients were analyzed. The DLCO values from 29 PFT were excluded because the IVC to FVC ratio was < 0.85, thereby rendering the DLCO potentially inaccurate9. Of the remaining 236 patients, 214 had hemoglobin assessed within 6 months of PFT for DLCO adjustment (Supplementary Figure 1, available with the online version of this article) and were included for DLCO and combined threshold analyses.
Most patients were women (81.5%), and 49% had diffuse cutaneous SSc (Table 1). Antitopoisomerase I serum autoantibodies were present in 78 (30%). The mean SSc disease duration (time between first non-Raynaud SSc symptom and HRCT scan) was 6.4 (range 0–42) years with a median of 2 years. Radiographic ILD was present in 188 subjects (71%). There was a strong linear correlation between FVC and TLC (r = 0.85); thus, FVC was used in the remainder of the analyses. Of the 188 patients with radiographic ILD, 59 (31%) had normal FVC (≥ 80% predicted). A total of 151 patients had available and accurate DLCO measurement (IVC:FVC ratio ≥ 0.85) and radiographic ILD, and 65 of those 151 (43%) had normal DLCO (≥ 60% predicted). Thirty-one out of 214 patients (14%) had both FVC and DLCO % predicted within normal range and radiographic ILD. To ensure that patients with poor PFT were not systematically excluded from DLCO analysis, we compared the FVC between the group included versus excluded for low IVC on the DLCO maneuver. Those with DLCO excluded owing to low IVC:FVC ratio did not have a significantly lower FVC than those with acceptable DLCO maneuvers.
Table 1. Baseline study population characteristics (n = 265).
For associated ILD, the data-derived optimal FVC and DLCO cutpoints were 80% predicted (sensitivity 0.69, specificity 0.73) for FVC, and 62% predicted (sensitivity 0.60, specificity 0.70) for DLCO (Figure 1, Table 2). The combination of DLCO < 62% or FVC < 80% predicted had a sensitivity and specificity for ILD of 0.80 and 0.56, respectively. The sensitivity and specificity for various combinations of traditional and data-derived optimal FVC and DLCO cutpoints for ILD are reported in Table 2. Using an ILD screening algorithm of meeting either FVC < 80% or DLCO < 62% predicted, 82 persons in our cohort (31%) would screen negative for ILD (NPV 0.53). A more liberal algorithm of either FVC < 80% or DLCO < 70% predicted would inappropriately classify 58 persons (22%; NPV 0.57), while an FVC < 80% or DLCO < 80% predicted would inappropriately classify 32 persons (12%) as lacking ILD (NPV 0.65; Table 2).
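The "either FVC or DLCO below threshold" screening rule and its resulting metrics can be sketched as follows. This is a minimal Python illustration, not the study's analysis code: the function names and the eight toy patients are invented for this example, and the default cutpoints simply mirror the 80% and 62% predicted values discussed above.

```python
def screen_positive(fvc_pct, dlco_pct, fvc_cut=80.0, dlco_cut=62.0):
    """Screen is positive if either % predicted value falls below its cutpoint."""
    return fvc_pct < fvc_cut or dlco_pct < dlco_cut


def screening_metrics(patients, **cuts):
    """patients: iterable of (fvc_pct, dlco_pct, has_ild) tuples."""
    tp = fp = tn = fn = 0
    for fvc_pct, dlco_pct, has_ild in patients:
        positive = screen_positive(fvc_pct, dlco_pct, **cuts)
        if positive and has_ild:
            tp += 1
        elif positive:
            fp += 1
        elif has_ild:
            fn += 1          # ILD present but screen negative (missed case)
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),   # share of negative screens that truly lack ILD
    }


# Toy cohort (invented values, not study data).
toy = [
    (70, 50, True), (85, 55, True), (90, 75, True), (82, 65, True),
    (95, 85, False), (88, 70, False), (78, 90, False), (100, 95, False),
]

default_rule = screening_metrics(toy)                 # FVC < 80 or DLCO < 62
liberal_rule = screening_metrics(toy, dlco_cut=80.0)  # FVC < 80 or DLCO < 80
print(default_rule, liberal_rule)
```

In this toy cohort, raising the DLCO cutpoint catches the two missed ILD cases but reclassifies one patient without ILD as screen positive, which mirrors the trade-off between sensitivity and specificity described in the text.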
Figure 1. Receiver-operating characteristic curves for % predicted forced vital capacity (FVC) and DLCO, demonstrating the performance of varying FVC and DLCO % predicted cutpoints for associated radiographic interstitial lung disease in systemic sclerosis.
Table 2. Performance of pulmonary function test thresholds for prevalent radiographic interstitial lung disease on chest high-resolution computed tomography images in patients with systemic sclerosis.
Subanalysis was performed based on Scl-70 autoantibody status. The NPV for radiographic ILD for FVC < 80% was lower in patients with positive Scl-70 autoantibody (NPV 0.05) compared to those negative (NPV 0.57; Table 2).
DISCUSSION
The study aim was to determine the sensitivity, specificity, and NPV of varying independent and combined FVC and DLCO % predicted thresholds for associated radiographic ILD in SSc. The clinical relevance of new FVC and DLCO threshold values is to inform the development of more rational and data-driven ILD screening protocols that include HRCT in the most at-risk patients. Our ROC curve results show FVC = 80% and DLCO = 62% predicted are the optimal (maximized combined sensitivity and specificity) thresholds for SSc-ILD. The similar areas under the curve (AUC) for FVC (AUC = 0.74) and DLCO (AUC = 0.71) indicate that FVC and DLCO are both “moderately accurate” tests for radiographic ILD in SSc patients, with 0.70–0.90 considered “moderately accurate,” and > 0.90 considered “highly accurate”11. We found no individual or combined FVC and DLCO % predicted algorithm that had high NPV for SSc-ILD screening.
Our study population consisted of predominantly white women, reflecting SSc demographics and the patient population who receive care at our center, and thus our results may not be generalizable to men and racially diverse SSc populations. Only 404 out of 729 registry patients had interpretable HRCT imaging and were eligible for the analysis. This likely enriched our population for patients with pulmonary disease. However, the 71% prevalence of radiographic ILD in our cohort is comparable to reported prevalence rates in other tertiary care cohorts (36–84%)3,4,5. The 30% prevalence of Scl-70 autoantibody positivity in our study cohort is slightly higher than previous reports (20%)1. Yet, ILD is known to be more common in SSc patients with Scl-70 antibodies, rendering HRCT examination referral in this group more likely.
Individual and combined, traditional, and data-derived FVC and DLCO % predicted thresholds had relatively low sensitivity, specificity, and NPV values for associated radiographic ILD. Our data-derived optimal cutpoints for radiographic ILD detection were identical (FVC 80% predicted) or similar (DLCO 62 vs 60% predicted), and did not outperform traditional thresholds of “normal” that are based upon 95% CI from the NHANES III reference population. We examined alternative FVC and DLCO thresholds of 65%, 70%, and 80% predicted and found low NPV for ILD.
Similar to prior study results, “normal” PFT did not discriminate between SSc patients with and without radiographic SSc-ILD. Of our patients with radiographic ILD, defined using the quantitative Kazerooni method, 31% had “normal” FVC (≥ 80% predicted), 43% had “normal” DLCO (≥ 60% predicted), and 14% had both FVC ≥ 80 and DLCO ≥ 60% predicted. Suliman, et al studied 102 patients with SSc and defined radiographic ILD using the more simplistic Goh method that classifies ILD as mild (< 20%), intermediate, or severe (> 20%)3,5. They found that 24 out of 102 patients (23%) had normal FVC (≥ 80% predicted) and radiographic ILD and 40 out of 75 patients (53%) with FVC ≥ 80% predicted had severe radiographic ILD3. Steele, et al evaluated various SSc-ILD screening algorithms and defined ILD as present if (1) crackles were noted on physical examination, (2) radiograph showed interstitial markings consistent with fibrosis, or (3) PFT measures were met: FVC < 70% and FEV1/FVC > 70%; or FVC < 80% and FEV1/FVC > 70%. Both FVC < 70 and < 80% predicted thresholds had low sensitivity for SSc-ILD (54% and 61%, respectively)4.
At traditional thresholds (FVC < 80% and DLCO < 60%), we found FVC compared to DLCO to be more specific for associated SSc-ILD (72% vs 69%). This is most likely due to SSc vascular disease, including concurrent PH, which also results in reduced DLCO12. In a multivariable analysis by Nihtyanova, et al, low DLCO was considered a significant predictor variable in models for both PH and pulmonary fibrosis. In contrast, low FVC was only a predictor for pulmonary fibrosis and not PH13. However, our ROC curve results show that these tests perform similarly for ILD detection.
The association between SSc-ILD and positive Scl-70 autoantibodies is well recognized. The NPV for radiographic ILD of 5% (+Scl-70) versus 57% (−Scl-70) for FVC < 80% predicted and of 10% (+Scl-70) versus 48% (−Scl-70) for DLCO < 60% predicted demonstrates how crucial it is to maintain a high index of suspicion for ILD in patients with +Scl-70 antibodies.
Study limitations include analysis of data from participants recruited at 1 site that specializes in SSc care, which limits generalizability. Also, DLCO was not reported for every patient, and the absence of a CBC within 6 months of the PFT prevented correction for hemoglobin, leading to exclusion of those patients from the DLCO analyses. ILD was dichotomized as present or absent, which prevents determination of the effect of ILD severity on the performance of varying PFT thresholds. We excluded patients with lung malignancy, transplant, and lung resection, and thus limited our evaluation of these patients, who may have severe ILD. Finally, as a cross-sectional analysis, we are unable to report longitudinal change in PFT. Study strengths include our large sample size (265 vs 102 analyzed in Suliman, et al’s cohort3), and our use of stringent criteria to identify high-quality PFT (inclusion of DLCO with hemoglobin performed within 6 months and exclusion of DLCO values from PFT with IVC:FVC ratio < 0.85) to ensure DLCO quality9. The ILD scoring system also differed between studies as described above. We also add to the literature by examining a variety of cutpoints to determine whether new PFT thresholds exist that could be used to screen for SSc-ILD.
We show that significant radiographic ILD is present in patients with SSc, especially in those with positive Scl-70 autoantibodies, despite a “normal” PFT. We conclude that clinicians should maintain a high index of suspicion for SSc-ILD despite normal PFT results, and should consider performing a screening chest HRCT examination. However, we recognize that the amount of radiation from 1 chest HRCT (∼3–27 mSv) can exceed the equivalent dose of over 100 chest radiographs (∼0.06–0.25 mSv)14. Additional studies that examine longitudinal change in PFT and the presence of radiographic ILD may help to shed light on how clinicians may best monitor for SSc-ILD while avoiding unnecessary radiation exposure.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
- Accepted for publication June 1, 2018.