Abstract
Objective Rheumatoid arthritis (RA)-associated interstitial lung disease (ILD) is one of the most common and prognostically relevant organ manifestations of RA. To allow effective treatment, it is therefore crucial to diagnose RA-ILD at the earliest possible stage. So far, the gold standard of early detection has been high-resolution computed tomography (HRCT) of the lungs. This procedure involves considerable radiation exposure for the patient and is therefore unsuitable as a routine screening measure for ethical reasons. Here, we propose the analysis of characteristic gene expression patterns as a biomarker to aid in the early detection of RA-ILD and the initiation of appropriate, possibly antifibrotic, therapy.
Methods To investigate unique molecular patterns of RA-ILD, whole blood samples were taken from 12 female patients with RA-ILD (n = 7) or RA (n = 5). The RNA was extracted, sequenced by RNA-Seq, and analyzed for characteristic differences in the gene expression patterns between patients with RA-ILD and those with RA without ILD.
Results The differential gene expression analysis revealed 9 significantly upregulated genes in RA-ILD compared to RA without ILD: arginase 1 (ARG1), thymidylate synthetase (TYMS), sortilin 1 (SORT1), marker of proliferation Ki-67 (MKI67), olfactomedin 4 (OLFM4), baculoviral inhibitor of apoptosis repeat containing 5 (BIRC5), membrane spanning 4-domains A4A (MS4A4A), C-type lectin domain family 12 member A (CLEC12A), and the long intergenic nonprotein coding RNA (LINC02967).
Conclusion The products of all of these genes except LINC02967 are known from the literature to be involved in the pathogenesis of fibrosis. For some of them, a contribution to the development of pulmonary fibrosis has been demonstrated in experimental studies. The results presented here therefore provide an encouraging perspective for using specific gene expression patterns as biomarkers for the early detection and differential diagnosis of RA-ILD in a routine screening test.
Rheumatoid arthritis (RA)-associated interstitial lung disease (ILD) is one of the most common and prognostically relevant extraarticular organ manifestations of RA, contributing significantly to patient morbidity and mortality.1-3 The time of onset of RA-ILD is highly variable, but is most common in the first 2 to 5 years following RA diagnosis.4 It is vital to recognize RA-ILD at the earliest possible stage to treat it appropriately.
High-resolution computed tomography (HRCT) is currently the most important tool in the diagnosis of RA-ILD for recognizing the pattern of lung involvement.5 However, owing to the radiation exposure, HRCT is inappropriate as a precautionary measure for patients with RA. Recently, it was shown that lung ultrasound could also be a suitable noninvasive and reproducible method for the detection of early RA-ILD, with a sensitivity of 70% and a specificity of 97%.6 However, when used for early diagnosis, this method places considerable demands on the investigator’s qualifications.
Accordingly, there is a great demand for early indicators of the development of RA-ILD (eg, on a genetic or transcriptome level) to identify patients at high risk for developing RA-ILD and to focus early detection efforts on this group.
Gene expression analysis from whole blood holds great potential in this regard as an economical and straightforward screening method that is innocuous for the patient and therefore suitable for routine examinations. Here, we hypothesize that systematic differences in gene regulation between patients with RA-ILD and patients with RA without ILD could provide a patient-friendly biomarker for early detection of this dangerous organ manifestation.
METHODS
Samples. Twelve female patients with RA positive for rheumatoid factor (RF) IgM and anticitrullinated protein antibodies (ACPA) were recruited for this study. At the time of recruitment, all patients were treated with methotrexate (MTX). Five patients had RA without lung involvement, were in Clinical Disease Activity Index (CDAI) remission, and received MTX monotherapy. Seven patients had RA with lung involvement; 1 patient received MTX monotherapy, 3 received MTX + rituximab (of these, 1 patient + nintedanib), 2 received MTX + abatacept, and 1 received MTX + etanercept. RA-ILD with significant fibrosis was documented by chest HRCT in all 7 patients (5 with usual interstitial pneumonia, and 2 with fibrosing nonspecific interstitial pneumonia pattern). Regarding the extent of lung involvement, all 7 patients showed signs of fibrosis with bilateral lung involvement: 3 patients had lower lung involvement only and 4 patients had upper and lower lung involvement. At the time of recruitment, auscultation revealed sclerosiphonia in 6 patients and vesicular breath sounds in 1 patient.
Library preparation. For RNA sequencing, freshly drawn blood samples were directly collected in PAXgene Blood RNA Tubes (PreAnalytiX) and samples were stored at −20 °C until processing. RNA was isolated with the PAXgene Blood miRNA Kit (PreAnalytiX). The RNA integrity number was determined by the Agilent 2100 Bioanalyzer (Agilent), and RNA quantification was performed with the Qubit 4 (Thermo Fisher Scientific). NEBNext Ultra II Directional RNA Library Prep Kit for Illumina with unique dual index primer pairs (NEB) was used for library preparation. Paired-end sequencing (150 base pair read length; approximately 25 million reads per sample) was performed with Illumina NextSeq 2000 (Illumina) and data were provided as FASTQ files.
RNA-Seq data processing. Paired-end RNA sequencing reads (forward and reverse sequences) from whole blood samples were first trimmed for sequencing adapters and low-quality regions using the command-line tool bbduk.sh from BBMap (v38.86). The splice-aware STAR aligner (v2.7.5a) was used to map the trimmed reads in consideration of mate-pairs onto the human reference genome from GENCODE v38 (ref GENCODE). Quantification of genes based on read counts was done by featureCounts (v2.0.1) using the GENCODE annotation v38 that matches the genome version described above. Reference genome and annotation from GENCODE were both downloaded on November 9, 2021. The quality of sequencing and mapping was comprehensively investigated using FastQC (v0.11.9), multiple functions from RSeQC (v3.0.1), and MultiQC (v1.9).
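The order of the three processing steps (trimming, alignment, counting) can be illustrated with a minimal Python sketch that only assembles the command lines and does not run them. The exact parameters used in this study are not stated above, so all file names, the helper name build_commands, the thread count, and the option values (eg, k=23, trimq=10) are illustrative assumptions rather than the settings actually applied.

```python
from pathlib import Path


def build_commands(sample, fastq_dir, genome_dir, gtf, adapters, threads=8):
    """Assemble trimming, alignment, and counting commands for one sample.

    Hypothetical helper: file naming and all option values are assumptions
    made for illustration, not the parameters used in the study.
    """
    fq = Path(fastq_dir)
    r1 = str(fq / f"{sample}_R1.fastq.gz")
    r2 = str(fq / f"{sample}_R2.fastq.gz")
    t1 = f"{sample}_trim_R1.fastq.gz"
    t2 = f"{sample}_trim_R2.fastq.gz"

    # Step 1: adapter and quality trimming of both mates with bbduk.sh
    trim = [
        "bbduk.sh", f"in1={r1}", f"in2={r2}", f"out1={t1}", f"out2={t2}",
        f"ref={adapters}", "ktrim=r", "k=23", "mink=11", "hdist=1",
        "tpe", "tbo", "qtrim=rl", "trimq=10",
    ]

    # Step 2: splice-aware alignment of the mate pairs with STAR
    align = [
        "STAR", "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--readFilesIn", t1, t2,
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", f"{sample}_",
    ]

    # Step 3: gene-level counting of paired-end fragments with featureCounts
    bam = f"{sample}_Aligned.sortedByCoord.out.bam"
    count = [
        "featureCounts", "-p", "-T", str(threads),
        "-a", str(gtf), "-o", f"{sample}_counts.txt", bam,
    ]
    return [trim, align, count]


commands = build_commands(
    "S1", "fastq", "star_index", "annotation.gtf", "adapters.fa"
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to subprocess.run(cmd, check=True) once the tools and reference files are in place. For a directional library, the strandedness option of featureCounts (-s) would also need to be set; the value used here is not reported.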
Differential expression analysis. The R package DESeq2 (v1.34.0) was used for the analysis of differential expression. Hemoglobin coding genes were removed from further analysis as they do not contain characteristic information but account for a high percentage of total gene expression (33.04%) and thus may bias the statistics. The filtered gene counts were normalized by size factor based on DESeq2’s median-of-ratios approach and then tested for statistically significant differences in expression by comparing patients with RA-ILD to patients with RA without ILD using the Wald test implemented in DESeq2. In other words, for each gene expressed in blood, we calculated the level of expression in all patient samples. To test whether there is a difference in the expression of each gene between patients with RA-ILD and patients with RA without ILD, a statistical test is applied to the samples’ gene counts. The resulting P value gives, for each gene, the probability of observing an expression difference at least as large as the measured one if expression were in fact equal between disease groups. The log2 fold change value is the logarithmic ratio of the average counts of individual genes from each group and describes the effect size of the expression difference. The P values were adjusted for multiple testing using the Benjamini-Hochberg method. Genes with an adjusted P value < 0.05 were defined as significantly differentially expressed genes between the considered groups. The ggplot2 R library (v3.3.6) was used for all visualizations.
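To make the normalization, effect size, and multiple-testing steps concrete, the following Python sketch reimplements them on toy numbers. It is a simplified illustration and not the DESeq2 code used in the analysis: DESeq2 estimates the log2 fold change from a negative binomial model rather than from a plain ratio of means, and the Wald test itself is not reproduced here. The function names and the toy counts and P values are assumptions introduced for illustration only.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with nonzero counts in every sample define the reference
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples (the pseudo-reference)
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Per-sample median of the log ratios to the reference
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


def log2_fold_change(norm_a, norm_b):
    """Log2 ratio of the mean normalized counts of group A over group B."""
    return float(np.log2(np.mean(norm_a) / np.mean(norm_b)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest P value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


# Toy data: sample 2 was sequenced twice as deeply as sample 1
toy_counts = np.array([[10, 20], [100, 200], [50, 100]])
sf = size_factors(toy_counts)
normalized = toy_counts / sf

# Toy P values for four hypothetical genes
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = padj < 0.05

print(sf, normalized, padj, significant, sep="\n")
```

In the toy matrix, normalization removes the twofold difference in sequencing depth, so both samples end up with identical normalized counts. A log2 fold change of 1 corresponds to a doubling of the mean normalized count in the first group relative to the second.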
Ethics approval and informed consent. This study was approved by the Charité Ethics Committee, Campus Benjamin Franklin, application number: EA4/049/18, Charitéplatz 1, 10117 Berlin, Germany. All participants in the study were informed in writing and gave their written consent to participate in the study.
RESULTS
We statistically compared the gene expression patterns of whole blood samples from patients with RA-ILD and patients with RA. The analysis revealed 9 significantly upregulated genes in patients with RA-ILD compared to those with RA without ILD (adjusted P value < 0.05; Figure; Table). These genes comprise arginase 1 (ARG1), thymidylate synthetase (TYMS), sortilin 1 (SORT1), marker of proliferation Ki-67 (MKI67), olfactomedin 4 (OLFM4), baculoviral inhibitor of apoptosis repeat containing 5 (BIRC5), membrane spanning 4-domains A4A (MS4A4A), C-type lectin domain family 12 member A (CLEC12A), and the long intergenic nonprotein coding RNA (LINC02967; Table). Except for SORT1, these genes all show log2 fold change values > 1. ARG1 was expressed 1.3-fold higher in patients with RA with ILD compared to those without ILD.
Figure. Differential gene expression results. (A) The LFC is plotted against the −log10 adjusted P value for each gene. Genes with positive LFC values are upregulated and those with negative LFC values are downregulated in RA-ILD compared to RA without ILD. All genes with an adjusted P value < 0.05 are colored in red. (B) The normalized gene counts of all DEGs are shown as boxplots for patients with RA-ILD (red) and RA without ILD (blue), separately. ARG1: arginase 1; BIRC5: baculoviral inhibitor of apoptosis repeat containing 5; CLEC12A: C-type lectin domain family 12 member A; DEG: differentially expressed gene; ENSG00000274173: long intergenic nonprotein coding RNA 2967 (LINC02967); ILD: interstitial lung disease; LFC: log2 fold change; MKI67: marker of proliferation Ki-67; MS4A4A: membrane spanning 4-domains A4A; OLFM4: olfactomedin 4; RA: rheumatoid arthritis; SORT1: sortilin 1; TYMS: thymidylate synthetase.
Table. Differential gene expression analysis results between patients with RA-ILD and RA without ILD including LFC and adjusted P value per gene.
These RNA-Seq results were validated using reverse transcription quantitative PCR (see Supplementary Material, available with the online version of this article).
DISCUSSION
Nine significantly upregulated genes between the disease groups were identified, most of which have been shown to be involved in important processes associated with RA, RA-ILD, and fibrosis.
ARG1 encodes the enzyme arginase, which is mainly part of the urea cycle in liver cells. Arginine metabolism is known to be involved in critical innate and adaptive immune responses.7 ARG1 is considered an important contributor to fibrosis, and a connection with quartz dust-induced fibrogenesis in the lungs (silicosis) was shown in mouse models in a study from 2015.8
TYMS encodes thymidylate synthase, which methylates deoxyuridine monophosphate to synthesize deoxythymidine monophosphate (dTMP).7 A certain amount of dTMP is essential for DNA repair and replication, and loss of dTMP therefore causes DNA damage. For downregulation of cell proliferation, TYMS is used as a target for several drugs (eg, MTX for RA treatment).9 In our investigation, both patients with RA-ILD and RA were treated with MTX, which suggests a less effective influence of MTX on TYMS transcription inhibition in patients with RA-ILD compared to patients with RA without ILD. TYMS has also been shown to be upregulated in patients with rare congenital pulmonary fibrosis, such as familial pulmonary fibrosis and Hermansky-Pudlak syndrome. Thus, TYMS appears to play a role in fibrogenesis of the lung.10
SORT1 encodes the protein sortilin, which is involved in intracellular trafficking. Several cellular disorders have been associated with sortilin dysfunction.7 Sortilin plays a key role in the pathophysiology of fibrosing diseases such as calcific aortic valve disease11 and liver fibrosis.12
OLFM4 encodes olfactomedin 4. Olfactomedin 4 has been shown to be involved in multiple cellular functions as well as in the innate immune system and inflammatory response.7 OLFM4, among other genes, is upregulated in patients with severely advanced idiopathic pulmonary fibrosis compared to patients with less advanced disease.13 This observation supports the potential of peripheral blood transcriptome analysis for the differentiation of RA and RA-ILD.
MKI67 appears to have an important profibrotic function, as suggested from an experimental study on bleomycin-induced pulmonary fibrosis in a mouse model.14 In another mouse model, BIRC5 was also described as an important profibrotic mediator that is upregulated in pathogenetic pulmonary fibroblasts and is thought to play a key role in pulmonary fibrogenesis.15
MS4A4A encodes a tetraspan surface molecule and is mainly expressed in leukocytes, especially macrophages, where it plays critical roles in the activation of proinflammatory immune responses during infection.7 MS4A4A was found to be differentially expressed in systemic sclerosis (SSc)-associated fibrotic lung (ie, SSc-ILD) and appears to be involved in a network of important genes in SSc-ILD.16
CLEC12A encodes the C-type lectin domain family 12 member A. Besides roles in cell adhesion and cell-cell signaling, CLEC12A is expressed in alveolar macrophages and has been suggested to be involved in the pathogenicity of fibrosing lung disease.17
Virtually all genes upregulated in RA-ILD vs RA without ILD in our investigation are involved in the pathogenesis of fibrosis.
Because of the small number of patients analyzed (n = 12), the results presented here can be understood only as a fundamental proof-of-concept study. Since we first wanted to show the basic feasibility of a differential gene expression analysis in patients with RA and RA-ILD from peripheral blood, we discriminated exclusively between ACPA- and RF-positive RA and ACPA- and RF-positive RA-ILD, but not between patterns of lung involvement within the small RA-ILD group (n = 7). However, all patients were in remission in relation to the arthritis. The usage of RNA-Seq analysis has already been described in other studies, such as for estimating the extent of idiopathic pulmonary fibrosis.13
In conclusion, we provide the first evidence, to the best of our knowledge, that peripheral blood analysis using RNA-Seq is conceptually suitable for the detection of lung involvement in RA. Based on this transcriptome analysis, we consider the products of the genes identified here to be promising biomarkers and diagnostic targets for the detection of RA-ILD. Despite the small number of cases, these results are encouraging in terms of improving early detection of RA-ILD, determining whether appropriate intensification of therapy is necessary, and deciding when to consider antifibrotic therapy at the onset of fibrosing RA-ILD, rather than relying only on HRCT to morphologically demonstrate progressive pulmonary fibrosis. We therefore propose further systematic gene expression analyses that include RA-ILD registries, RA-ILD patterns, RA therapy, and time of first manifestation of RA-ILD.
ACKNOWLEDGMENT
We thank Denise Seyler (MTA, Institute for Human Genetics, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany) for her help with the technical processing of the samples and isolating the RNAs.
Footnotes
This work was funded with €5000 as part of the grant “92378 INNOVATE Förderpreis” by UCB-Pharma GmbH, Germany.
The authors declare no conflicts of interest relevant to this article.
- Accepted for publication November 9, 2023.
- Copyright © 2024 by the Journal of Rheumatology
This is an Open Access article, which permits use, distribution, and reproduction, without modification, provided the original article is correctly cited and is not used for commercial purposes.