Abstract
Objective. MicroRNA (miRNA) are short noncoding RNA that regulate genes and are both biomarkers and mediators of disease. We used small RNA (sRNA) sequencing and machine learning methodology to develop an miRNA panel to reliably differentiate between rheumatoid arthritis (RA) or systemic lupus erythematosus (SLE) and control subjects.
Methods. Plasma samples from 167 RA and 91 control subjects who frequency-matched for age, race, and sex were used for sRNA sequencing. TIGER was used to analyze miRNA. DESeq2 and random forest analyses were used to identify a prioritized list of miRNA differentially expressed in patients with RA. Prioritized miRNA were validated by quantitative PCR, and lasso and logistic regression were used to select the final panel of 6 miRNA that best differentiated RA from controls. The panel was validated in a separate cohort of 12 SLE, 32 RA, and 32 control subjects. Panel efficacy was assessed by area under the receiver operative characteristic curve (AUC) analyses.
Results. The final panel included miR-22-3p, miR-24-3p, miR-96-5p, miR-134-5p, miR-140-3p, and miR-627-5p. The panel differentiated RA from control subjects in discovery (AUC = 0.81) and validation cohorts (AUC = 0.71), seronegative RA (AUC = 0.84), RA remission (AUC = 0.85), and patients with SLE (AUC = 0.80) versus controls. Pathway analysis showed upstream regulators and targets of panel miRNA are associated with pathways implicated in RA pathogenesis.
Conclusion. An miRNA panel identified by a bioinformatic approach differentiated between RA or SLE patients and control subjects. The panel may represent an autoimmunity signature, perhaps related to inflammatory arthritis, which is not dependent on active disease or seropositivity.
MicroRNA (miRNA) are small noncoding RNA that are important gene regulators and serve as biomarkers of disease, because gene regulators miRNA can destabilize messenger RNA (mRNA) and block translation typically by binding to the 3′-untranslated region of the mRNA with a complementary seed region near the miRNA 5′ end1,2. MiRNA are found within cells, but also circulate in plasma protected from degradation by exosomes3, microvesicles4, lipoproproteins5, and RNA-binding proteins6. Moreover, miRNA within these bodies can be transported to recipient cells5 to regulate genes. Plasma miRNA are stable in stored samples7 and are more practical for use as biomarkers than miRNA in specific cell types owing to ease of isolation. Many studies, predominantly in cancer, show that miRNA can be helpful diagnostic and prognostic biomarkers, particularly when used in a panel composed of multiple miRNA8,9,10,11.
We and others have found several plasma miRNA that are differentially altered among patients with rheumatoid arthritis (RA)12,13,14. However, most prior studies examined a few miRNA with known relevant function using PCR or small arrays. Small RNA sequencing makes it possible to evaluate many more miRNA in an unbiased fashion. Thus, small RNA sequencing could reveal novel miRNA signatures of RA and potentially provide mechanistic insights into disease pathogenesis. Our objective was to determine whether a panel of miRNA derived from small RNA sequencing could differentiate between patients with RA and control subjects, and if that panel validated in a separate RA cohort and was unique to RA or shared with another autoimmune disease: systemic lupus erythematosus (SLE). Additionally, we used pathway analysis to evaluate whether these miRNA have common disease-related upstream regulators that could affect their expression and whether the miRNA could affect RA-related pathways.
MATERIALS AND METHODS
Overview
We used a discovery cohort of RA and control subjects to perform small RNA sequencing for identification of candidate panel miRNA. Potential candidate miRNA differentiating RA from controls were prioritized using univariable differential expression analysis (R package DESeq2) and a multivariable random forest analysis. These prioritized candidates were validated by quantitative PCR (qPCR). Then, cross-validation lasso with logistic regression was used to further reduce validated miRNA candidates to a small panel of miRNA that provided best discrimination between RA and control subjects based on area under the receiver operating characteristic curves (AUC). This reduced panel of miRNA was then externally validated in an independent cohort of patients with RA and control subjects by qPCR. The panel was also tested in a small number of patients with SLE to see whether it similarly differentiated between SLE and control subjects.
Study population
The discovery cohort included 167 patients with RA and 91 control subjects who were frequency-matched for age, race, and sex from a prior cross-sectional study15. The validation cohort included 32 patients with RA and 32 control subjects who were frequency-matched for age, race, and sex from another prior cross-sectional study16, and 12 patients with SLE from another prior study17.
Recruitment and study procedures were described previously15,16,17 and will be detailed briefly. For the discovery and validation cohorts, subjects were 18 years of age or older. Patients with RA met classification criteria for RA18, patients with SLE met classification criteria for SLE19, and control subjects did not have a diagnosis of inflammatory autoimmune disease. Additionally, in the RA validation cohort, which was originally studied to examine the relationship between RA and structural and functional cardiac abnormalities, these participants were excluded from the study: those with current or prior heart failure, ischemic cardiovascular disease, structural cardiac disease, atrial fibrillation, estimated glomerular filtration rate < 60 ml/min, and gadolinium hypersensitivity, and those who were pregnant or breastfeeding, or unable to have magnetic resonance imaging16. Studies were approved by the Vanderbilt Institutional Review Board (IRB# 000567, 120314, and 990111) and all subjects gave written informed consent.
Clinical and laboratory information
We collected clinical information and laboratory measurements as previously described15,16. RA disease activity was determined by the 28-joint Disease Activity Score (DAS28) using the erythrocyte sedimentation rate (ESR)20. High-sensitivity C-reactive protein (CRP) concentrations and ESR were measured by the Vanderbilt University Medical Center Clinical Laboratory.
Details of sRNA sequencing and microRNA alignments
Total RNA was extracted from stored plasma using Total RNA Purification Kits (Norgen). Libraries were prepared using TruSeq Small RNA Library Preparation Kits (Illumina). RNA extractions and library preparations were performed with both RA and control subject samples in each batch. Libraries were assessed for quality and size-selected for about 128 to 157 nucleotides in length including adaptors by Pippin Prep (Sage Science) in the Vanderbilt Technologies for Advanced Genomics (VANTAGE) core facility.
The cDNA libraries were sequenced using an Illumina NextSeq500 instrument by the VANTAGE core facility. TIGER (Tools for Integrative Genome analysis of Extracellular sRNA), an in-house small RNA sequencing analysis pipeline21, was used to analyze sequence data. In brief, high-quality reads were demultiplexed using Illumina’s CASAVA 1.8 pipeline and 3′ adapters were trimmed using Cutadapt22. Reads shorter than 16 nucleotides after adapter trimming were discarded. Three non-templated nucleotide addition (NTA) isoforms of each sRNA read were generated by removing 1, 2, or 3 bases from the 3′ terminal. All 4 isoforms, including 3 NTA isoforms and original read, were aligned to the human genome (hg19) using Bowtie23, allowing 1 mismatch. An sRNA read was identified as an miRNA if its mapped starting position matched any of the first 3 positions from the 5′ end of an miRNA based on the miRNA genome coordinates from miRBase (version 21, http://www.mirbase.org).
MicroRNA analyses
To obtain a prioritized group of miRNA that were potential candidates for the panel, we used both DESeq224, which is Wald test–based but can adjust for batch effect and other covariates, and random forest analysis, which is a machine learning method, to identify a prioritized list of candidate miRNA for the panel. We assumed that many of the miRNA would overlap, but that using both would help broaden potential candidates.
Sequencing read counts were normalized to total sRNA reads sequenced, which passed quality control (reads per total read). Differentially expressed miRNA were compared by DESeq224, adjusting for age, race, sex, and batch effect. Benjamini-Hochberg–adjusted p values were used to select the 15 most significantly differentially expressed miRNA (based on p value) for qPCR validation and further model development.
Random forest analyses, which allows for nonlinear relationships between disease status and miRNA, were conducted to select miRNA that are most important in separating RA and control subjects. The cross-validated prediction performance of models was evaluated with sequentially reduced number of predictors (ranked by variable importance) by a nested cross-validation procedure. Based on importance score, the top 15 miRNA were selected for qPCR validation and further model development.
Using qPCR-based plasma concentrations of the miRNA, we used lasso regularization with logistic regression to select a parsimonious final miRNA panel that maximized discrimination between RA and controls. This panel of miRNA was validated using qPCR in a separate cohort of 12 patients with SLE, 32 patients with RA, and 32 control subjects. Panel efficacy was assessed by AUC analyses.
qPCR validation
The same plasma samples used for small RNA sequencing were also used for qPCR validation. A cocktail of 3 Caenorhabditis elegans miRNA mimics (cel-miR-39, cel-miR-54, and cel-miR238; Qiagen) was added after the initial lysis step as a spike-in standard for normalizing RNA extraction efficiency. cDNA was prepared using qScript microRNA cDNA synthesis kits (Quantabio). Individual PCR assays (Quanta) and PerfeCTa SYBR green supermix for iQ (Quantabio) were used for qPCR. Plasma miRNA concentrations were determined from standard dilution curves of a DNA mimic and normalized to the spike-in standard. Samples were excluded from analysis if the Ct values of the spike-in standard exceeded 1 Ct from the median.
General statistical methods
Descriptive statistics were calculated as median [interquartile range] for continuous variables, and frequency and proportions for categorical variables. Wilcoxon rank-sum tests were used to compare continuous variables and Pearson chi-square test to compare categorical variables. PCR-based miRNA concentrations were log-transformed in models because of skewness and are presented as geometric mean (95% CI). Fold difference of the PCR-based miRNA concentrations was the fold difference of the geometric mean. Spearman correlation was used to assess the correlation of continuous variables.
Sample size determination
For sRNA sequencing, the discovery cohort sample size of 167 RA and 91 control subjects offered about 99% power to detect miRNA that were > 1.5-fold altered in RA versus control subjects, assuming detection across all samples of about 500 miRNA with a 5% false discovery rate (FDR; https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/). This sample size also gave about 99% power to detect an AUC ≥ 0.65 based on PCR-based plasma miRNA concentrations.
For the validation cohort, a sample size of 32 patients with RA and 32 controls provided about 90% power to detect and AUC ≥ 0.7 based on PCR-based plasma miRNA concentrations.
Pathway analysis
We separately evaluated upstream regulators of the panel miRNA and downstream targets regulated by the panel miRNA using Ingenuity Pathway Analysis (Qiagen, Version 01–07). For evaluation as upstream regulators, we selected direct and indirect regulators of each mature miRNA and its precursor using all available data. For evaluation of downstream targets of the panel miRNA, a target was included in analysis if it was previously experimentally validated or is a highly predicted target of the miRNA. We assessed canonical pathways and functional analyses.
RESULTS
Clinical characteristics
The discovery cohort included 167 patients with RA and 91 control subjects. The groups were of similar age, race, and sex (Table 1). The RA validation cohort included 32 patients with RA and 32 control subjects of similar age, race, and sex (Supplementary Table 1, available with the online version of this article). Compared to the discovery cohort, disease activity was lower (DAS28 score 3.89 vs 2.80 units), and there were fewer seropositive individuals in the validation cohort (69% vs 54% positive for rheumatoid factor). The SLE validation cohort included 12 patients with SLE (Supplementary Table 2).
Significantly altered miRNA based on sRNA sequencing comparing RA versus control subjects: discovery cohort
Among the 262 reliably detected plasma miRNA, 175 were significantly altered in RA compared to control subjects after adjusting for age, race, sex, and batch and FDR. Among these, 110 were ≥ 1.5-fold altered in RA compared to control subjects (Figure 1). Most of these miRNA were increased in RA plasma, and 1 miRNA, miR-3168, was significantly decreased (−1.73-fold decreased, p = 5.0E–03, Padj = 1.2E–02).
The top 15 differentially expressed plasma miRNA as determined each by DESeq2 and by random forest analysis are listed in Table 2. Twelve of the 15 miRNA were common to both analytic approaches (miR-3615, miR-22-3p, miR-502-3p, miR-345-5p, miR-29c-3p, miR-221-3p, miR-140-3p, miR-30e-5p, miR-501-3p, miR-22-5p, miR-127-3p, miR-134-5p). Additionally, 3 miRNA were identified each by DESeq2 only (miR-99b-5p, miR-130a-3p, miR-21-3p) and 3 by random forest only (miR-627-5p, miR-24-3p, miR-96-5p).
PCR validation of altered miRNA
The top candidate miRNA (18 total) were measured by qPCR in the discovery cohort. Two of the miRNA (miR-502-3p and miR-501-3p) were too low in abundance to be assayed reliably by qPCR. All but 2 (miR-127-3p and miR-96-5p) of the remaining miRNA were significantly increased among patients with RA compared to control subjects using PCR-based concentrations of miRNA (Table 2).
MiRNA panel development
Using lasso variable selection with logistic regression of the qPCR-based concentrations of miRNA in the discovery cohort, the following miRNA were chosen for the panel so as to include the fewest miRNA that discriminated RA from control subjects: miR-22-3p, miR-24-3p, miR-96-5p, miR-134-5p, miR-140-3p, miR-627-5p. The panel had an AUC = 0.81 (95% CI 0.75–0.87, p < 0.001) for differentiating RA and control subjects. The panel was similarly robust among those with seropositive RA (AUC = 0.79, 95% CI 0.73–0.86, p < 0.001), seronegative RA (AUC = 0.84, 95% CI 0.77–0.91, p < 0.001), RA in remission (DAS28 score < 2.625, AUC = 0.85, 95% CI 0.78–0.92, p < 0.001), and high RA disease activity (DAS28 > 5.1, AUC = 0.79, 95% CI 0.70–0.88, p < 0.001).
The panel had similar performance across other subgroups of patients compared to control subjects. This included RA patients with disease duration < 1 year (n = 29, AUC = 0.80, 95% CI 0.72–0.89, p < 0.001), those not taking biologic disease-modifying antirheumatic drugs (bDMARD; n = 133, AUC = 0.81, 95% CI 0.75–0.87, p < 0.001), those taking bDMARD (n = 33, AUC = 0.80, 95% CI 0.71–0.88, p < 0.001), those not taking conventional synthetic DMARD (csDMARD) or bDMARD (n = 19, AUC = 0.90, 95% CI 0.83–0.96, p < 0.001), those taking any csDMARD or bDMARD (n = 150, AUC = 0.80, 95% CI 0.73–0.86, p < 0.001), those not receiving any csDMARD, bDMARD, or corticosteroid (n = 13, AUC = 0.89, 95% CI 0.83–0.96, p < 0.001), and those taking either csDMARD or bDMARD or corticosteroid only (n = 153) compared to control subjects (AUC = 0.80, 95% CI 0.74–0.86, p < 0.001).
Validation
The panel of 6 miRNA was robust in the separate RA validation cohort (AUC = 0.71, 95% CI 0.58–0.84, p = 0.004). Similarly, in the validation cohort, the panel differentiated between seropositive (AUC = 0.73, 95% CI 0.58–0.87, p = 0.01) or seronegative RA (AUC = 0.73, 95% CI 0.57–0.89, p = 0.02) versus control subjects.
We additionally measured the panel in 12 patients with SLE and compared model performance to control subjects. The panel also differentiated patients with SLE from control subjects (AUC = 0.80, 95% CI 0.65–0.96, p = 0.001), but was not significantly different comparing patients with SLE to RA (AUC = 0.63, 95% CI 0.44–0.82, p = 0.13).
Relationship between miRNA components of the panel and disease-related variables
Three of the miRNA were weakly associated with RA disease activity by DAS28 score in the discovery cohort (miR-24-3p: Rho = −0.16, p = 0.04; miR-96-5p: Rho = 0.16, p = 0.04; miR-140-3p: Rho = −0.16, p = 0.05); however, these significant associations were not observed in the RA validation cohort.
Upstream regulators of these miRNA in the panel
We examined upstream regulators of the miRNA included in the panel to determine whether there are commonalities in regulation between the miRNA that would promote their ability to be used as an RA miRNA signature. There was little information regarding upstream regulators of miR-627-5p or its precursor at the time of analysis; thus this miRNA was excluded from pathway analysis. The top identified function that the upstream regulators possessed related to invasion of cells (p = 7.89E–27), for which 35 of the upstream regulators were included (Figure 2). Additionally, these upstream regulators are involved in cell death (p = 7.11E–22). The top overlapping canonical pathway was the involvement of macrophages, fibroblasts, and endothelial cells in RA with 11 overlapping molecules (Figure 2).
MiRNA panel pathway targets
Because circulating miRNA can be delivered to cells with functional consequences5 or could reflect cellular processes, we evaluated whether experimentally validated and highly predicted targets of panel miRNA are involved in RA pathways. There was little information regarding miR-134-5p function at the time of analysis, thus this miRNA was excluded from pathway analysis. Among the top canonical pathways, the involvement of osteoblasts, osteoclasts, and chondrocytes in RA (Supplementary Figure 1, available with the online version of this article) and the involvement of macrophages, fibroblasts, and endothelial cells in RA (Figure 3) were third (29 molecules) and sixth (28 molecules), respectively, in the number of molecules in the pathway that the miRNA panel may target. Other top canonical pathways include molecular mechanisms of cancer (52 genes), G-protein coupled receptor signaling (30 molecules), protein kinase A signaling (29 molecules), and axonal guidance signaling (28 molecules).
DISCUSSION
We used plasma sRNA sequencing and bioinformatics approaches to develop a panel of miRNA that reliably differentiates between patients with RA and control subjects. The panel included miR-22-3p, miR-24-3p, miR-96-5p, miR-134-5p, miR-140-3p, and miR-627-5p and was robust across seronegative and seropositive RA and RA of varying disease activity. The panel also differentiated between patients with SLE and control subjects but was not significantly different between patients with RA and SLE, suggesting that this panel represents an autoimmunity signature.
Strong evidence indicates that miRNA as a class of sRNA are important in RA. Dicer and Drosha, which are endoribonucleases involved in cleaving miRNA to their mature form, are upregulated in peripheral blood mononuclear cells (PBMC) of patients with RA26. Moreover, activation of Dicer decreases tumor necrosis factor-α production, suggesting a homeostatic involvement of miRNA in RA. The upregulation of Dicer and Drosha in PBMC from patients with RA likely explains why we observed an overall increase in miRNA in plasma from patients with RA, because many plasma miRNA derive from PBMC and other blood cells27. However, in RA not all cell types have increased expression of Dicer and the associated increased miRNA expression. For example, Dicer and miRNA expression were lower in RA synovial fibroblasts, leading to an exaggerated response to inflammatory stimuli and resistance to apoptosis28. In addition to broad changes in miRNA production in RA, there are several individual miRNA with biologic significance implicated in RA.
Several of the panel miRNA have known associations with RA, either as biomarkers or known biology of RA. For example, miR-22-3p was increased significantly among anticitrullinated protein antibodies–positive individuals who subsequently progressed to RA29. Also, lower concentrations of miR-22-3p before treatment were associated with response to adalimumab at 12 months in a randomized double-blinded placebo-controlled trial of 180 patients with early RA30. MiR-22-3p is decreased in fibroblast-like synoviocytes (FLS) from RA compared to patients with osteoarthritis (OA); lower concentrations of miR-22-3p promoted FLS proliferation31. Thus miR-22-3p may be a driving force for the synovial hyperplasia characteristic of RA. MiR-22-3p also regulates Th17 responses in emphysema32 and drives hyperresponsive B cells in SLE33. Moreover, we found that inhibition of miR-22-3p improves nephritis in a mouse model of SLE (manuscript in preparation).
Plasma concentrations of miR-24-3p were elevated in patients with RA compared to controls in several prior studies, and it is a component of a prior RA plasma miRNA panel, which we and others have proposed12,14. In response to interleukin 6, miR-24-3p increases and promotes plasma cell survival, an effect that could support autoreactive plasma cells in RA34. MiR-24-3p also plays a homeostatic role by dampening inflammation through nuclear factor-ΚB signaling pathway35 and chitinase 3-like-136 in the vasculature.
MiR-140-3p has been most widely studied in OA and participates in cartilage homeostasis37; its concentrations are also decreased in synovial tissue from patients with RA compared to OA38. Intraarticular administration of pre-miR-140 (the precursor for miR-140-3p) reduced arthritis scores in mice with collagen-induced arthritis by way of increasing synovial fibroblast apoptosis and reducing proliferation38.
Three of the panel miRNA (miR-96-5p, miR-134-5p, and miR-627-5p) have not been studied widely in RA. Thus, if we had limited selection of miRNA for the panel to those previously studied in RA, these would have been overlooked. We identified new miRNA because of our methods that used sequencing rather than a preselected panel and random forest analysis to identify candidates, which has not been previously done in RA, to our knowledge. MiR-96-5p remained helpful to the model despite not being overall significantly altered in RA based on PCR data. Both miR-96-5p and miR-134-5p can target the KRAS signaling pathway39,40, which may affect T cell activation thresholds, enabling responses to autoantigens41. Calcitriol-induced expression of miR-627 and miR-627 reduced proliferation of a colorectal cancer cell line42. Exposure of calcitrol to synoviocytes cultured from patients with RA reduced cell proliferation and cytokine production43. It is possible that miR-627 may be part of this mechanism.
Proposed pathways that promote or are altered by the miRNA in the panel
Pathway analysis indicates that upstream regulators of panel miRNA and panel miRNA targets are involved in RA-related disease processes. This observation also provides proof-of-principle for the methodology we used to develop the panel. Use of non-biased techniques, such as random forest analysis, serves to find novel biomarkers and offer some insight into mechanisms of disease.
Limitations
Both discovery and validation cohorts were predominantly patients with established disease (defined as ≥ 6 mos of duration44); thus, the panel may not differentiate individuals with RA at the time of diagnosis or pre-RA states from control subjects. The panel was robust among those with disease duration < 1 year and among those not receiving DMARD, however. We did not design the study to develop a diagnostic panel for RA when other inflammatory autoimmune diseases are under consideration, but initial testing of the panel in a small set of patients with SLE suggests that the panel may represent an autoimmunity signature. Future studies examining the panel in a variety of inflammatory autoimmune diseases at varied levels of activity would be helpful to determine whether these miRNA compose an inflammatory autoimmune disease signature rather than an RA signature. The strong relationship between the miRNA and annotated RA pathways was reassuring, but there are limitations to pathway analysis. In general, RA has more extensive annotated pathways than many other inflammatory autoimmune diseases; therefore it is possible that we do not see as strong a relationship with other inflammatory autoimmune diseases because there are insufficient annotated pathways.
An miRNA panel identified by bioinformatics approaches was able to differentiate between patients with RA and control subjects with reproducibility. Many of the upstream regulators of the miRNA and many of the miRNA targets regulate RA-related pathways. The panel may represent an autoimmunity signature that is not dependent on active disease.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Footnotes
Funded by the Veterans Health Administration CDA IK2CX001269, Arthritis Foundation Delivering on Discovery grant, Alpha Omicron Pi, US National Institutes of Health grants P60 AR056116, P01 HL116263, and CTSA award UL1TR000445 from the National Center for Advancing Translational Sciences.
- Accepted for publication May 1, 2019.