The potential use of expression profiling: implications for predicting treatment response in rheumatoid arthritis
  1. Samantha Louise Smith1,
  2. Darren Plant2,
  3. Stephen Eyre1,
  4. Anne Barton1,2
  1. 1Arthritis Research UK Epidemiology Unit, Manchester Academic Health Sciences Centre, School of Translational Medicine, University of Manchester, Manchester, UK
  2. 2NIHR Manchester Musculoskeletal Biomedical Research Unit, Central Manchester Foundation Trust, Manchester Academic Health Sciences Centre, Manchester, UK
  1. Correspondence to Prof Anne Barton, Arthritis Research UK Epidemiology Unit, Manchester Academic Health Science Centre, School of Translational Medicine, University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PT, UK; anne.barton{at}manchester.ac.uk

Abstract

Whole genome expression profiling, or transcriptomics, is a high throughput technology with the potential for major impact in clinical settings, drug discovery and diagnostics. In particular, there is much interest in this technique as a means of predicting treatment response. Gene expression profiling entails the quantitative measurement of messenger RNA levels for thousands of genes simultaneously, with the inherent possibility of identifying biomarkers of response to a particular therapy or of singling out those at risk of serious adverse events. This technology should contribute to the era of stratified medicine, in which specific patient populations are matched to potentially beneficial drugs via clinical tests. Indeed, in the oncology field, gene expression testing is already recommended to allow rational use of therapies to treat breast cancer. However, there are still many issues surrounding the use of the various testing platforms available and the statistical analysis associated with the interpretation of the results generated. This review will discuss the implications this promising technology has for predicting treatment response and outline the various advantages and pitfalls associated with its use.

  • Rheumatoid Arthritis
  • Treatment
  • Infections

The potential use of expression profiling: implications in predicting treatment response

The customisation of healthcare, with all decisions and practices being tailored to the individual patient, is the ultimate goal of personalised medicine. In the ideal scenario, an effective drug that will not induce serious adverse events would be selected for each patient, based on an algorithm that combines a variety of predictive markers, including demographic data. While personalised medicine remains a distant dream, stratified medicine is a more realistic possibility in the near future. Stratified medicine means identifying groups of patients who are more or less likely to respond to, or develop side effects from, particular treatments, and could potentially result in a migration away from empirically prescribing traditional ‘block-buster’ drugs (one drug fits all) and towards ‘niche-buster’ drugs (specific populations identified via predictive tests).1,2 Following the sequencing of the human genome in 2000,3 it was hoped that understanding the effect of genetic variation on treatment response would usher in the era of stratified medicine, and the field of pharmacogenomics has developed as a result. There are several examples in which genetic and genomic markers have been identified and regarded as important enough to be included in drug labelling for use in different medical fields; in some cases, but not all, this genetic and genomic information has had a great impact. For example, abacavir has been used for the treatment of HIV but severe hypersensitivity (which can be fatal) was reported in approximately 5–8% of subjects in populations of European descent. The adverse reaction is associated with HLA-B*5701 carriage and prospective studies showed that pretreatment screening improved the safety of the drug and reduced its discontinuation rates.4–6 Other useful information included in drug labelling can relate to clinical response variability, genotype-specific dosing and mechanisms of drug action (table 1).
Genetic variants would serve as ideal markers of treatment response as they are stable and can be tested easily, cheaply and consistently. In the rheumatology field, for example, testing of thiopurine methyltransferase (TPMT) gene polymorphisms before initiation of therapy with azathioprine is now included in prescribing guidelines from the British Society for Rheumatology and the US Food and Drug Administration.7,8 Azathioprine is an immunosuppressive drug commonly used in the treatment of autoimmune diseases. Upon administration azathioprine is metabolised into 6-mercaptopurine, whose mode of action is to impede the proliferation of B cells and T cells. The enzyme responsible for the inactivation of 6-mercaptopurine is encoded by the TPMT gene; however, a substantial proportion of the population exhibit reduced activity of, or deficiency in, this enzyme (13% of the Caucasian population).9 Individuals with reduced or deficient TPMT activity can be identified by genetic screening before their inability to metabolise azathioprine effectively results in complications such as serious bone marrow toxicity and myelosuppression. Knowledge of the genotype can inform the decision whether to initiate the drug at a lower dose (in heterozygotes) or to avoid it altogether (in homozygote carriers).

Table 1

Genomic biomarkers in the context of approved US Food and Drug Administration drug labels

There are several drugs for which the identification of predictors of response would have a substantial clinical and health economic impact. An example is the use of biological drugs in the treatment of rheumatoid arthritis (RA), in which 30–40% of patients fail to exhibit a satisfactory response but the drugs impose a high financial burden on national health services, costing approximately £10 000 per patient per year in the UK.10 This fact, coupled with the risk of serious adverse events, has spurred investigations into identifying genetic predictors of the outcome of treatment with biological therapies. However, few validated markers have been identified and those genetic markers that have been replicated in different populations account for only a small amount of the variance in response (reviewed by Prajapati et al).11 In fact, the largest genome-wide association study of response to anti-tumour necrosis factor (TNF) biological therapy of RA to date explained only 20% of the variance in response when clinical, demographic and genetic data were incorporated into the model.12

An alternative but complementary approach in the search for robust predictors of treatment response is the study of RNA signatures: transcriptomics. In order to function, a gene needs to be expressed; transcription of DNA to RNA is the first step in this process, followed by the translation of RNA into proteins. The amount of protein-coding RNA present within a cell is therefore reflective of the amount of protein required for cellular functions.13 The transcriptome is defined as the sum of all RNA transcripts present within a cell at any given time. It is extremely dynamic, depending on the environment, stage of development and even time of day; by comparison, the genome is relatively constant in most cells (eg, DNA sequences are transmitted from maternal to daughter chromatids with an error-rate of approximately one in 10^6 during mitosis). Transcriptomics is the study of the transcriptome and therefore provides information on which genes are being actively expressed at any given time. Recently, in an expression quantitative trait meta-analysis of 1469 coeliac patients, it was demonstrated that regions associated with susceptibility to disease are highly enriched in expression quantitative trait loci.14 It was found that of the 38 tested non-HLA loci (which only account for approximately 5% of genetic risk), 53% (20/38) contained single nucleotide polymorphisms with a significant (Spearman p<0.003, false discovery rate 5%) cis expression quantitative trait loci effect in peripheral blood mononuclear cells; this was significantly greater than the number that was expected to occur by chance (22 observed vs 7.8 expected).14,15 This observation led to the hypothesis that some regions associated with the risk of coeliac disease could influence disease susceptibility by altering levels of gene expression.14 High throughput techniques have been developed to enable the relative quantification of expression levels for thousands of genes simultaneously, and could be useful in predicting response to therapy because, while identification of genetic variants generally relies on sample sizes ranging into the thousands, significant gene expression differences may be observed with relatively modestly sized cohorts. For example, a study of type 1 diabetes genetic loci identified approximately 30% higher CD25 expression on the surface of memory CD4+ T-cells in patients carrying protective alleles at the rs12722495 locus, compared with fully susceptible individuals.16 This association was detected with fewer than 200 individuals and yielded a statistically significant p-value of 1.16×10^−10. Detecting a genetic effect in type 1 diabetes at this single nucleotide polymorphism (SNP) locus with 80% power and at a similar level of significance would require 4600 cases and controls, assuming an OR of 1.64 and a control minor allele frequency of 4%.
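The sample size quoted above can be approximated with a standard calculation. The sketch below is illustrative only and is not the method used in the cited study: it assumes a two-sided allelic test comparing allele frequencies between cases and controls under a normal approximation, with equal numbers of cases and controls, and it reads "a similar level of significance" as an alpha of 1.16×10^−10. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_frequency(control_maf: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by the control frequency and the allelic OR."""
    odds = odds_ratio * control_maf / (1.0 - control_maf)
    return odds / (1.0 + odds)


def individuals_per_group(control_maf: float, odds_ratio: float,
                          alpha: float, power: float) -> int:
    """Approximate number of cases (and of controls) for a two-sided allelic test.

    Assumption: normal approximation for comparing two proportions, where each
    individual contributes two alleles.
    """
    p0 = control_maf
    p1 = case_allele_frequency(control_maf, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    # Use the lower tail and negate, which is numerically safer for tiny alpha.
    z_alpha = -NormalDist().inv_cdf(alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_power * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2.0)  # two alleles per individual


# Values taken from the text: OR 1.64, control minor allele frequency 4%, 80% power.
n = individuals_per_group(control_maf=0.04, odds_ratio=1.64,
                          alpha=1.16e-10, power=0.80)
print(n)
```

Under these assumptions the estimate is of the same order as the figure quoted in the text; other test models or software would give somewhat different numbers.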

Various platforms are available to measure the expression levels of thousands of genes simultaneously, accurately and quantitatively. One emerging technology is the direct sequencing of RNA transcripts (RNAseq), which offers improved sensitivity and hypothesis-free analysis of the whole transcriptome. A major advantage of this technology is that, while previous genome annotation is needed for probe selection in array-based methods, RNAseq removes this requirement by directly sequencing complementary DNA, thereby permitting the identification of novel transcripts such as splice variants, pseudogenes and microRNA. Another advantage over microarrays is that RNAseq exhibits very low or non-existent background noise (whereas microarrays demonstrate very high levels), which vastly improves sensitivity in the detection of low abundance genes; RNAseq has also been shown to be highly reproducible using biological and technical replicates.17 Whilst sequencing costs are continually falling and RNAseq will probably become the method of choice in future studies, the majority of high-throughput gene expression data to date has been generated using DNA microarrays. DNA microarrays (or hybridisation arrays) take advantage of the natural ability of a given messenger RNA molecule to bind specifically to (hybridise with) the DNA sequence from which it originated.18 Microarrays are small impermeable solid supports (chips) onto which thousands of complementary gene sequences (probes) are immobilised in an ordered fashion. Detection relies on ‘hybridisation probing’;18 during this process nucleic acids (which are fluorescently labelled) are incubated with the chip containing the known probe sequences; only those with complementary sequences will bind to the chip. Bound sequences are then detected using a laser technology that excites the fluorescent tags; the location and intensity of these tags are used to identify and quantitate a particular gene.

Some of the major advantages associated with the use of microarrays are their high throughput nature, ease of data generation and well defined analysis techniques, which allow measurement of the expression levels of tens of thousands of genes simultaneously with high accuracy and sensitivity.19 However, there remain concerns about the reliability and reproducibility of results. For example, although intra-platform measurements can yield highly reproducible datasets, merging of datasets from different platforms can produce markedly different results. In a study by Tan et al20 it was found that the expression levels from three widely used microarray platforms (Affymetrix, Amersham and Agilent) demonstrated poor reproducibility, yielding disparate sets of differentially expressed genes. This has hampered the adoption of array-based methods for routine use. Furthermore, high levels of background signal in array datasets may result in decreased sensitivity to detect low abundance transcripts.17 A similar lack of sensitivity may also be observed towards the upper end of the spectrum where very high levels of expression are seen; the limited sensitivity results in a reduced dynamic range compared with RNAseq, which is capable of detecting transcripts over a very large dynamic range (a range of greater than 9000-fold has been estimated).21 In addition, given that thousands of genes are analysed simultaneously in a microarray experiment, positive results can arise by chance (false positives, type 1 error).22 It is therefore clear that a standardised method for analysing and interpreting microarray data is needed if results are to be comparable; researchers are already en route to this goal, as published guidelines are available that have contributed substantially towards it.23,24 Multiple statistical methods have been developed specifically to address these problems.
These include normalisation methods such as local regression or quantile normalisation to achieve meaningful results for comparison of samples across datasets, background correction to adjust for non-specific hybridisation and multiple hypotheses testing such as false discovery rate, which will control for the expected proportion of false positive associations.19 Key to planning all high quality microarray experiments is good experimental design to control as much as possible for confounding such as batch effects, biological variation and technical variation.25 Effective databases are required to store and manage the large amount of information produced from these experiments.23
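Two of the methods named above can be illustrated with a short sketch. The text does not specify particular algorithms, so the code assumes the commonly used Benjamini-Hochberg step-up procedure for false discovery rate control and a basic form of quantile normalisation that does not treat tied values specially. The function names and example values are hypothetical and are not taken from any cited study.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of tests declared significant at the given FDR.

    Step-up rule: find the largest rank k with p_(k) <= fdr * k / m and
    accept the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


def quantile_normalise(samples):
    """Force every array to share the same distribution of intensities.

    `samples` is a list of equal-length lists, one per array. The k-th
    smallest value on each array is replaced by the mean of the k-th
    smallest values across all arrays. Ties are ranked in index order,
    which is a simplification.
    """
    n_genes = len(samples[0])
    sorted_arrays = [sorted(s) for s in samples]
    reference = [sum(arr[k] for arr in sorted_arrays) / len(samples)
                 for k in range(n_genes)]
    normalised = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = reference[rank]
        normalised.append(out)
    return normalised


# Hypothetical p-values for ten genes.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example_p, fdr=0.05)

# Hypothetical intensities for three genes measured on two arrays.
normalised = quantile_normalise([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

In this toy example only the two smallest p-values survive correction, although five of the ten are below an unadjusted threshold of 0.05; this is the kind of chance finding that correction for multiple testing is intended to guard against.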

However, despite these concerns, several notable successes have already been achieved towards stratified medicine in the field of oncology, mainly in the transcriptomic arena. For example, several markers have been identified that can predict responsiveness to endocrine therapies in patients with breast cancer.26–30 The first of these was the discovery of the oestrogen receptor (ER).31 Using an ER immunocytochemical assay, it was found that expression of the ER on primary tumour cells, taken from elderly patients with breast cancer, accurately predicted response to tamoxifen (an anti-oestrogen agent). In this prospective study it was found that, of 35 ER-positive patients, 91% responded to tamoxifen (or their disease remained static), while of 14 ER-negative patients treated with tamoxifen, one demonstrated a partial response and the disease continued to progress in the remaining 13. Moreover, a relapse was confirmed in the one responding patient after 20 months.32 An additional advantage of pre-screening is that tamoxifen has been associated with an increased incidence of benign and malignant uterine lesions; it is therefore recommended (but not mandatory) that pre-screening for the ER marker is carried out to minimise exposure to unnecessary side effects and to target those most likely to benefit.33
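The strength of the association in the tamoxifen study can be illustrated with a 2×2 table. The sketch below is not an analysis reported by the original authors. It assumes that "91% of 35" corresponds to 32 responders (the only whole number consistent with that percentage), and it applies a one-sided Fisher exact test, which is chosen here purely for illustration. The function name is hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing `a` or more
    responders in the first row, given fixed row and column totals.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    upper = min(row1, col1)
    favourable = sum(comb(row1, k) * comb(row2, col1 - k)
                     for k in range(a, upper + 1))
    return favourable / comb(total, col1)


# Rows: ER-positive, ER-negative. Columns: responded or static, progressed.
# 32 of 35 is an assumed count reconstructed from the quoted 91%.
er_pos_resp, er_pos_prog = 32, 3
er_neg_resp, er_neg_prog = 1, 13

response_rate_pos = er_pos_resp / (er_pos_resp + er_pos_prog)
response_rate_neg = er_neg_resp / (er_neg_resp + er_neg_prog)
p_value = fisher_exact_greater(er_pos_resp, er_pos_prog, er_neg_resp, er_neg_prog)

print(f"ER-positive response rate: {response_rate_pos:.2f}")
print(f"ER-negative response rate: {response_rate_neg:.2f}")
print(f"one-sided Fisher exact p-value: {p_value:.2e}")
```

Even with fewer than 50 patients, a marker that separates responders from non-responders this cleanly produces a very small p-value, which is why ER status is a useful example of a clinically actionable predictor.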

Another example from oncology again comes from the treatment of breast cancer patients, in which the presence of HER2 (c-erbB-2) is routinely tested for to predict response to trastuzumab (Herceptin). The American Society of Clinical Oncology recommends that pre-screening for this marker be performed in all primary breast tumours.34,35 HER2 is a member of the epidermal growth factor receptor family and has been found to be overexpressed in 20–25% of invasive breast tumours; overexpression of this marker results in uncontrolled proliferation of breast cancer cells, causing a more aggressive tumour phenotype.36 Trastuzumab is a monoclonal antibody that selectively binds HER2 proteins and is therefore only recommended for patients with HER2-positive tumours.35,36 Previous clinical trials have demonstrated that this antibody has little or no effect in HER2-negative cells, supporting the guidelines for its use.35,37 Conversely, HER2 overexpression has been associated with reduced response rates to endocrine therapy, in particular tamoxifen treatment.28

Given the success of transcriptomic approaches to identifying response predictors in the oncology field, similar approaches have been adopted to try to find predictors of response to biological therapies in RA, as summarised in table 2.

Table 2

Summary of the results of gene expression studies to identify predictors of response to biologic therapies in RA.

The findings in table 2 suggest that an interferon (IFN) signature is the expression profile most commonly associated with clinical outcome in patients receiving biological therapy, having been identified in three studies. In one of the studies, gene expression profiling was performed on RNA extracted from the peripheral blood of 13 RA patients taken at baseline and at 3-month and 6-month follow-up. Using the Illumina HumanHT beadchip microarrays, comparison of samples taken at 3 months and baseline for each patient identified 154 rituximab-induced genes that exhibited at least a two-fold change in expression levels.49 In subsequent analysis, patients were stratified into responders (n=7) and non-responders (n=6) according to changes in the disease activity score (DAS) at 6 months. Using the set of 154 genes identified previously, cluster analysis was performed to identify differentially expressed genes between the two groups. It was found that expression of a cluster of genes involved in type 1 IFN signalling was significantly increased after 3 months of follow-up in responders; this signature was relatively stable in non-responders.49 The most pronounced difference was in a cluster of six genes: RSAD2, IFI44, IFI44L, HERC5, LY6E and MX1. Using Student's t test to compare the expression of these six genes in responders relative to non-responders (based on changes in the DAS), a significant p value of 0.049 was obtained.
These results suggest that future responders exhibit low or absent IFN response activity at baseline, with subsequent development of a response during 3 months of rituximab treatment, whereas non-responders exhibit an active IFN response at baseline, which remains active throughout treatment.49 These findings are in line with a previous report of a correlation between baseline type 1 IFN levels and clinical outcome.50 In that study it was found that the IFN signature significantly predicted the decrease in the DAS 28-joint score following treatment with rituximab. In the high baseline IFN group, DAS 28-joint scores declined by 0.71±0.8 versus a decrease of 1.6±0.97 in the low baseline IFN group after 12 weeks of therapy (p=0.001).50
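The first filtering step described above, selecting genes with at least a two-fold change between baseline and 3 months, can be sketched as follows. This is a simplified illustration for a single patient and is not the pipeline used in the cited study. The gene names and intensities are hypothetical, and the code assumes that a two-fold change in either direction qualifies.

```python
from math import log2


def genes_with_fold_change(baseline, follow_up, min_fold=2.0):
    """Return {gene: log2 fold change} for genes changing by at least `min_fold`.

    `baseline` and `follow_up` map gene names to positive expression values
    measured in the same patient at the two time points.
    """
    threshold = log2(min_fold)
    selected = {}
    for gene, before in baseline.items():
        after = follow_up[gene]
        log_fc = log2(after / before)
        # Assumption: up- and down-regulation are both counted as a change.
        if abs(log_fc) >= threshold:
            selected[gene] = log_fc
    return selected


# Hypothetical expression values (arbitrary units) for one patient.
baseline = {"GENE_A": 100.0, "GENE_B": 250.0, "GENE_C": 80.0}
follow_up = {"GENE_A": 230.0, "GENE_B": 260.0, "GENE_C": 30.0}

changed = genes_with_fold_change(baseline, follow_up)
for gene, log_fc in changed.items():
    print(f"{gene}: log2 fold change {log_fc:+.2f}")
```

Working on the log2 scale makes a doubling and a halving symmetrical (+1 and −1), which is why a single absolute threshold can be applied to both.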

A relationship between IFN signatures and clinical outcome has also been reported in RA patients receiving a different biological drug, infliximab. However, in that study, an inverse association was observed; it was reported that an increase in expression of a set of five IFN response genes (OAS1, LGALS3BP, MX2, OAS2 and SERPING1) was correlated with poor clinical outcome in infliximab-treated patients (p=0.022).43 In summary, table 2 shows that all studies to date have been conducted in very small patient cohorts, the smallest comprising just four patients.39 Furthermore, only half of the studies listed in the table included a validation experiment.38,43,45–47,49 It is interesting to note that the IFN signature that has been correlated with clinical response was identified in three of the internally validated studies, thereby strengthening this association.

Apart from the IFN signature, there has been little consistency in reported findings. Possible explanations are the different drug responses studied, the small sample sizes and lack of validation within studies, the different platforms used, a high rate of false positive findings and the different types of tissues/cells and time points analysed. In particular, many studies have investigated expression signatures from whole blood or particular cellular subsets; although most of the genetic signals discovered so far indicate that immune cells are involved in the development of RA, there is, as yet, no solid evidence that the same cells determine whether an individual will respond to a given drug. The importance of cell-type specificity is becoming increasingly apparent in gene expression work and is an important consideration during study design. For example, it has recently been reported that SELL (l-selectin) demonstrates marked opposing directional effects in monocytes and B cells.51 This cell-type specificity would probably be missed in whole blood samples, especially when investigating low abundance genes, for which any signal could well be masked by the contribution of multiple cell populations. However, it may not always be feasible to test cell-specific populations as fresh samples are required; furthermore, it is recommended that cell separation should be performed within 8 h of blood collection as cell recovery and function can be significantly affected by a prolonged delay.52 Consideration must also be given to which specific cell type to investigate.

There is thus debate as to whether it would be better to test synovial tissue, as this is the site of inflammation and tissue biopsy approaches have proved successful in the oncology field. The disadvantage of this approach is that facilities for, and expertise in, arthroscopic synovial tissue biopsy are not widely available, although ultrasound-guided biopsies are being introduced in some centres.53 The advantage of testing peripheral blood is that it is highly enriched with potential circulating biomarkers and there is the potential for rapid translation into clinical practice if biomarkers of treatment response can be identified.

In summary, transcriptomics is a potentially important tool with which to support the implementation of stratified medicine in clinical practice. However, future studies will need to be adequately powered and to show replication of findings on independent platforms and validation of results in independent sample series before findings are translatable to the clinic. Extensive planning is also instrumental to all successful gene expression studies because of the dynamic nature of RNA molecules themselves: expression is highly dependent on the stage of the disease, the type of tissue and the cell types studied. Careful and extensive consideration of study design is therefore needed. For example, in studies of treatment response, samples must be taken at the same time points and from the same source to be comparable and to yield biologically relevant answers. Furthermore, longitudinal studies may yield more relevant answers than cross-sectional studies, in particular by defining a temporal origin of deleterious events and distinguishing causal from consequential effects. Researchers will also need to weigh carefully the advantages and disadvantages of using peripheral blood samples rather than single cell populations, and to give consideration to the specific cell population to investigate.

References

Footnotes

  • Contributors SLS was responsible for reviewing the literature and drafting the first version and subsequent revisions of the manuscript. DP, SE and AB critically appraised the content, edited the manuscript and provided suggestions for improvement.

  • Funding SLS is a PhD student funded by an investigator-initiated award from Pfizer (grant number WS1940162).

  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.