Abstract
(Gilles Boire): It was both a pleasure and an honor to present the 2019 Dunlop-Dottridge Lecture. My co-author and I will now discuss benefits and pitfalls of biomarkers developed through emerging techniques, evaluated through the experiential perspective of a seasoned clinician, as they apply to the quest for biomarker identification in rheumatic diseases.
Thousands of papers on biomarkers in rheumatology are published each year, and the numbers are increasing steadily. After decades of relatively simple biomarkers such as autoantibodies and single proteins, we are on the verge of a revolution that is already transforming other areas of medicine: the availability of next-generation biomarkers. These result from the combination of high-throughput techniques, which allow the collection of thousands to millions of variables at the same time, with their simultaneous analysis by computational tools. The output is various scores, signatures, or subsets to be used for pathogenic studies, biologically based diagnosis, prognosis, and selection of optimal drug regimens. While heralding a new era in rheumatology, these complex biomarkers raise challenges and risks that need to be carefully addressed before their implementation. We will first concentrate on the uses and misuses of current biomarkers to illustrate how the next generation of composite biomarkers will positively and, unfortunately, potentially negatively alter our approach to disease pathogenesis, diagnosis, and treatments.
The ABC of Biomarkers
Biomarkers are objectively measured characteristics that evaluate physiological or pathogenic processes, as well as potential indicators of therapeutic responses1. Variables indicative of a patient’s feelings, well-being, or functions are not considered biomarkers. A biomarker may consist of a single variable or be composite when exploiting input from multiple variables2. In rheumatology, biomarkers contribute to various clinical objectives (Table 1)3.
Biomarkers may give mechanistic, clinical, or therapeutic information4. They come in various forms, such as proteins (frequently antibodies), genetic and epigenetic traits, imaging results, histological findings, cellular responses, gene expression, or microbiome characterization3. Their sources are diverse, ranging from fluids (blood, serum, plasma, urine, saliva, synovial fluid), isolated blood cells, skin, gums, membranes and organs, and feces to imaging and even digital devices (such as a smart watch)3.
The ideal biomarker is safe and easy to measure, sensitive and specific, reproducible, consistent across gender and race, actionable such that it can inform clinical management, and cost-efficient3,4. Many biomarkers are co-correlated and thus their concomitant usage does not add much to the information generated by a single one [e.g., erythrocyte sedimentation rate and C-reactive protein (CRP) to evaluate RA disease activity].
Current Uses (HELP) and Misuses (HARM) of Biomarkers
Current biomarkers are best used to support clinical decisions, especially to support or invalidate hypotheses resulting from careful collection of signs and symptoms. Antibodies are often considered the rheumatologists’ area of expertise. It is thus appropriate to use them to illustrate the good and the bad sides of current biomarkers.
HELP
For example, a 72-year-old woman is referred for presumed polymyalgia rheumatica. She presents with a few weeks of shoulder and hip pain and limitation, the CRP is increased, but you note swelling and tenderness in some small joints of both hands and in wrists. You think of rheumatoid arthritis (RA). If anti-CCP [anticyclic citrullinated peptide] and rheumatoid factor are both strongly positive, you feel very comfortable with the diagnosis.
Similarly, seeing a 68-year-old man with a few weeks of fatigue and altered general health now presenting with severe pulmonary hemorrhage and rapidly worsening kidney failure, you immediately think of vasculitis. A diagnosis of double-positive Goodpasture syndrome based on the presence of both positive antiglomerular basement membrane and antimyeloperoxidase antibodies will prompt the initiation of both plasmapheresis and immunosuppression in an attempt to save the patient’s life.
HARM
When biomarkers take precedence over the clinical presentation, problems arise. Every rheumatologist has seen inappropriate referrals for patients with lower back pain and a positive ANA (antinuclear antibody), or with generalized pain typical of fibromyalgia and a positive anti-CCP. For sure, positive biomarkers may represent a preclinical warning for future disease development, but without appropriate clinical signs and symptoms, they mostly generate unnecessary anxiety and potential harm, without benefits to the patient. Conversely, the absence of positive biomarkers may also pose a significant hurdle to establishing a correct diagnosis. For example, despite persistent distal symmetrical synovitis, the diagnosis of RA may be delayed in the absence of RA-associated antibodies together with a normal CRP.
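The effect of clinical context on the meaning of a positive test can be made concrete with Bayes' theorem. The sketch below is only illustrative: the sensitivity, specificity, and pretest probabilities are hypothetical values chosen for the example, not figures from this article or from any specific assay, and the function name `positive_predictive_value` is our own.

```python
def positive_predictive_value(pretest_prob, sensitivity, specificity):
    """Probability of disease after a positive result (Bayes' theorem)."""
    true_pos = pretest_prob * sensitivity
    false_pos = (1.0 - pretest_prob) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.95
SPECIFICITY = 0.80

# Low pretest probability: a presentation without suggestive signs or symptoms.
low = positive_predictive_value(0.01, SENSITIVITY, SPECIFICITY)
# High pretest probability: a clinical picture that already suggests the disease.
high = positive_predictive_value(0.50, SENSITIVITY, SPECIFICITY)

print(f"PPV at 1% pretest probability:  {low:.1%}")
print(f"PPV at 50% pretest probability: {high:.1%}")
```

Under these assumed values, the same positive result corresponds to a probability of disease below 5% in the first setting and above 80% in the second, which is why a positive biomarker ordered without appropriate clinical findings is mostly a false alarm.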
Sometimes biomarkers lead one to consider the wrong diagnosis. For example, in a 52-year-old woman with fever, weight loss, purpura, proteinuria, and progressive renal insufficiency developing over several weeks, negative blood cultures and positive ANCA (antineutrophil cytoplasmic antibodies) and anti-PR3 antibodies suggest ANCA-associated vasculitis (AAV). However, if the patient also has a heart murmur and low complement levels, features not typical of AAV, alternative diagnoses need to be actively looked for. Circulating Bartonella henselae DNA identified by PCR, a diagnosis later confirmed by serology, fully explains the clinical picture; false-positive ANCA and anti-PR3 are indeed frequent in this infection5. In this case, immunosuppression instead of antibiotherapy would have been detrimental. It is thus critical to carefully account for all clinical findings when interpreting biomarkers’ results6.
Remarkably, biomarkers also have inherent limitations that are frequently ignored. The first and most obvious is that biomarkers do not holistically represent the actual patient. Indeed, when one wants to identify patients with recent-onset inflammatory polyarthritis who will reach remission over the following 5 years, biomarkers are of little use, while elevated scores of depressive symptoms offer some pertinent information7.
The value of a biomarker may also change over time. Collectively, we have learned, and teach, that seropositive RA is more severe than seronegative RA. However, now that we treat patients earlier and more intensively, aiming for remission, outcomes at 1 year are very similar regardless of seropositivity8. Unrelated changes in practice may thus alter the course of disease and make some biomarkers less pertinent; similarly, when a biomarker becomes widely used, it induces changes in diagnosis and treatment that may blunt its original impact on prognosis.
Moreover, some biomarkers may lead to rigid and potentially counterproductive conceptualizations. For example, tentative models of RA pathogenesis through RA-associated antibodies9,10 fail to synthesize all available information, such as the data in Table 2 (reference 11). Baseline variables from these 754 consecutive RA patients enrolled into our EUPA (Early Undifferentiated PolyArthritis) cohort are typical of very early RA8,12. Seronegative and seropositive patients had quite similar clinical presentation; their smoking history was alike; one-third of anti-CCP–positive patients did not bear any HLA-DR shared epitope. Further, over the course of the first 5 years of disease, the odds ratio (OR) for bony erosions was only 1.5 in seropositives versus seronegatives, and the use of biologic DMARD (disease-modifying antirheumatic drugs) was numerically but not significantly higher (not shown). Clearly, antibodies alone do not define nonoverlapping subsets of RA patients according to their clinical presentation, pathogenic process, environmental exposures, or outcomes.
Next-generation Computational Biomarkers
We are entering an era of computational biomarkers originating from data generated by ever more efficient molecular technologies allowing multiplexing and generation of minimally biased biomarkers. Low-cost DNA sequencing allows for full genome sequencing of an individual13, as well as the determination of species of bacteria in a stool specimen, without the need for culture14. Gene expression can be determined in hundreds of individual cells, showing surprising diversity within a single population of cells15. Proteomic techniques can simultaneously detect hundreds of proteins in a single sample16. Epigenetics is revealing the crucial role of histone modifications, noncoding RNA, and DNA methylation in cell fate and functions17. Making sense of such a large volume of data requires advanced statistical methods and techniques to explore and learn the unknown internal structure of big data. These techniques encompass machine learning, neural networks, and dimensionality-reduction and clustering techniques that make no a priori assumptions, such as t-Distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection, and Principal Component Analysis (PCA). While signatures generated using these methods appear simple, clinicians are not equipped to apprehend how complex these computer-generated biomarkers are and what they mean. The ultimate objective of these composite biomarkers is “systems medicine,” integrating biochemical, physiological, and environmental interactions18. Yet questions remain: will patients benefit from this technology-driven explosion of biomarkers, and are we preparing adequately for it?
Next-generation Biomarkers: HOPE and HYPE
Response to treatment
Today, predicting response to MTX (methotrexate) remains impossible. A model based on clinical covariates such as age, smoking status, antibodies, joint counts, and patient evaluations reached an area under the receiver operating characteristic (ROC) curve (AUC) of only 63% for predicting MTX nonresponse19. However, gene expression profiles from whole blood RNA showed an overrepresentation of type I interferon pathway genes in nonresponders. This signature reached an AUC of 78%, a significant improvement over clinical assessment19. Recently, combining clinical and genomic variables (specifically, genotypes of adenosine triphosphate–binding cassette transporter implicated in active MTX efflux from cells) yielded a model with an AUC of 80%20, illustrating how various sources of biomarkers can synergize productively.
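For readers less familiar with the AUC, it can be read as the probability that a randomly chosen nonresponder receives a higher predicted risk than a randomly chosen responder: 50% corresponds to chance and 100% to perfect discrimination. The following minimal sketch computes it by direct pairwise comparison (the Mann-Whitney formulation). The risk scores are hypothetical and are not data from the cited studies.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted risks of MTX nonresponse, for illustration only.
nonresponders = [0.9, 0.7, 0.6, 0.4]
responders = [0.8, 0.5, 0.3, 0.2]

print(f"AUC = {auc(nonresponders, responders):.2f}")
```

Seen this way, moving from an AUC of 63% to 78% or 80% is a real gain in ranking ability, yet it still leaves a substantial proportion of patient pairs ranked in the wrong order.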
Disease pathogenesis (HOPE)
There is also hope that having more in-depth biological information will give clues about pathogenesis, subsequently allowing for a more uniform classification of patients and informing treatment options.
We propose using juvenile idiopathic arthritis (JIA) as an example. Current International League of Associations for Rheumatology (ILAR) classification of JIA into 7 subtypes is based on the number of joints affected over the first 6 months, combined with biomarkers and extraarticular manifestations ever present in the patient or in first-degree relatives21. Patients from the same subtype vary in clinical presentations, response to treatments, and outcomes (e.g., remission). Our Canadian pediatric colleagues reported their preliminary observations using 2 cohorts, ReACCh Out (Research in Arthritis in Canadian Children Emphasizing Outcomes) and BBOP (Biologically-Based Outcome Predictors in JIA)22. Using PCA statistical methods to analyze serum cytokine and chemokine expression combined with clinical and biologic variables, they reported 5 patient clusters more homogeneous than ILAR subtypes, both in clinical presentation and outcomes at 6 months.
Similar findings were reported for pediatric SLE (systemic lupus erythematosus) using gene signatures from whole blood RNA-Sequencing23. Seven SLE subgroups with specific combinations of 5 immune signatures were found.
Biologically based disease delineation is still in its infancy. Limitations of the current attempts center on the ad hoc selection of a small number of technology-driven rather than pathogenesis-driven variables, and the use of datasets possibly contaminated by high levels of noise and uninformative variables. These strategies also suffer from small patient populations, variable quality of clinical data, failure to integrate patient-derived variables, and the lack of demonstrated improved clinical outcomes on longterm followup.
HYPE
In addition to the problems associated with current biomarkers, next-generation biomarkers present a number of unique challenges and risks. One such risk is that, because they are complex and costly to validate, it is tempting to evaluate them more leniently. However, in keeping with recommendations from the Institute of Medicine24, assays, choice of specimens, and criteria used for thresholds and variables selection during algorithm development need to be transparent and rigorously validated. We also need to assess their prognostic value and their ability to predict and impact clinical outcomes. Finally, we must carefully review their appropriateness and pertinence for use in the clinical situation.
The Example of Intestinal Microbiota: Availability versus Pathogenic Relevance
Strong evidence suggests that gut microbiota influence our immune system25. In murine models of RA, dysimmunity does not develop in germ-free conditions. The impact of gut microbiota on the onset and progression of RA has been reported26. Microbiome composition differentiates RA from non-RA patients, as well as early from established RA. Lay reports on these observations drive patients to try trendy dietetic interventions on intestinal microbiota to improve or even cure their arthritis. Is the science already in?
The colon has some of the highest observed densities of living organisms on earth14. Feces contain 10¹² to 10¹³ bacteria per gram, but also large numbers of archaea, fungi, helminths, and perhaps an even larger number of viruses. For technical and conceptual reasons, bacteria remain the best studied at the moment. These bacteria thrive on undigested fibers, producing a variety of digestion products. The absorption of these products can contribute up to 6–10% of the total energy intake of the host.
Short chain volatile fatty acids such as acetic (2 carbons), propionic (3 carbons), and butyric (4 carbons) acids generated by the intestinal flora modulate gene expression in immune cells and other organs including bone and brain25. Butyrate exerts potent effects on immune cells in vitro and in animal models where it can shape the immune system by altering the epigenome27,28,29. However, the butyrate/host interaction is very complex to analyze in humans since it is influenced by host genetics, diet (source of undigested fibers), butyrate metabolism, and composition of gut microbiota. Butyrate may also have distinct impacts at physiologic versus pharmacologic concentrations28,30.
In humans, the composition of microbiota is modulated by age, mode of birth (cesarean vs vaginal), diet changes, diseases, antibiotic use (recent and remote), travel, and several other factors such as drugs, for example, metformin31. As a result, gut microbiota is highly variable between individuals and may change within a given individual over time.
Potential problems with this approach appear understated. First, we currently rely on Koch’s postulates, stipulating that the prevalence of individual bacteria in a disease state is linked to pathogenesis. However, microbiome pathogenicity may depend on the interaction between multiple taxa. Alternatively, bacteria triggering the onset of immune disease may come and go before the clinical manifestations are recognized. Second, several drugs in common use are partly metabolized in the gut, potentially affecting their efficacy and toxicity as well as gut flora. It is thus important to take drugs into account when considering the correlations of microbiota with clinical outcomes. Third and most important, are stools the right specimen to study, or are they studied merely because they are easy to collect? Perhaps sampling bacteria residing in the mucus lining the epithelium, closer to immune cells, might be more relevant. Maybe we should study ileal bacteria, distinct from those in feces and more similar to the oral microbiome, since they are closer to Peyer’s patches, where immune cells are matured and concentrated32.
In summary, simple associations of a disease with specific microbes are not sufficient to assume their significant role in pathogenesis.
Specific Risks and Challenges of Next-generation Biomarkers
The next generation of biomarkers also carries risks that come from its complexity and cost. Overfitting is definitely a risk, which can result in predictive models that lack reproducibility across cohorts, over an extended period of time, and under varying treatments. In addition, complex next-generation biomarkers often lend themselves to stratification into a large number of subsets, which may not fit into simplistic disease definitions based mostly on clinical grounds or conventional biomarkers. Their reliance on multiple parameters raises the potential for hidden co-correlations (e.g., microbiota and host genetics), making it more difficult to combine them with current or other next-generation biomarkers (MultiOmics), with clinical parameters, and with patient-related outcomes.
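The overfitting risk can be illustrated with a small simulation, offered here as a sketch under stated assumptions rather than a description of any published method. We assume a hypothetical discovery cohort of 10 cases and 10 controls, 2000 candidate markers that are pure random noise (identical distribution in both groups), and a validation cohort of 100 cases and 100 controls. The marker with the best apparent AUC in the discovery cohort is "selected" and then re-tested in the validation cohort.

```python
import random


def auc(scores_pos, scores_neg):
    """Fraction of (case, control) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def simulate(n_train=10, n_valid=100, n_markers=2000, seed=0):
    """Return (best discovery AUC, validation AUC) for pure-noise markers.

    All sizes are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    best_auc = 0.0
    for _ in range(n_markers):
        # Each candidate marker has the same distribution in cases and controls.
        cases = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        best_auc = max(best_auc, auc(cases, controls))
    # The selected marker is still noise, so its values in new patients are
    # fresh independent draws from the same distribution.
    valid_cases = [rng.gauss(0.0, 1.0) for _ in range(n_valid)]
    valid_controls = [rng.gauss(0.0, 1.0) for _ in range(n_valid)]
    return best_auc, auc(valid_cases, valid_controls)


train_auc, valid_auc = simulate()
print(f"Best apparent AUC in discovery cohort: {train_auc:.2f}")
print(f"AUC of the same marker in validation:  {valid_auc:.2f}")
```

Because thousands of candidates are screened in very few patients, one of them will look highly discriminating by chance alone, while its performance in independent patients falls back toward 50%. This is why validation in separate, well-characterized cohorts is indispensable.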
These complex biomarkers also constitute a potential threat to the healthcare system33. Indeed, their use might increase costs markedly, boosting ordering of additional tests and referrals to specialists. Moreover, people with advanced training in statistics and computer science will be needed to perform quality control on a daily basis and ensure consistency of the testing. Without appropriate validation, their use may increase the number of patients considered at risk for a disease, without guarantees for improved clinical outcomes, and might even be detrimental due to the generated anxiety or to ill-oriented attempts at treatment. Finally, it may be tempting for promoters to use algorithms giving results most beneficial to marketing or commercial purposes rather than to patients33.
Ultimately, these biomarkers frequently require biospecimens that are not routinely collected, reducing the number of existing cohorts available to validate them. In many cases, their interpretation also depends on variables not available clinically or in administrative databases, such as host genetics and type of diet. As a consequence, there is a need for well-designed longterm observational cohorts with complete and high-quality clinical data and biomarker capacity, ideally from several geographic areas with genetic and environmental diversity34. The difficulty of funding these cohorts over the long term needs to be addressed. The alternative is hazardous introduction of incompletely validated tests into the clinical market.
Current biomarkers are relatively simple to use and understand, while generally very helpful. Nonetheless, biomarkers are often wrongly used, sometimes replacing the patient as the objective of treatment. The CHOOSING WISELY campaign appropriately reminds us that careful clinical evaluation, both before and after the test, remains essential6.
Next-generation biomarkers resulting from the analysis of big data may certainly improve our understanding of disease pathogenesis and the prediction of treatment response, while informing on causal factors underlying disease progression. As such, they essentially help to subset patients into narrower, more homogeneous groups, and are paving the way to personalized medicine tailored to individuals rather than groups.
Yet each next-generation biomarker needs to be evaluated very rigorously. The choice of techniques as well as the characteristics of biospecimens must be validated. Their prognostic value over current methods must be demonstrated, as well as their ability to predict the effects of interventions on clinical outcomes and help monitor the response to therapy. To attain these objectives, well-characterized cohorts followed over a long period are essential. Finally, we will need to evaluate the proposed uses of candidate next-generation biomarkers, to ensure that they are appropriate and result in improved outcomes.
The exciting prospects of next-generation biomarkers come with significant risks. It may be tempting for developers not to be transparent about their choices for thresholds and algorithms35. Past experience has shown that biomarkers may be manipulated in a way that the net result may be unfavorable despite increased costs36. Clinicians must play an essential validation role to optimize the use of next-generation biomarkers, even if very few are trained appropriately to understand and evaluate their strengths and limitations.
For sure, the next generation of biomarkers based on big data heralds a new, exciting, yet controversial era for rheumatology.