In the treatment of rheumatoid arthritis (RA), measuring disease activity is of growing importance both in clinical practice and in research. The last 20 years have brought us several well-validated disease activity indices: for example, the Disease Activity Score (DAS), the DAS28 (based on 28 joints), the Clinical Disease Activity Index, and the Simplified Disease Activity Index are currently in use, and validated cutoff points to define disease activity states, as well as change criteria to indicate improvement with therapy, have been developed1,2,3,4,5,6. However, in addition to measuring absolute disease activity states and improvement, there is an increasing need to assess RA flare, or worsening. Therefore, at the OMERACT 9 (Outcome Measures in Rheumatology) meeting a working definition of RA flare was proposed: flare occurs with any worsening of (or return of) disease activity that would, if persistent, lead to (re)initiation, increase or/and change of therapy; a flare represents a cluster of symptoms of sufficient duration and intensity to require (re)initiation, change, or increase in therapy1. Although this working definition was an essential first step, research on validated flare criteria is still needed, and the work of Bykerk, et al in this issue of The Journal represents an important contribution to the field7. Here, we discuss several aspects of the development and use of RA flare criteria.
First, why do we need thoroughly validated flare criteria? The first scenario that illustrates this need is the use of “fire-and-forget” treatments such as rituximab, for which the timing of retreatment is often based on the occurrence of worsening disease activity. Flare criteria are also essential in down-titration and discontinuation studies, as well as when tapering medication in daily clinical practice, to identify clinically relevant worsening that should prompt reinstating therapy or increasing the dose. Finally, when comparing (biologic) disease-modifying antirheumatic drugs (bDMARD), it could be of interest to determine which treatment carries the lowest risk of intercurrent flares: primary outcome measures, such as the percentage of patients with low disease activity or the percentage achieving American College of Rheumatology (ACR) improvement at study end, appear comparable across drugs in patients with high baseline disease activity, but the stability of disease activity over time may not be comparable8,9,10. With remission as the treatment goal, and given that periodic worsening is associated with radiographic damage, the frequency, number, and severity of flares might in the near future become an interesting additional variable for comparing the efficacy of bDMARD11.
However, the heterogeneity of the flare criteria used in clinical research raises some issues that need to be clarified. The flare criteria that have been used in clinical studies or proposed in the literature vary considerably1,12. They range from an increase in the number of swollen joints to the physician’s decision to change treatment (which would, of course, introduce circularity if used in clinical practice), worsening of components of the ACR response criteria, worsening based on the DAS28, and patient-reported flares1,12,13. Indeed, Yoshida, et al recently showed that almost every biologic discontinuation study has used a different criterion to decide on treatment resumption or to define disease worsening12. This wide variety of criteria is undesirable because it can make flare data difficult or impossible to compare.
How can this heterogeneity be resolved? Looking at the variety of criteria, there appear to be 2 main approaches: criteria based on patient-reported flares, and criteria based on joint scores and laboratory tests. Both approaches have advantages and disadvantages, which mainly concern, first, the domains the criterion measures and, second, whether face-to-face contact with the patient is needed.
When disease activity measurement (e.g., the DAS28) is part of routine clinical care, a flare criterion based on this measure is easily calculated to guide patient and physician. This, however, requires that patients have low-threshold access to healthcare once they experience a flare, meaning that travel distance and waiting times for an outpatient clinic visit should be acceptable. Also, joint-score-based criteria do not include questions on self-management, although this domain proved to be of considerable interest in the OMERACT 10 Delphi procedure14.
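To make concrete what “easily calculated” can mean, the sketch below computes the DAS28 using the erythrocyte sedimentation rate (ESR) and applies a simple change-based flare rule between 2 visits. The DAS28-ESR weights are the standard published ones; the flare rule and its default thresholds (an increase of more than 1.2, or of more than 0.6 when the current DAS28 exceeds 3.2) are assumptions chosen for illustration, loosely mirroring the EULAR response cutoffs, and are not the criterion of any study discussed here. The function names are hypothetical.

```python
import math


def das28_esr(tjc28: int, sjc28: int, esr_mm_h: float, patient_global_mm: float) -> float:
    """DAS28 using ESR.

    tjc28 / sjc28: tender and swollen joint counts over 28 joints.
    esr_mm_h: erythrocyte sedimentation rate in mm/h.
    patient_global_mm: patient global health on a 0-100 mm visual analogue scale.
    """
    if not (0 <= tjc28 <= 28 and 0 <= sjc28 <= 28):
        raise ValueError("joint counts must be between 0 and 28")
    if esr_mm_h <= 0:
        raise ValueError("ESR must be positive for the log term")
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h)
            + 0.014 * patient_global_mm)


def is_das28_flare(previous: float, current: float,
                   large_increase: float = 1.2,
                   small_increase: float = 0.6,
                   active_threshold: float = 3.2) -> bool:
    """Hypothetical DAS28-based flare rule (illustrative thresholds only).

    Flags a flare on a large increase, or on a smaller increase that leaves
    the patient above the chosen disease-activity threshold.
    """
    delta = current - previous
    return delta > large_increase or (delta > small_increase and current > active_threshold)


# Usage: compare a follow-up visit with the previous one.
baseline = das28_esr(tjc28=1, sjc28=0, esr_mm_h=12, patient_global_mm=15)
follow_up = das28_esr(tjc28=6, sjc28=3, esr_mm_h=30, patient_global_mm=55)
print(round(baseline, 2), round(follow_up, 2), is_das28_flare(baseline, follow_up))
```

In this example the DAS28 rises from about 2.5 to about 5.0, so the rule flags a flare. Keeping the thresholds as parameters reflects the point made above: different studies have used different cutoffs, and the rule is only as meaningful as the thresholds chosen.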
Patient-reported flare criteria, on the other hand, could easily be used at home to guide patients in contacting their physician, either to discuss by phone which treatment changes are needed or even to carry out a treatment plan agreed on in advance with their physician. However, because neither physician input nor objective disease activity measures (e.g., acute-phase reactants) are incorporated, there is a risk that patients underreport or overreport flares. Moreover, because of response shift, patients’ judgment has been shown to be impaired regarding their level of disease activity and long-term changes in it (although this problem should be smaller for short-term changes such as a flare). Therefore the ideal criterion probably combines patient-reported items with joint-score and laboratory-test items, as demonstrated by the validation of the different flare domains used in the OMERACT preliminary flare criteria13. Whether this is feasible in daily clinical practice remains an open question and may depend heavily on the local healthcare context.
A complicating factor in validating flare criteria, and in resolving this heterogeneity, is the lack of a gold standard for flare. In validating patient-reported flare criteria and criteria based on joint scores and laboratory tests, researchers have anchored each approach to the other: patient-reported worsening has been used to validate joint-score-based criteria, and vice versa. For example, whereas Bykerk, et al (in this issue of The Journal) demonstrate a relation between patient-reported flare and the DAS28, we in turn reported a relation between DAS28-based flare criteria and patient-reported disease worsening15. This well-known back-and-forth “stepping stone” approach remains the only solution for validation studies when no gold standard is available. However, an external standard is needed to determine which approach is preferable in which situation. One interesting candidate is radiographic outcome, although it reflects a late consequence of flare rather than the concept of flare itself. Another anchor could be physical function, which has also been shown to correlate strongly with flare; however, function is itself a patient-reported outcome (e.g., the modified Health Assessment Questionnaire). More novel techniques, including positron emission tomography and biomarkers, could also be used, but these have the disadvantage of capturing a more technical, pathophysiological representation of flare that is further removed from the patient’s experience of it. So the ideal gold standard for flare in validation studies has yet to be found.
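Because no gold standard exists, mutual anchoring of this kind amounts to cross-tabulating 2 flare definitions applied to the same visits and quantifying how well they agree, without treating either as the truth. The sketch below assumes paired yes/no flare classifications per visit and computes the 2×2 counts, the observed agreement, and Cohen’s kappa, a generic chance-corrected agreement statistic. It is an illustration only, not the analysis used by Bykerk, et al or in our own work, and the function name is hypothetical.

```python
import math
from typing import Sequence


def flare_agreement(patient_reported: Sequence[bool],
                    das28_based: Sequence[bool]) -> dict:
    """Agreement between two binary flare definitions applied to the same visits.

    Neither input is treated as a gold standard; kappa is symmetric in its inputs.
    """
    if len(patient_reported) != len(das28_based) or len(patient_reported) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(patient_reported)
    pairs = list(zip(patient_reported, das28_based))

    # Cells of the 2x2 table.
    both = sum(1 for p, d in pairs if p and d)
    neither = sum(1 for p, d in pairs if not p and not d)
    patient_only = sum(1 for p, d in pairs if p and not d)
    das28_only = sum(1 for p, d in pairs if d and not p)

    observed = (both + neither) / n
    # Agreement expected by chance, from each definition's overall flare rate.
    p_rate = (both + patient_only) / n
    d_rate = (both + das28_only) / n
    expected = p_rate * d_rate + (1 - p_rate) * (1 - d_rate)
    # Kappa is undefined when chance agreement is 1 (e.g., both definitions constant).
    kappa = (observed - expected) / (1 - expected) if expected < 1 else math.nan

    return {
        "both": both,
        "neither": neither,
        "patient_only": patient_only,
        "das28_only": das28_only,
        "observed_agreement": observed,
        "kappa": kappa,
    }


# Usage with 4 made-up visits (illustrative values, not study data).
print(flare_agreement([True, True, False, False], [True, False, False, False]))
```

A moderate kappa in such an analysis shows that the 2 definitions overlap without identifying which one is right, which is exactly why an external anchor is still needed.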
A final issue regarding the validation and use of flare criteria is that the concept of flare may be a moving target. As treat-to-target strategies have shown that low disease activity and remission are achievable goals, a (threshold) shift could occur in what patients and physicians regard as a flare. Interestingly, this is illustrated by the OMERACT working definition of flare, which includes the phrase “any worsening of (or return of) disease activity that would, if persistent, lead to (re)initiation, increase or/and change of therapy.” Recent decades have certainly taught us as clinical rheumatologists that levels of disease activity once considered acceptable should now be viewed as uncontrolled disease and treated as such. This “moving goalposts” effect can also be inferred from the data of Bykerk’s study. Although Bykerk, et al demonstrated that flares were reported more often by patients with moderate to severe disease activity than by patients in remission, flares still appear to occur in patients with RA in remission, as shown by Hewlett, et al16. Although both studies simply asked patients whether or not they were experiencing a flare, these flares might be fundamentally different because of the different baseline levels of disease activity. On a critical note, instead of debating the best flare criterion, we should perhaps first focus on optimally treating our patients to target17. Although the benefits of treat-to-target have been demonstrated, many patients still do not receive this level of care, as evidenced by the relatively high mean DAS28 in several large RA registries and cohort studies, including the BRASS registry, as Bykerk, et al also mention in their discussion7.
Bykerk, et al and the OMERACT RA flare group should be commended for studying flare so thoroughly. We share their aim of developing valid flare criteria that can easily be used both in research and in daily clinical practice, because such criteria will improve both patient care and research.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.