Abstract
Objective. To evaluate the diagnostic accuracy of the 2010 American College of Rheumatology/European League Against Rheumatism (ACR/EULAR) and 1987 ACR criteria for rheumatoid arthritis (RA), and the respective role of the algorithm and scoring of the ACR/EULAR.
Methods. In total, 270 patients with recent-onset arthritis of < 1 year duration were included prospectively between 1995 and 1997 and followed for 2 years. RA was defined as the combination, at completion of followup, of RA diagnosed by an office-based rheumatologist and treatment with a disease-modifying antirheumatic drug or glucocorticoid. We compared the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the criteria sets in the overall population, in the subgroup meeting the tree conditions for ACR/EULAR scoring, and in the overall population classified according to the full tree.
Results. At baseline, 111 of the 270 patients had better alternative diagnoses and 16 had erosions typical for RA; of the 143 remaining patients, 52 had 6 or more ACR/EULAR 2010 points (indicating definite RA) and 91 had fewer than 6 points. After 2 years, 11/16 patients with erosions and 40/52 with 6 or more points had RA. Overall, 100 of the 270 patients met the reference standard for RA. Sensitivity, specificity, PPV, and NPV of the ACR/EULAR (full tree) were 51/100 (51%), 153/170 (90%), 51/68 (75.4%), and 153/202 (75.7%), respectively. Diagnostic accuracies of the ACR/EULAR score and ACR 1987 criteria were not statistically different.
Conclusion. Much of the improvement of the ACR/EULAR criteria was ascribable to the use of exclusion criteria in the algorithm.
In 1958, the American Rheumatism Association developed a set of classification criteria for rheumatoid arthritis (RA)1, which were revised in 1987 by the American College of Rheumatology (ACR)2. These criteria were designed to distinguish established RA from other joint diseases. They were widely used as the reference standard for diagnosing RA in clinical practice3,4. However, although the ACR criteria are easy to use, they are not ideal for diagnosing RA, for several reasons5: they were designed for classification, not diagnosis; most of the patients used to develop the criteria had long-standing RA; the predictive value of each criterion cannot be determined because the numbers of RA patients and controls were preestablished; and no exclusion criteria are used. Since the development of the 1987 ACR criteria, both nodules6 and radiographic findings7 at first evaluation have been shown to be of limited diagnostic usefulness. In addition, anti-cyclic citrullinated peptide (CCP) antibodies (ACPA) have emerged as a valuable diagnostic marker for RA8. Changes to the 1987 ACR criteria that were made to take these new findings into account improved the sensitivity of the ACR 1987 criteria, most notably in patients with arthritis of less than 6 months’ duration9. However, further improvement was needed, given the importance of early treatment for improving outcomes in patients with RA. The early diagnosis of RA is often difficult, as none of the clinical or laboratory features is diagnostic.
To construct diagnostic criteria, a longitudinal study must be performed in patients with early-stage arthritis, to determine which combination of features at baseline best predicts a diagnosis of RA at completion of the study. Defining the endpoint, i.e., diagnosis of RA at study completion, is challenging. Rheumatologists often differ on whether a given patient does or does not have RA10. One means of circumventing this difficulty may be to use an outcome other than a diagnosis of RA, for instance, persistent nonerosive or erosive arthritis11,12. However, RA can be defined unequivocally after a 2 year followup as a physician-made diagnosis of polyarthritis, no other diagnosis capable of explaining the symptoms, and treatment with disease-modifying antirheumatic drugs (DMARD). This definition was used by the ACR/EULAR to develop a new scoring system for RA13,14. This new system has a tree format: presence of synovitis is required (condition 1), followed by absence of a better alternative diagnosis (condition 2), and then by absence of erosions typical for RA (condition 3). Only patients meeting all 3 conditions are eligible for scoring. A score of at least 6 of 10 possible points from scores in 4 domains indicates RA.
We established a cohort of 270 patients from Brittany, France, with arthritis of less than 1 year duration, included between 1995 and 1997 and followed for 2 years. These patients did not contribute to the development of the ACR/EULAR criteria. Therefore, the cohort provides an opportunity to evaluate the diagnostic accuracy of the ACR/EULAR criteria. The main problem raised by such an evaluation is the choice of the reference standard. The opinion of rheumatologists regarding a diagnosis of RA is informed by the criteria that are tested. Consequently, there is circularity in the reasoning. For development of the ACR/EULAR criteria, RA was considered in patients taking DMARD (methotrexate) therapy and confirmed by expert opinion. In our study, the reference standard was a combination of having a physician-made diagnosis of RA, no other diagnosis of joint disease, and treatment with DMARD and/or glucocorticoid after 2 years’ followup.
The aim of our study was to evaluate the diagnostic accuracy of the ACR/EULAR criteria in comparison with the ACR 1987 criteria, using this reference standard, in our cohort of patients with early arthritis. In addition, we sought to determine the combination of baseline features that best predicted RA after 2 years.
MATERIALS AND METHODS
Study population
This prospective observational cohort was composed of all patients seen with early arthritis from 1995 to 1997 in 7 hospitals in Brittany, France. All patients in the cohort had synovitis in at least one joint (condition 1 in the ACR/EULAR tree).
All the patients were referred to us by general practitioners and rheumatologists who had been informed of the study. Inclusion criteria were as follows: age 18 years or older, synovitis in at least one joint, absence of a previous diagnosis of joint disease, and disease duration no more than 1 year. Patients were excluded if the medical history and the physical examination suggested septic arthritis or crystal-induced arthritis. Synovitis was diagnosed clinically by a rheumatologist based on the presence of joint swelling with tenderness or decreased range of motion.
The appropriate ethics committee approved the study, and all patients gave their written informed consent before inclusion in the study.
Baseline assessment
As described previously6, all patients had a standardized interview, a general physical examination, and laboratory tests, during which over 100 variables were recorded.
Study variables
All items of the unmodified and modified ACR 1987 criteria sets [criteria: 1. morning stiffness ≥ 1 hour, 2. arthritis of ≥ 3 joint areas, 3. arthritis of hand joints, 4. symmetric arthritis, 5. rheumatoid nodules, 6. rheumatoid factor (positive), 7. radiographic changes, and 8. anti-CCP antibodies (positive)] and of the ACR/EULAR criteria set were evaluated in each patient. The unmodified ACR 1987 criteria2 were considered positive in patients with at least 4 of criteria 1 through 7, and the modified ACR 1987 criteria9 were considered positive in patients with at least 4 of the 8 criteria (Liao 1) or at least 3 of the 6 criteria left after eliminating criteria 5 and 7 (Liao 2).
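The three ACR 1987 decision rules described above can be sketched in a few lines. This is only an illustration: the item names in `ITEMS` and the dictionary-based patient record are hypothetical conventions, not part of the study database.

```python
# Illustrative sketch of the three ACR 1987 rules described in the text.
# The item names below are hypothetical labels for criteria 1 to 8.
ITEMS = {
    1: "morning_stiffness",      # morning stiffness >= 1 hour
    2: "arthritis_3_areas",      # arthritis of >= 3 joint areas
    3: "hand_arthritis",         # arthritis of hand joints
    4: "symmetric_arthritis",
    5: "nodules",
    6: "rheumatoid_factor",
    7: "radiographic_changes",
    8: "anti_ccp",
}


def count_present(patient, item_numbers):
    """Count how many of the given criteria are present in a patient record."""
    return sum(1 for n in item_numbers if patient.get(ITEMS[n], False))


def acr1987_unmodified(patient):
    """Positive if at least 4 of criteria 1 through 7 are present."""
    return count_present(patient, range(1, 8)) >= 4


def liao1(patient):
    """Positive if at least 4 of the 8 criteria are present."""
    return count_present(patient, range(1, 9)) >= 4


def liao2(patient):
    """Positive if at least 3 of the 6 criteria left after dropping 5 and 7."""
    return count_present(patient, [1, 2, 3, 4, 6, 8]) >= 3


# Made-up example patient: morning stiffness, hand arthritis, anti-CCP positive.
example = {"morning_stiffness": True, "hand_arthritis": True, "anti_ccp": True}
print(acr1987_unmodified(example), liao1(example), liao2(example))
```

In this made-up example only the Liao 2 rule is met, because anti-CCP positivity counts toward the modified sets but not toward the unmodified set.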
For the ACR/EULAR 2010 criteria, we evaluated the presence of synovitis (condition 1), absence of a better alternative diagnosis (condition 2), and then absence of erosions typical for RA (condition 3). Only patients meeting all 3 conditions were eligible for scoring.
Patients were deemed to have RA according to the ACR/EULAR 2010 criteria14 if they had erosions typical for RA or a score of at least 6 of 10 possible points from scores in 4 domains as follows:
- Joint involvement (1 medium-large joint: 0 points; 2–10 medium-large joints: 1 point; 1–3 small joints: 2 points; 4–10 small joints: 3 points; > 10 joints with at least one small joint: 5 points);
- Serology [no rheumatoid factor (RF) or ACPA: 0 points; low-positive (≤ 3 times the upper limit of normal [ULN] for the laboratory and assay) RF and/or ACPA: 2 points; high-positive (> 3 times ULN for the laboratory and assay) RF and/or ACPA: 3 points];
- Duration of synovitis (< 6 weeks: 0 points; ≥ 6 weeks: 1 point);
- and Acute-phase reactants [C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR) normal: 0 points; CRP and/or ESR elevated: 1 point].
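As a reading aid, the tree and the 4-domain score can be written out as a short sketch. Several details are assumptions made for illustration only: serology is passed as the ratio of the measured value to the laboratory ULN, a ratio of 1 or less is treated as negative, and low-positive serology is given 2 points, as in the published ACR/EULAR 2010 criteria. Function and parameter names are hypothetical.

```python
# Minimal sketch of the ACR/EULAR 2010 tree and score; not a clinical tool.

def joint_points(n_large, n_small):
    """Joint-involvement domain (0 to 5 points)."""
    if n_large + n_small > 10 and n_small >= 1:
        return 5          # > 10 joints with at least one small joint
    if n_small >= 4:
        return 3          # 4-10 small joints
    if n_small >= 1:
        return 2          # 1-3 small joints
    if n_large >= 2:
        return 1          # 2-10 medium-large joints
    return 0              # at most 1 medium-large joint


def serology_points(rf_ratio, acpa_ratio):
    """Serology domain; ratios are value / ULN (assumed input convention)."""
    highest = max(rf_ratio, acpa_ratio)
    if highest > 3:
        return 3          # high-positive RF and/or ACPA
    if highest > 1:
        return 2          # low-positive RF and/or ACPA
    return 0              # both negative


def acr_eular_score(n_large, n_small, rf_ratio, acpa_ratio,
                    duration_weeks, crp_or_esr_elevated):
    """Total score out of 10 across the 4 domains."""
    return (joint_points(n_large, n_small)
            + serology_points(rf_ratio, acpa_ratio)
            + (1 if duration_weeks >= 6 else 0)
            + (1 if crp_or_esr_elevated else 0))


def classified_as_ra(synovitis, better_alternative, typical_erosions, score):
    """Full tree: conditions 1 and 2, then erosions or a score of at least 6."""
    if not synovitis or better_alternative:
        return False
    if typical_erosions:
        return True
    return score >= 6


# Made-up patient: 5 small joints, high-positive ACPA, 8 weeks, normal CRP/ESR.
score = acr_eular_score(0, 5, 0.5, 4.0, 8, False)
print(score, classified_as_ra(True, False, False, score))  # prints: 7 True
```

Keeping the tree separate from the score mirrors the two analyses in this study: the score alone applied to the overall population, and the full tree.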
Radiographic evaluation
Baseline hand radiographs (n = 258) were read by one author (VD), who was blinded to information about the patients. A standardized evaluation procedure was used to record typical erosions of RA, as described15. The intraobserver variability was determined by evaluation of the same radiographs twice, 3 months apart; the intraobserver kappa coefficient was 0.88.
Followup
A rheumatologist followed each patient at 6-month intervals until a clinical diagnosis of a specific joint disease was made and the patient met published criteria for the same joint disease. The patients were asked to attend a final visit between June and November 1999.
Outcome after 2 years
The diagnoses made in each patient after 2 years by the hospital-based rheumatologists were recorded.
We had previously evaluated the concordance between the diagnosis of the office-based rheumatologists and the diagnosis of a panel of 5 rheumatologists6, and we observed a good concordance (kappa 0.81, 95% CI 0.77–0.85). On this basis, we chose the diagnosis of the office-based rheumatologist. However, because predicting RA that is benign enough to go untreated is of little clinical importance, in this study we added treatment with DMARD or glucocorticoid at 2 years to the reference standard. A total of 16 patients considered to have RA by the rheumatologist did not receive either DMARD or glucocorticoid therapy (the number of patients with “diagnosis of RA and DMARD or glucocorticoid therapy” was 100 in our study, instead of 116 using only “the diagnosis of RA”). In contrast, 50 patients considered non-RA received DMARD or glucocorticoid therapy. Thus, the concordance between “diagnosis of RA” and “DMARD or glucocorticoid therapy” was fair (kappa = 0.52).
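The kappa between “diagnosis of RA” and “DMARD or glucocorticoid therapy” can be recomputed from the counts given above. The sketch below uses the standard Cohen's kappa formula for a 2 × 2 table; the fourth cell (104 patients with neither a diagnosis of RA nor treatment) is not stated in the text and is derived here as 270 − 100 − 16 − 50.

```python
def cohen_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two binary classifications of the same patients."""
    n = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / n
    # Chance agreement from the marginal totals of each classification.
    expected = ((both_pos + first_only) * (both_pos + second_only)
                + (second_only + both_neg) * (first_only + both_neg)) / n ** 2
    return (observed - expected) / (1 - expected)


# Counts from the text: 100 diagnosed and treated, 16 diagnosed but untreated,
# 50 treated without a diagnosis of RA; 104 is derived (270 - 100 - 16 - 50).
kappa = cohen_kappa(100, 16, 50, 104)
print(round(kappa, 2))  # prints: 0.52
```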
Thus, for evaluation of diagnostic accuracy of the ACR/EULAR and ACR 1987 criteria sets, the definition of RA (reference standard) was a diagnosis of RA by the office-based rheumatologist after 2 years combined with treatment with DMARD or glucocorticoid. There was no protocol for deciding to start or stop DMARD therapy.
Statistical analysis
First, in the subgroups that did and did not meet the reference standard definition of RA after 2 years, we evaluated the proportion of patients having each item of the unmodified ACR 1987 criteria set2, the modified ACR 1987 criteria set9, and the ACR/EULAR criteria set.
The only subgroup adequate for evaluating the criteria sets was the population of patients meeting all 3 ACR/EULAR scoring conditions (synovitis, no radiographic evidence of RA, and no other diagnosis more likely than RA). We therefore compared the criteria sets in this sample. To determine which item or items best separated patients with and without RA in this sample, and then to check whether these items matched those considered by the experts in the ACR/EULAR criteria set, we performed multiple logistic regression with backward selection using the likelihood ratio test.
As the ACR 1987 criteria are used frequently in cohorts of patients with early arthritis to separate patients with and without RA after some time, irrespective of whether the patients have radiographic evidence of RA and/or another more plausible diagnosis, we also compared the diagnostic accuracy of the ACR/EULAR scoring system with the diagnostic value of the ACR 1987 criteria used without patient selection based on conditions 1–3 in the overall population.
Statistical tests were performed using the Statistical Package for the Social Sciences (SPSS 13.0, 2005; SPSS Inc., Chicago, IL, USA) except for comparisons of areas under the curve (AUC), which were done using SAS 9.0 (SAS Institute, Cary, NC, USA). Quantitative variables are described as means ± standard deviation and qualitative variables as number (percentage). To compare the distributions of laboratory test results between patients with and without RA after 2 years, we used the chi-square test (or Fisher exact test, if appropriate) for qualitative variables and the Mann-Whitney test for quantitative variables. For each criterion, sensitivity was plotted against 1 − specificity to obtain the receiver operating characteristic (ROC) curve16. A ROC curve was plotted for each criteria set by varying the cutoffs (values lower than the cutoff considered negative and other values positive), and AUC were compared. For each criteria set used at baseline, we computed sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with 95% confidence intervals, for predicting RA after 2 years.
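The construction of a ROC curve by varying the cutoff can be illustrated with a small sketch. It is not the SPSS or SAS procedure used in the study: the AUC is obtained here by the trapezoidal rule, the statistical comparison of AUC is not reproduced, and the scores in the example are made up.

```python
def roc_points(scores, has_ra):
    """(1 - specificity, sensitivity) pairs obtained by varying the cutoff.

    Values lower than the cutoff are negative, other values positive.
    """
    n_pos = sum(1 for y in has_ra if y)
    n_neg = len(has_ra) - n_pos
    points = []
    # An infinite cutoff classifies everyone as negative, giving the point (0, 0).
    for cutoff in sorted(set(scores)) + [float("inf")]:
        tp = sum(1 for s, y in zip(scores, has_ra) if y and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, has_ra) if not y and s >= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores: patients with RA all score higher than patients without RA.
toy_scores = [2, 4, 6, 8]
toy_has_ra = [False, False, True, True]
print(auc(roc_points(toy_scores, toy_has_ra)))  # prints: 1.0
```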
RESULTS
Study population
The study cohort comprised 270 patients with arthritis of < 1 year duration. Median followup was 30 months [< 1 year in 16 cases (6%), 1 to 2 years in 21 (8%), and more than 2 years in 233 cases (86%)]. After 2 years, 100/270 (37%) patients met our reference standard definition of RA. Table 1 shows the criteria present at baseline in patients with and without RA after 2 years.
Table 1. Prevalence of each item of the unmodified ACR 1987 criteria, modified ACR 1987, and ACR/EULAR criteria in patients with and without a final diagnosis of RA after 2 years, defined as RA diagnosed by the office-based rheumatologist and receiving disease-modifying antirheumatic drug and/or glucocorticoid therapy.
Diagnostic value of the RA classification criteria sets
At baseline, 111 of the 270 patients had alternative diagnoses that better explained the arthritis. Sixteen patients had erosions typical for RA and 11 of them had RA at 2 years. Among the 5 patients having erosions at inclusion but not meeting the reference standard for RA at completion, 1 had algodystrophy, 3 had RA but were taking no DMARD, and 1 was considered by the rheumatologist to have undifferentiated arthritis.
Of the 143 remaining patients, 91 had scores lower than 6, including 38 (42.2%) with RA after 2 years; and 52 had scores ≥ 6 (indicating definite RA), including 40 (75.6%) with RA after 2 years (Figure 1). Among the 12 patients with a score ≥ 6 who did not meet the reference standard for RA, diagnoses were as follows: chondrocalcinosis (n = 1), mixed connective tissue disease (1), hydroxyapatitis (1), polymyositis (1), polymyalgia rheumatica (1), spondyloarthropathy (2), lymphoma (1), and RA without DMARD therapy (4).
Figure 1. Final diagnosis of rheumatoid arthritis (RA) in the overall population according to the ACR/EULAR criteria.
The sensitivity, specificity, PPV, and NPV of the ACR/EULAR (full tree) were 51/100 (51%), 153/170 (90%), 51/68 (75.4%), and 153/202 (75.7%), respectively (Table 2). For all criteria, specificity was better in the subgroup meeting the ACR/EULAR scoring conditions 1–3 (use of the full tree) than in the overall population (use of the scoring criteria regardless of whether conditions were met).
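The 2 × 2 table behind the full-tree figures can be rebuilt from the counts reported above, as in the sketch below. The assignment of patients to cells follows the tree (erosions or a score ≥ 6 counted as test-positive); values recomputed from the fractions may differ in the first decimal from the percentages printed in the text.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts taken from the text (full ACR/EULAR tree, n = 270, 100 with RA).
tp = 11 + 40                  # RA among erosion-positive and score >= 6 patients
fp = (16 - 11) + (52 - 40)    # test-positive patients without RA
fn = 100 - tp                 # RA patients not identified by the tree
tn = 270 - 100 - fp           # patients without RA and test-negative

for name, value in diagnostic_accuracy(tp, fp, fn, tn).items():
    print(f"{name}: {value:.3f}")
```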
Table 2. Diagnostic value of the 3 criteria sets (unmodified ACR 1987, modified ACR 1987 criteria, and 6 or more points on the ACR/EULAR score), and of the best combination of items in the Brittany cohort in the overall population, in the subgroup meeting the 3 conditions for ACR/EULAR scoring (synovitis, no better alternative diagnosis, and no radiographic erosion typical for RA), and in the overall population classified according to the full ACR/EULAR tree.
The ACR/EULAR criteria performed slightly better than the unmodified or modified ACR 1987 criteria in the subgroup meeting ACR/EULAR scoring conditions (Figure 2A), whereas no significant difference was found in the overall population (Figure 2B). However, we failed to demonstrate any statistical differences between the AUC of the ACR/EULAR scoring system versus the ACR 1987 criteria (unmodified or modified by Liao 1 and 2) in the overall population (p = 0.29, p = 0.61, and p = 0.70, respectively) or in the subgroup meeting the 3 ACR/EULAR conditions (p = 0.49, p = 0.89, and p = 0.6, respectively).
Figure 2. Receiver operating characteristic curve of the criteria sets in the subgroup meeting the 3 conditions for ACR/EULAR scoring (synovitis, no better alternative diagnosis, and no typical erosions; panel A; n = 143) and in the overall population (panel B; n = 270).
Best combination of items
Logistic regression analysis was done to identify the best combination of items for diagnosing RA in the subgroup meeting all 3 conditions for ACR/EULAR scoring. The analysis identified the following variables: symmetric arthritis (ACR criterion 4), rheumatoid factor (positive or negative), ACPA (negative, low, high), ESR or CRP, and joints (0 to 5 points) (Table 3). Diagnostic performance was best when these ACR/EULAR items were combined with the “symmetric arthritis” item (item 4) of the ACR 1987 criteria, as follows: [(joint 0–5)/5 + (ACPA 0–3) + (RF 0–1) × 2 + (ACR4 0–1) × 2 + (ESR or CRP 0–1) × 2], where joint 0–5 indicates the ACR/EULAR joint score, ACPA 0–3 the ACPA level (negative 0, low 2, high 3), RF 0–1 the absence or presence of RF, ACR4 0–1 the absence or presence of item 4 in the ACR 1987 criteria set, ESR 0–1 the absence or presence of ESR elevation, and CRP 0–1 the absence or presence of CRP elevation. This equation is designated “the best combination of criteria” (or “the Brittany criteria”) hereafter. A score of 5 or more had the best diagnostic value. Figure 3A shows the diagnostic value of each item in the subgroup meeting all 3 conditions for ACR/EULAR scoring. Figure 3B and Table 2 show that the Brittany criteria performed slightly better than other criteria sets in this subgroup (p = 0.006 vs ACR criteria; p = 0.02 vs Liao 1 and p = 0.04 vs Liao 2 modified criteria; and p = 0.03 vs the ACR/EULAR criteria).
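For readers who wish to apply the equation above, a minimal sketch follows. The function and parameter names are hypothetical, the input checks are an added convenience, and the cutoff of 5 is the one reported in this cohort; the sketch is an illustration of the arithmetic, not a validated tool.

```python
def best_combination_score(joint_score, acpa_points, rf_positive,
                           symmetric_arthritis, esr_or_crp_elevated):
    """Score from the equation in the text.

    joint_score: ACR/EULAR joint score (0 to 5)
    acpa_points: 0 (negative), 2 (low), or 3 (high)
    the remaining arguments are booleans, each weighted by 2
    """
    if not 0 <= joint_score <= 5:
        raise ValueError("joint_score must be between 0 and 5")
    if acpa_points not in (0, 2, 3):
        raise ValueError("acpa_points must be 0, 2, or 3")
    return (joint_score / 5
            + acpa_points
            + 2 * int(rf_positive)
            + 2 * int(symmetric_arthritis)
            + 2 * int(esr_or_crp_elevated))


def meets_best_combination(score, cutoff=5):
    """A score of 5 or more had the best diagnostic value in this cohort."""
    return score >= cutoff


# Made-up patient: joint score 3, ACPA negative, RF positive, symmetric
# arthritis, normal ESR and CRP.
s = best_combination_score(3, 0, True, True, False)
print(round(s, 1), meets_best_combination(s))  # prints: 4.6 False
```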
Figure 3. Receiver operating characteristic curve of each item identified by the logistic regression in the subgroup meeting the 3 conditions for ACR/EULAR scoring (synovitis, no better alternative diagnosis, and no typical erosions; panel A); and in panel B, the diagnostic value of the best combination of items in the Brittany cohort [(joint 0–5)/5 + (ACPA 0–3) + (RF 0–1) × 2 + (ACR4 0–1) × 2 + (ESR or CRP 0–1) × 2], where joint 0–5 indicates the ACR/EULAR joint score, ACPA 0–3 the ACPA level (negative, low, high), RF 0–1 the absence or presence of rheumatoid factors, ACR4 0–1 the absence or presence of item 4 in the ACR 1987 criteria set, ESR 0–1 the absence or presence of ESR elevation, and CRP 0–1 the absence or presence of CRP elevation, compared to the other criteria sets.
Table 3. Logistic regression in the group that was not better explained by another diagnosis and without erosion typical for RA.
DISCUSSION
We report the first data on the diagnostic accuracy of the ACR/EULAR criteria for RA in a prospective cohort of patients with recent-onset arthritis that was not among the cohorts used to develop the criteria.
Application of the ACR/EULAR criteria is appropriate in the population of patients with synovitis, no radiographic evidence of RA, and no other more plausible diagnosis. We therefore evaluated the ACR/EULAR criteria in this population. The ACR 1987 criteria have been used both in similar patients (for example, by clinicians to confirm a doubtful diagnosis) and in unselected patients with early arthritis (to separate RA from other conditions). Consequently, we also evaluated the scoring component of the ACR/EULAR tree in our overall population, compared to the ACR 1987 criteria (the original version and modifications by Liao).
In our cohort, we found no statistically significant differences in diagnostic accuracy between the ACR/EULAR 2010 score and the ACR 1987 criteria (unmodified and modified by Liao, et al) in the overall population or in the subgroup meeting the 3 conditions for ACR/EULAR scoring. However, specificity was clearly improved by the tree format, which excluded patients having a better alternative diagnosis at baseline. Of the 270 patients, 111 (41%) had a better alternative diagnosis at baseline. Of these 111 patients, 11 were finally diagnosed as having RA. The most common diagnoses in these patients were spondyloarthropathy (21%), crystal-induced arthritis (5%), and connective tissue disease (7%; including Sjögren’s syndrome, systemic lupus erythematosus, scleroderma, and others). These results are consistent with those in the 936 patients of the Leiden cohort17.
Patients were included in our cohort if they had synovitis in one or more joints of less than 1 year duration. They were followed prospectively for 2 years. This is the ideal situation for testing new diagnostic criteria. RA was defined according to a practical viewpoint as RA diagnosed by the office-based rheumatologist combined with use of DMARD and/or steroid therapy.
The third condition for ACR/EULAR scoring is absence of erosions typical for RA on plain radiographs. Such erosions are rare in patients with recent-onset arthritis. Of the 159 patients with no better diagnosis than RA at baseline, 16 had typical erosions and were therefore classified by the ACR/EULAR criteria as having RA. However, only 11 of these patients were finally diagnosed as having RA, and thus 5 patients would have received a DMARD for a condition other than RA. In an earlier study of the same cohort, we found that presence of typical erosions on the baseline radiographs was 22% sensitive and 98% specific for RA6. However, radiographs of the feet were not performed and would be expected to increase the number of patients with erosions6. The value of radiographic erosions for the diagnosis of RA has been evaluated in 2 other prospective cohorts of patients with undifferentiated arthritis12,18. In these cohorts, sensitivity was low (16% and 24%, respectively) and specificity was high (93% and 92%, respectively)12,18.
To compare the performance of the ACR/EULAR scoring system to other criteria sets, we used the unmodified ACR 1987 criteria and the ACR 1987 criteria modified by Liao in the patient subgroups that were defined based on whether the second and third conditions for ACR/EULAR scoring were met (all patients met the first condition). In the subgroup of patients meeting both the second and third conditions, the ACR/EULAR scoring system performed slightly better than did the other criteria sets (Table 2). To determine the best combination of items in patients meeting the second condition (no better alternative diagnosis), we performed a logistic regression analysis. We found that all items in the ACR/EULAR score were associated with a final diagnosis of RA except the duration of synovitis. This item proved difficult to study in the first phase of ACR/EULAR criteria development, and the 6-week cutoff was chosen arbitrarily to encourage early referral and diagnosis. In our study, the RF titer did not provide diagnostic information in addition to that supplied by the presence or absence of rheumatoid factors. In a cohort of patients with early arthritis from The Netherlands, baseline IgM RF levels showed only a weak correlation with structural disease progression and had no association at all with disease activity after 2 years19. In our study, presence of RF and the ACPA titer were independent factors predicting a final diagnosis of RA after 2 years, suggesting that both should be assayed when RA is suspected, and separated in the scoring system. No consensus exists concerning the methods to be used for autoantibody detection or the definitions of negative, low-positive, and high-positive values. Therefore, arbitrary cutoffs were used in the ACR/EULAR criteria. Local laboratory ranges should be used until international units are developed, and a high titer should be defined as a titer > 3 times the upper limit of the normal range.
In our study, acute-phase reactants (ESR or CRP) were strong predictors of RA in the subgroup meeting the second condition for ACR/EULAR scoring although they were not associated with a final diagnosis of RA in the overall population. Similarly, in earlier studies acute-phase reactants were associated with RA in cohorts of patients with undifferentiated arthritis12,20, but not in cohorts established without excluding patients who had better alternative diagnoses21,22.
In the ACR/EULAR criteria, joint involvement is defined as tender and/or swollen joints at the time of assessment by an expert. Small joints are defined as the proximal interphalangeal joints, metacarpophalangeal joints, wrists excluding the first carpometacarpal joints, and metatarsophalangeal joints of toes 2 through 5. The distal interphalangeal joints are not considered. Large joints are defined as the elbows, shoulders, hips, knees, and ankles. In our regression analysis, the pattern of joint involvement predicted RA, but to a lesser degree than the other items. The ACR/EULAR system for scoring synovitis between 0 and 5 could probably be simplified for easier use in daily practice. In our study, the fourth item of the ACR 1987 classification criteria, namely symmetric joint involvement, improved the diagnostic performance when used in combination with the ACR/EULAR criteria.
Thus, the best scoring system involved slight modifications to the ACR/EULAR criteria, consisting of removal of synovitis duration, addition of symmetric joint involvement, and handling of the RF criterion as a binary variable (present/absent). The calculated weights favored items that strongly predicted RA. We developed a scoring system on a 0–9 scale, with 2 points assigned to symmetric joint involvement, presence of rheumatoid factors, low ACPA titer, and elevated ESR or CRP; 3 points assigned to high anti-CCP titer; and the joint involvement pattern between 0 and 5 divided by 5. A score greater than 4 was the optimal cutoff to discriminate RA from non-RA patients. Our goal was not to develop new criteria but to determine whether logistic regression performed in a cohort with early arthritis identified an effective scoring system using items of both the ACR 1987 criteria set and the ACR/EULAR 2010 criteria set. We identified such a system, whereas we found no significant difference in diagnostic accuracy between the ACR 1987 criteria (modified or unmodified) and the ACR/EULAR scoring after patient selection by the tree format. Thus, these 2 criteria sets may be similar, and a combination of items from both sets may slightly improve diagnostic performance.
In summary, much of the improvement supplied by using the ACR/EULAR criteria for RA is ascribable to the exclusion of patients with alternative diagnoses.
Acknowledgment
We are grateful to the following rheumatologists for referring patients: E. Blat, P. Busson, A. Castagné, J.P. Caumon, P. Chicault, V. Desmas, J.P. Elie, X. Filliol, C. Gauthier, J. Glemarec, J.Y. Grolleau, M.N. Guillermit, R. Guyader, M. Hamidou, P. Herrou, G. Lavel, F. Le Jean, R. Lemaître, M.C. Lheveder, A. Martin, Y. Maugars, I. Nouy Trolle, J. Olivry, C. Paturel, N. Paugam, A. Prost, D. Rault, B. Ribeyrol, A. Rossard, D. Rodet, I. Valls, P. Vilon, and P. Voisin.
Footnotes
- Supported by the Brest Hospital Center and the 1995 Clinical Research Hospital Program (PHRC 1995).
- Accepted for publication February 11, 2011.