Abstract
Objective. Given the complexity and heterogeneity of systemic lupus erythematosus (SLE), high-performing classification criteria are critical to advancing research and clinical care. A collaborative effort by the European League Against Rheumatism and the American College of Rheumatology was undertaken to generate candidate criteria, and then to reduce them to a smaller set. The objective of the current study was to select a set of criteria that maximizes the likelihood of accurate classification of SLE, particularly early disease.
Methods. An independent panel of international SLE experts and the SLE classification criteria steering committee (conducting SLE research in Canada, Mexico, the United States, Austria, Germany, Greece, France, Italy, and Spain) ranked 43 candidate criteria. A consensus meeting using nominal group technique (NGT) was then conducted to reduce the list of criteria for consideration.
Results. The expert panel NGT exercise reduced the candidate criteria for SLE classification from 43 to 21. The panel distinguished potential “entry criteria,” which would be required for classification, from potential “additive criteria.” Potential entry criteria were antinuclear antibody (ANA) ≥ 1:80 (HEp-2 immunofluorescence), and low C3 and/or low C4. The use of low complement as an entry criterion was considered potentially useful in cases with negative ANA. Potential additive criteria included lupus nephritis by renal biopsy, autoantibodies, cytopenias, acute and chronic cutaneous lupus, alopecia, arthritis, serositis, oral mucosal lesions, central nervous system manifestations, and fever.
Conclusion. The NGT exercise resulted in 21 candidate SLE classification criteria. The next phases of SLE classification criteria development will require refinement of criteria definitions, evaluation of the ability to cluster criteria into domains, and evaluation of weighting of criteria.
Systemic lupus erythematosus (SLE) is a complex, systemic autoimmune disease characterized by heterogeneity in disease manifestations and prognosis. Classification criteria are used to identify more homogeneous groups of patients for inclusion in clinical trials and observational studies1. With the support of the European League Against Rheumatism (EULAR) and the American College of Rheumatology (ACR), development of new classification criteria for SLE is currently under way2. A secondary goal of this initiative is to classify individuals with SLE earlier in the disease course. The development process was designed to consist of 4 phases, with balanced use of expert-based and data-driven methods, meeting the standards set by the ACR and EULAR1,3,4,5.
In Phase 1 of criteria development, positive antinuclear antibodies (ANA) were evaluated as a potential entry criterion for SLE classification6. In a systematic review and metaregression of the literature, a minimum titer of 1:80 on the indirect immunofluorescence HEp-2 ANA test yielded 97.8% sensitivity and acceptable specificity for SLE. This suggested that ANA at this titer may constitute a reasonable entry criterion for SLE classification, provided that patients who had tested positive in the past would be counted as positive. However, because a positive ANA at this titer has limited specificity, classification of SLE requires additional disease characteristics to achieve high specificity6.
To capture the widest possible range of potential disease-specific criteria, Phase 1 also comprised 3 independent studies to generate a list of candidate items. First, a large international Delphi exercise of SLE experts nominated 145 candidate criteria7. The experts rated each criterion on a 1–9 scale for its acceptability for the classification of SLE. Items with a mean acceptability score of ≥ 6.5 were retained if at least 50% of participants rated the item's acceptability at ≥ 7. None of the individual neuropsychiatric SLE (NPSLE) items met the inclusion thresholds of the expert Delphi exercise. However, based on comments during the Delphi exercise and a motion by patient representatives from Lupus Europe, the steering committee reached consensus that NPSLE was an important organ manifestation that needed further consideration, and a provisional composite central nervous system (CNS) dysfunction criterion was formed. Using this expert-based approach, 40 items were retained for further consideration.
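As a minimal illustration, the two-part Delphi retention rule can be expressed in Python. This sketch assumes that the mean threshold is inclusive (≥ 6.5) and that both conditions must hold; the function name `retain_item` and the example ratings are hypothetical and are not data from the Delphi exercise.

```python
from statistics import mean


def retain_item(ratings, mean_cutoff=6.5, high_score=7, min_share_high=0.5):
    """Return True if an item passes the two-part Delphi retention rule.

    ratings: acceptability scores on the 1-9 scale, one per expert.
    Assumption: the item needs a mean score >= mean_cutoff AND at least
    min_share_high of the experts rating it >= high_score.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 9 for r in ratings):
        raise ValueError("ratings must lie on the 1-9 scale")
    share_high = sum(r >= high_score for r in ratings) / len(ratings)
    return mean(ratings) >= mean_cutoff and share_high >= min_share_high


# Hypothetical ratings, for illustration only
print(retain_item([7, 8, 6, 7, 9, 5]))  # mean 7.0, 4 of 6 rated >= 7: True
print(retain_item([9, 9, 6, 6, 6, 6]))  # mean 7.0, only 2 of 6 rated >= 7: False
```

The second example shows why both conditions matter: a high mean can be driven by a few enthusiastic raters while most of the panel rates the item below 7.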
Second, a data-driven exercise evaluated features of patients early in their disease course (within the first few years) and compared those who were subsequently diagnosed with SLE with those who were diagnosed with a mimicking disease8,9. A multicenter “early disease” cohort was established with data from Europe and North America. The results led to the addition of 3 criteria occurring with increased frequency in early SLE: arthralgias, fever, and fatigue. Third, as per EULAR recommendations, the patient perspective was specifically addressed in a cross-sectional survey of 339 German patients with SLE, focusing on manifestations experienced early in their disease10. This survey again supported fatigue (89%), arthralgias (87%), and fever (54%) as criteria for consideration in early disease.
Thus, a total of 43 candidate criteria were proposed for consideration in the next phase of criteria development, and these needed to be reduced to a more manageable number and further refined. The primary objectives of our study, comprising Phase 2 of the SLE classification criteria development process, were to reduce the number of candidate criteria and to identify criteria that should be retained for the next phase, with the aim of selecting a set of items that maximizes the likelihood of accurate classification of SLE, particularly early disease. The results of our study informed Phase 3, in which the relative contribution of each criterion and the threshold for classifying SLE were assessed. In Phase 4, the draft criteria set will be refined in a derivation cohort, and then comparatively evaluated against previous criteria sets in a validation cohort.
MATERIALS AND METHODS
Candidate criteria
The 43 candidate criteria nominated from the Phase 1 studies were the following: ANA on HEp-2 cells with a pattern compatible with SLE, titer > 1:160; ANA-positive (any pattern) > 1:160; low C3 and C4; ANA-positive by HEp-2; low C3; lupus nephritis by renal biopsy with immune deposits; anti-dsDNA antibody; anti-Sm antibody; glomerulonephritis [dysmorphic urinary red blood cell (RBC) or urinary RBC casts (≥ 1 cast/high power field)]; acute, subacute, or chronic SLE rash (can include malar, discoid, subacute cutaneous lupus erythematosus); rash with dermoepidermal interface changes and immunoglobulin and/or complement deposition on immunofluorescence; persistent proteinuria (> 0.5 g/day); malar rash; active urine sediment (without urinary tract infection); serositis; arthritis; presence of multiple autoantibodies; CNS dysfunction (seizures, psychosis, chorea, myelitis, optic neuritis, stroke or acute confusional state); oral mucosal lesions on the hard palate; thrombocytopenia; leukopenia (< 4000/mm3 on 2 or more occasions); antiphospholipid antibodies (lupus anticoagulant, anticardiolipin, anti-β2–glycoprotein 1 antibody, or prolonged Russell’s viper venom time); thrombocytopenia (severe); autoimmune hemolytic anemia; photosensitive rash; antiphospholipid antibody syndrome (clinical signs/history and antibodies); urine cellular casts; discoid rash; lymphopenia (< 1500/mm3 on 2 or more occasions); positive lupus anticoagulant; pleural effusion; pleuritis; subacute cutaneous lupus erythematosus; alopecia with associated scalp inflammation; pericardial effusion; photosensitivity; Raynaud phenomenon; fever; lupus profundus; lymphopenia (< 1000/mm3 on 2 or more occasions); arthralgia; and fatigue (Table 1).
Consensus method
Nominal group technique (NGT) is a structured consensus method for group decision making that, in a formalized manner, facilitates contribution from all participants rather than from a single expert11,12. This methodology allows for the incorporation of a spectrum of experience and knowledge. It stimulates constructive debate while reducing the potential bias of an influential opinion, and is best suited for topics where there is insufficient evidence12,13. This approach has been successfully applied in the development of classification criteria for other rheumatic diseases, such as systemic sclerosis, and of outcome measures14,15,16. The NGT includes assembly of an expert panel, premeeting individual rankings, and a consensus meeting.
Expert panel
Internationally recognized SLE experts for the NGT panel were purposively sampled from the international SLE community, endorsed by the SLE classification criteria steering committee and consecutively invited. Inclusion criteria were recognized expertise in SLE based on research and patient care, and representation of Europe and North America. Dr. Dinesh Khanna (DK) served as independent moderator of the exercise.
Premeeting ranking
The 43 criteria, with their mean and median appropriateness scores, and the proportional endorsement were sent to the NGT expert panel. The experts were asked to review the criteria and rank them in order of importance (1 = most important). Their task was phrased as, “We are developing criteria for classification of SLE for clinical trials and other research studies. One aim is to increase the inclusion of patients with early SLE, who are less likely to have manifestations related to longterm SLE and organ damage. We are not developing diagnostic criteria. The primary objectives of this exercise are to identify criteria that should be retained for the next phase and to reduce the number of candidate criteria.
“In a person with an uncertain diagnosis, which criteria most increase the likelihood that the patient has SLE? In making this judgment, you should ask yourself: If there are two patients identical in every other respect, and one has this extra feature and one doesn’t, is the patient with the feature more likely to have SLE than the other? For example, if both patients have a history of deep vein thrombosis and one has had an upper-limb deep vein thrombosis, does this latter feature really increase the likelihood of the patient having SLE? If not, it is not helpful in classifying the patient as having SLE.”
The experts were then asked to submit their rankings and comments. The data were anonymized and median (range) rankings were calculated for each criterion.
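The summary step can be sketched as follows, assuming each anonymized ranking is stored as a mapping from criterion to rank (1 = most important). The function `summarize_rankings` and the three example rankings are hypothetical; the study's actual premeeting rankings are those presented in Table 1.

```python
from statistics import median


def summarize_rankings(rankings):
    """Compute the median (min-max) rank for each criterion.

    rankings: list of dicts, one per anonymized expert, mapping
    criterion name -> rank (1 = most important).
    Returns dict: criterion -> (median, min, max).
    """
    ranks_by_criterion = {}
    for ranking in rankings:
        for criterion, rank in ranking.items():
            ranks_by_criterion.setdefault(criterion, []).append(rank)
    return {
        criterion: (median(ranks), min(ranks), max(ranks))
        for criterion, ranks in ranks_by_criterion.items()
    }


# Three hypothetical anonymized rankings of three criteria
rankings = [
    {"lupus nephritis": 1, "anti-dsDNA": 2, "fever": 3},
    {"lupus nephritis": 2, "anti-dsDNA": 1, "fever": 3},
    {"lupus nephritis": 1, "anti-dsDNA": 3, "fever": 2},
]
summary = summarize_rankings(rankings)
# A lower median rank means higher perceived importance
for criterion, (med, low, high) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{criterion}: {med} ({low}-{high})")
```

The median is less sensitive than the mean to a single outlying rank, and the range shows how far the panel's views diverged on each criterion.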
Consensus meeting
The expert panel and steering committee met face-to-face in a room with rectangular tables arranged in an open U shape with a flip chart and large computer screen at the open end of the tables12. The data evaluating the performance characteristics of ANA testing for consideration as an entry criterion6, the Delphi exercise data7, the early SLE and mimicker disease cohort data8,9, and the premeeting rankings were presented. The NGT facilitator (DK) presented an overview of the NGT process12.
In a round-robin fashion, the candidate criteria were presented one at a time to the entire group, and panelists were asked to comment on each. No interactive discussion was conducted at this stage. After each panelist had an opportunity to speak, the moderator led a series of brief discussions to clarify the points made. Deliberations, which included the steering committee, continued until consensus was achieved on the inclusion, exclusion, or revision of each criterion. In each round of discussion, the first speaker had to differ from the first speaker of the previous round. In this way, all panelists had the opportunity to speak first, which limited the influence of strong personalities12. The process ensured that all participants had an opportunity to contribute.
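One simple way to guarantee that the opening speaker changes every round is to rotate the speaking order by one position per round. The rotation rule and the function `speaking_order` below are a hypothetical sketch; the text specifies only that each round's first speaker must differ from the previous round's.

```python
def speaking_order(panelists, round_index):
    """Return the speaking order for a given round (0-based).

    The list is rotated so that the panelist at position
    round_index % n opens the round; over n consecutive rounds,
    every panelist opens exactly once.
    """
    if not panelists:
        raise ValueError("the panel must contain at least one member")
    start = round_index % len(panelists)
    return panelists[start:] + panelists[:start]


panel = ["A", "B", "C", "D"]  # hypothetical panelist labels
for r in range(len(panel)):
    print(r, speaking_order(panel, r))
```

A rotation is stricter than the stated requirement, since it also ensures that no panelist opens twice before everyone has opened once.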
Institutional ethics approval (17-5926) and consent were obtained for the conduct of this study.
RESULTS
Expert panel
The expert panel and steering committee comprised 19 members (47% female, 53% male) conducting SLE research in Canada, Mexico, the United States, Austria, Germany, Greece, France, Italy, and Spain, for 43% European and 57% North American representation.
Premeeting rankings
The premeeting rankings for potential entry criteria (ANA and complement) and potential additive criteria are presented in Table 1. Lupus nephritis by renal biopsy with immune deposits, anti-dsDNA antibodies, and anti-Sm antibodies were ranked highest.
Consensus meeting
The NGT exercise reduced the candidate criteria for SLE classification from 43 to 21. The panel distinguished potential “entry criteria,” which would be required for classification, from other potential “additive criteria,” summarized in Table 2.
Entry criteria
The panel agreed that ANA should be an entry criterion and, based on the Phase 1 systematic review and metaregression data6, should have a threshold of ≥ 1:80 (by HEp-2 immunofluorescence). Accordingly, “ANA on HEp-2 cells with a pattern compatible with SLE,” ANA at a titer of > 1:160, ANA-positive by HEp-2, and low C3 were excluded. It was recognized that up to 2% of patients with SLE may have a negative ANA at some time, so requiring a positive ANA would exclude a small subset of patients with SLE. The use of low complement levels (low C3 and/or low C4) as an entry criterion was considered potentially useful in cases with negative ANA. However, the inclusion of low complement levels was controversial: some panelists felt that complement was important but should not be an entry criterion. The main arguments against low complement as an entry criterion were that many patients without renal involvement would not have low complement levels in the early phase of disease, and that low C4 is often genetically determined.
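The entry rule under discussion can be sketched as a small function. This is an illustrative assumption, not an adopted rule: the ANA titer is encoded as the reciprocal of the highest positive HEp-2 immunofluorescence dilution ever recorded (so 1:80 becomes 80, reflecting the Phase 1 proposal that historically positive patients count as positive), and the complement pathway for ANA-negative patients is switched off by default because its inclusion was controversial. The name `meets_entry_criterion` is hypothetical.

```python
from typing import Optional


def meets_entry_criterion(
    ana_titer: Optional[int],
    low_c3: bool,
    low_c4: bool,
    allow_complement_entry: bool = False,
) -> bool:
    """Check a hypothetical SLE entry criterion.

    ana_titer: reciprocal of the highest positive HEp-2 IIF dilution ever
    recorded (e.g. 80 for 1:80), or None if ANA was never positive.
    allow_complement_entry: if True, low C3 and/or low C4 can substitute
    for a positive ANA (the debated option for ANA-negative patients).
    """
    if ana_titer is not None and ana_titer >= 80:  # ANA >= 1:80
        return True
    return allow_complement_entry and (low_c3 or low_c4)


print(meets_entry_criterion(160, low_c3=False, low_c4=False))  # True
print(meets_entry_criterion(40, low_c3=True, low_c4=False))  # False
print(meets_entry_criterion(None, low_c3=False, low_c4=True,
                            allow_complement_entry=True))  # True
```

Keeping the complement pathway behind an explicit flag makes it easy to compare how many patients each variant of the entry rule would admit.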
Additive criteria
The panelists reached consensus on which criteria to exclude. Arthralgias, fatigue, and Raynaud phenomenon were not considered sufficiently specific.
The panelists asked whether criteria could be clustered into “buckets” of clinically or physiologically related items. For example, urinary RBC casts and urine cellular casts were seen as redundant with proteinuria. By similar reasoning, pericardial effusion, pleuritis, and pleural effusion were clustered into a single criterion of serositis.
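The effect of clustering can be illustrated with the serositis example: any of the three clustered findings satisfies a single serositis criterion, which is then counted only once. The function `cluster_serositis` and the finding labels are hypothetical.

```python
SEROSITIS_FINDINGS = {"pleuritis", "pleural effusion", "pericardial effusion"}


def cluster_serositis(findings):
    """Collapse serositis-type findings into one 'serositis' criterion.

    findings: iterable of finding labels for one patient.
    Returns a new set; the input is not modified.
    """
    findings = set(findings)
    clustered = findings - SEROSITIS_FINDINGS
    if findings & SEROSITIS_FINDINGS:
        clustered.add("serositis")
    return clustered


patient = {"pleuritis", "pericardial effusion", "arthritis"}
print(sorted(cluster_serositis(patient)))  # ['arthritis', 'serositis']
```

Without clustering, this hypothetical patient would be credited with 2 items for what is clinically a single manifestation; after clustering, serositis contributes once.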
For lymphopenia and thrombocytopenia, the panelists agreed to remove “severe,” and replace with thresholds as outlined in the Systemic Lupus International Collaborating Clinics (SLICC) criteria11.
The panelists also agreed to cluster the CNS manifestations into 1 domain. The panel recommended using the term “CNS manifestations” instead of “CNS dysfunction,” given that CNS manifestations commonly reflect inflammatory activity in the CNS.
For skin manifestations, the expert panel suggested reduction to 2 criteria according to the SLICC definitions, that is, acute or subacute cutaneous lupus, and chronic cutaneous lupus17. Accordingly, malar rash, discoid rash, photosensitive rash, subacute cutaneous lupus erythematosus, photosensitivity, and lupus profundus were removed from the candidate list. Several experts pointed out that some signs are important but may lose specificity when assessed by nonexperts; for example, malar rash may be wrongly diagnosed in patients with rosacea. The panel agreed that several criteria needed stringent definitions, particularly “presence of multiple autoantibodies,” “arthritis,” and “fever.”
Additional discussion points
The expert panel noted that some criteria were more important than others and discussed assigning each criterion a differential weight to reflect its importance. However, concern was raised that such a system could be too computationally demanding to use in clinic, and a computationally simple weighting system was preferred.
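A computationally simple scheme of the kind the panel preferred could assign small integer weights and classify by comparing their sum with a threshold, which can be done by hand in clinic. The weights, the threshold of 5, and the function names below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not weights derived or endorsed in this study, and the sketch assumes the entry criterion has already been met.

```python
# Hypothetical integer weights, for illustration only
HYPOTHETICAL_WEIGHTS = {
    "lupus nephritis by renal biopsy": 4,
    "anti-dsDNA antibody": 3,
    "anti-Sm antibody": 3,
    "arthritis": 2,
    "fever": 1,
}


def additive_score(present_criteria, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the weights of the criteria present (each counted once)."""
    present = set(present_criteria)
    unknown = present.difference(weights)
    if unknown:
        raise KeyError(f"no weight defined for: {sorted(unknown)}")
    return sum(weights[c] for c in present)


def classify(present_criteria, threshold=5, weights=HYPOTHETICAL_WEIGHTS):
    """Classify as SLE if the weighted sum reaches the hypothetical threshold."""
    return additive_score(present_criteria, weights) >= threshold


print(additive_score({"anti-dsDNA antibody", "arthritis"}))  # 5
print(classify({"anti-dsDNA antibody", "arthritis"}))  # True
print(classify({"fever", "arthritis"}))  # False
```

Integer weights keep the score easy to tally at the bedside, which is the trade-off the panel favored over finer-grained weighting.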
DISCUSSION
In this NGT exercise, part of the second phase of the SLE classification criteria project supported by EULAR/ACR, candidate criteria for the classification of SLE were refined. Starting with 43 candidate criteria, the exercise resulted in 21, a more manageable number for creating a classification system. However, 3 important issues were raised. First, there was concern about the lack of precise definitions for the candidate criteria, which would result in inconsistent application of the criteria and affect the validity and reliability of the final classification system. Although devising these definitions was not within the scope of the NGT panel, this was identified as an important next step.
Second, it was important to understand the validity of each candidate criterion, notably its individual sensitivity and specificity. While the expert panel largely agreed on the approximate sensitivity and specificity of the items, it was evident that more work on this aspect was needed.
Third, the expert panel raised the issue of interdependency among items, proposing that some criteria might cluster into “buckets.” This question had not been addressed in any previous SLE classification criteria set. However, it became clear during the discussion that this question will need to be addressed.
The next phase of work will therefore require precise definitions for any criteria that are ambiguous. One potential solution was an online, freely available reference guide with definitions and photographs. The group recommended drawing on established and widely accepted definitions for criteria items from sources such as the American Rheumatism Association Glossary Committee Dictionary of the Rheumatic Diseases18, other criteria sets, or other medical disciplines. In the absence of established definitions, the definitions for items should be explicitly stated in the new criteria system. The validity of each of the 21 retained criteria also needs to be evaluated, because a system of classification is only as strong as its weakest criterion1,19. The operating characteristics (sensitivity, specificity) of the items in both SLE cases and mimicking conditions are needed. Criteria with poor discrimination should be discarded19.
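The operating characteristics referred to here follow from a 2 × 2 table comparing SLE cases with patients who have mimicking conditions: sensitivity is the proportion of SLE cases with the item, and specificity is the proportion of mimickers without it. The sketch below uses hypothetical counts, and the function name is illustrative.

```python
def sensitivity_specificity(cases_with, cases_without, mimics_with, mimics_without):
    """Return (sensitivity, specificity) for one candidate criterion.

    cases_with / cases_without: SLE cases with / without the item.
    mimics_with / mimics_without: mimicking-disease controls with / without it.
    """
    n_cases = cases_with + cases_without
    n_mimics = mimics_with + mimics_without
    if n_cases == 0 or n_mimics == 0:
        raise ValueError("need at least one SLE case and one mimicker")
    sensitivity = cases_with / n_cases
    specificity = mimics_without / n_mimics
    return sensitivity, specificity


# Hypothetical: item present in 90 of 100 SLE cases and 30 of 100 mimickers
sens, spec = sensitivity_specificity(90, 10, 30, 70)
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

An item that is common in mimicking conditions has low specificity and therefore adds little discrimination, however frequent it is in SLE.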
Throughout the NGT exercise, novel concepts for SLE classification emerged. The first was the notion of clustering criteria into “buckets.” Historically, this has been done clinically using a body systems approach. However, the question arose whether this is methodologically appropriate, because in a system of classification, items should be independent. The expert panel proposed the concept of hierarchical clustering of items into domains and subdomains. Before this can be done, however, the relationships (correlation and interaction) among clinically related criteria will need to be evaluated to ascertain their independence.
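As one possible way to quantify the correlation between two clinically related binary criteria, the phi coefficient (the Pearson correlation computed on 0/1 data) can be calculated from their 2 × 2 cross-tabulation. The choice of phi, the function name, and the example data are assumptions for illustration; the text does not specify which measures of correlation or interaction will be used.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Phi coefficient between two equal-length 0/1 item vectors.

    Returns a value in [-1, 1]; values far from 0 (in either direction)
    suggest the items are not independent, and values near 1 mean they
    largely co-occur. Returns nan if either item is constant, because
    phi is then undefined.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(x, y))
    n11 = sum(1 for a, b in pairs if a and b)
    n10 = sum(1 for a, b in pairs if a and not b)
    n01 = sum(1 for a, b in pairs if not a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence (1) / absence (0) in six patients
pleuritis = [1, 1, 0, 0, 1, 0]
pleural_effusion = [1, 1, 0, 0, 0, 0]
print(round(phi_coefficient(pleuritis, pleural_effusion), 3))  # 0.707
```

A strong correlation between 2 items would argue for placing them in the same domain rather than counting them as independent criteria.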
The expert panel generally agreed that some criteria are more important for classification than others. Lupus nephritis by renal biopsy with immune deposits and, to a lesser degree, antibodies to dsDNA and Sm were ranked highly. Fever was considered a potential criterion for distinguishing cases from controls in early disease but was given comparatively lower importance. The next phase should evaluate the relative weight of each criterion for classification while maintaining computational ease.
In the steering committee deliberations following the NGT exercise, there was some discussion of whether it was disappointing that the set of candidate items contained no surprises, with the possible exception of fever. However, item generation was as broad as possible, and item reduction rigorously followed established scientific methods. Whether alternative unbiased genetic or mRNA approaches will lead to different insights remains to be seen20.
Finally, consideration will need to be given to the effect of using ANA as an entry criterion. In the Phase 1 systematic review of the literature, which included 13,080 subjects diagnosed with SLE, 95.9% were ANA-positive by indirect immunofluorescence on HEp-2 cells6. In the Phase 1 early SLE cohort, 99.5% of the 389 patients with SLE were ANA-positive8. The Phase 1 Delphi study of international SLE experts found that 58% were not comfortable classifying SLE in a patient who had never had a positive ANA, and an additional 19% were uncertain7. Together, these data support the decision to use ANA as an entry criterion. However, research is needed to determine the number of patients with a diagnosis of SLE who are ANA-negative, particularly those with hypocomplementemia. Subsequent work may need to consider how to appropriately classify this subset of patients with SLE.
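To put these percentages in absolute terms, a back-of-the-envelope calculation converts each reported ANA-positive proportion into an approximate number of ANA-negative patients. The calculation is illustrative only: the percentages are rounded, and positivity in the source data was not necessarily defined at the 1:80 threshold, so these figures approximate, rather than count, the patients an ANA entry criterion would exclude. The function name is hypothetical.

```python
def approx_ana_negative(n_patients, percent_positive):
    """Approximate number of ANA-negative patients from a reported percentage."""
    return n_patients * (100 - percent_positive) / 100


# Figures reported in the paragraph above
for label, n, pct in [
    ("Phase 1 systematic review", 13080, 95.9),
    ("Phase 1 early SLE cohort", 389, 99.5),
]:
    print(f"{label}: about {approx_ana_negative(n, pct):.0f} of {n} ANA-negative")
```

This gives roughly 536 of 13,080 patients in the systematic review and about 2 of 389 in the early cohort.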
The NGT exercise identified a core set of candidate criteria with the intended goal of maximizing the likelihood of accurate classification of SLE, with the added motivation of discriminating early disease. The next phases of SLE classification criteria development will refine definitions, consider hierarchical clustering of items into domains and subdomains, evaluate their independence and relationships within domains, ascertain item weights, and consider different thresholds for established SLE versus disease earlier in its course. The performance of the final SLE criteria set will then be comparatively evaluated in a multiethnic, international cohort against previous SLE classification criteria.
Footnotes
This study was jointly supported by the European League Against Rheumatism and the American College of Rheumatology. Dr. Johnson is supported by a Canadian Institutes of Health Research New Investigator Award. Dr. Khanna was supported by a grant from the US National Institutes of Health/National Institute of Arthritis and Musculoskeletal and Skin Diseases K24 AR 063120.
Accepted for publication November 30, 2018.