Abstract
Objective. The OMERACT Ultrasound Task Force is currently developing a global synovitis score (GLOSS) with the objective of feasibly measuring global disease activity in patients with rheumatoid arthritis (RA). In order to determine the minimal number of joints to be included in such a scoring system, and to analyze the metric properties of proposed global (i.e., patient level) ultrasound (US) scoring systems of synovitis in RA, a systematic analysis of the literature was performed.
Methods. A systematic literature search of Pubmed and Embase was performed (January 1, 1984, to March 31, 2010). Original research reports written in English including RA, ultrasound, Doppler, and scoring systems were included. The design, subjects, methods, imaging protocols, and performance characteristics studied were analyzed, as well as the ultrasound definition of synovitis.
Results. Of 3004 reports identified, 14 articles were included in the review. We found a lack of clear definition of synovitis as well as varying validity data with respect to the proposed scores. Scoring systems included a wide range and number of joints. All analyzed studies assessed construct validity and responsiveness by using clinical examination, laboratory findings, and other imaging modalities as comparators. Both construct validity and responsiveness varied according to the number and size of joints examined and according to the component of synovitis measured [i.e., gray-scale (GS) or power Doppler (PD) alone or in combination]. With regard to feasibility, time of evaluation varied from 15 to 60 min and increased with the number of joints involved in the examination.
Conclusions. Ultrasound can be regarded as a valuable tool for globally examining the extent of synovitis in RA. However, it is presently difficult to determine a minimal number of joints to be included in a global ultrasound score. Further validation of proposed scores is needed.
Musculoskeletal ultrasound is primarily used by rheumatologists for detecting and assessing inflammation of joints and joint damage in rheumatoid arthritis (RA)1. Specifically, ultrasound is capable of evaluating the 2 elementary findings associated with synovitis: synovial hypertrophy (SH) and synovial fluid/effusion (SF)2. SF is visualized as an anechoic area within the joint capsule, while SH is visualized as hypoechoic material within and involving the joint capsule. Ultrasound has been shown to be superior to clinical examination in detecting and evaluating these 2 crucial components in a range of studies performed on various joints3,4. Both SF and SH are evaluated primarily on gray-scale (GS) ultrasound, while Color Doppler (CD) and Power Doppler (PD) are utilized to demonstrate activity related to SH. By visualizing the intravascular movement of blood cells, CD and PD detect microvascular blood flow in synovial and entheseal inflammation5,6.
At the single-joint level, synovitis and effusion in GS were initially evaluated by binary grading (presence/absence)7. This was followed by the creation of several semiquantitative scoring systems8,9, which rated synovial hypertrophy and effusion individually as well as in combination. Additional studies utilized quantitative measurements for evaluating synovitis in GS, based on the volume/depth of synovial tissue8,10.
Binary grading11 as well as a semiquantitative scoring system9 were also developed for evaluating vascularization of synovial hypertrophy by CD/PD. Quantitative approaches are also utilized, including the determination of the number of Doppler pixels by dedicated pixel-counting software12,13,14,15, measurement of pulsatility and resistive indices, and microbubble contrast material.
However, several factors are known to influence the sensitivity of detecting synovitis by GS and PD ultrasound. Machine characteristics and resolution, as well as varying parameter settings and the use of different transducers and presets may have significant effects on sensitivity16. Differentiating between normal and abnormal joints is further complicated by the fact that certain normal joints (e.g., the knee) may contain small amounts of SF and SH17. Doppler activity in a joint that is otherwise considered normal is less common, and is mostly due to greater intermachine variation in sensitivity with respect to Doppler13,18.
In addition to machine-dependent effects, operator-dependent factors, including factors affecting both acquisition and interpretation, have to be considered. In order to improve acquisition and interpretation, the EULAR imaging working group has established a consensual acquisition protocol19 for individual joints. Subsequently, preliminary consensus definitions for ultrasound for common pathological lesions seen in patients with inflammatory arthritis, including synovitis and intraarticular effusion, have been established by the OMERACT Ultrasound Task Force20.
Our Task Force has worked towards the development of a reliable standardized scoring system for synovitis in RA that is applicable to all joints and is consistent between machines, and which combines GS and PD in a semiquantitative 0–3 scale21,22. Results confirmed that a consensus scoring system of synovitis based on consensus definitions, combined with a standardized acquisition protocol, provided good intra- and interobserver reliability21,22. In order to be able to make assumptions on global disease activity, it is necessary to move from the level of single joints to the level of the patient as a global entity. The objective of the group is therefore to propose a global ultrasound scoring system of synovitis in RA at the patient level.
However, at the moment, there is a lack of consensus regarding the optimal number of joints to evaluate and the appropriate components and scoring to use at joint level. In order to determine the minimal number of joints and the appropriate scoring system needed to correctly assess patients with RA by ultrasound, we analyzed the validity of proposed global (i.e., patient level) ultrasound scoring systems according to the OMERACT filter within the framework of a systematic review of the literature.
MATERIALS AND METHODS
To extract data on ultrasound scoring systems of synovitis in RA, a systematic search of the literature was performed using a 4-step strategy: (1) Definition of the objective of the review, (2) definition of criteria of selection, (3) selection of articles, and (4) data extraction.
Selection criteria consisted of original articles involving humans, published in English from January 1984 to March 2010, and referring to binary grading, as well as to semiquantitative and quantitative scoring systems and ultrasonography/ultrasound.
Search strategy and study selection
Articles were searched in Pubmed and Embase. In Pubmed, the search used the following key words: (Ultrasound OR Ultrasonography OR Power Doppler OR Color Doppler OR Doppler OR Musculoskeletal Ultrasound) AND (Rheumatoid Arthritis OR Inflammatory Arthritis OR Synovitis) AND (Joint count), with limits (language = English, humans only, from January 1, 1984, to March 31, 2010). In Embase, the search used the same key words and the same limits. For both searches, key words referred to MeSH terms or, if these were not available, to key words present in the title/abstract. Titles, abstracts, and full reports of the articles identified were systematically screened by one author (PM) against the inclusion and exclusion criteria. The final search was verified by a second author (MADA). Articles were excluded if they were not in English, studied healthy subjects, or concerned cadavers. Abstracts of scientific congresses and reviews were also excluded. Further, a manual search of secondary sources, including article references, reviews, and metaanalyses, was performed without limitation of date of publication.
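The Boolean structure of the search string reported above can be written out programmatically. The following is a minimal sketch that only assembles the query text from the key words listed; it does not contact Pubmed or Embase, and the helper name `build_query` is an assumption introduced for illustration.

```python
# Term groups copied from the reported search strategy.
IMAGING_TERMS = [
    "Ultrasound", "Ultrasonography", "Power Doppler",
    "Color Doppler", "Doppler", "Musculoskeletal Ultrasound",
]
DISEASE_TERMS = ["Rheumatoid Arthritis", "Inflammatory Arthritis", "Synovitis"]
COUNT_TERMS = ["Joint count"]


def build_query(*groups):
    """Join terms with OR inside each group and AND between groups.

    Hypothetical helper: it only formats text and contacts no database.
    """
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


QUERY = build_query(IMAGING_TERMS, DISEASE_TERMS, COUNT_TERMS)
print(QUERY)
```

Keeping each concept as its own list makes it easy to see that a record must match at least one term from every group to be retrieved.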
Data extraction
During data extraction, special attention was given to the “Patients and Methods” and “Results” sections of each article. All data were extracted using a standardized template that was specifically designed for the review. Afterwards, data were collected on an Excel sheet. All selected articles were rated in order to determine the number and choice of evaluated joints in the scoring system, the characteristics of the system, and to evaluate the quality of the studies according to the OMERACT filter23.
Each article was analyzed and assessed in order to determine whether it fulfilled some aspect of validity. We evaluated in particular: face and content validity, construct validity, criterion validity, discriminant validity (i.e., reliability and responsiveness), and feasibility. Each of these criteria was independently evaluated in every article, including whether the methods for assessing it and their measurement were available or not. Moreover, the following characteristics, related to the acquisition and detection of synovitis, were searched for: ultrasound technique in GS and Doppler, ultrasound definition of synovitis if present, modality and components of grading method (SH, SF, or both combined; GS, PD, or both combined), and grading system (binary/qualitative, semiquantitative, quantitative).
A standardized tool for assessing the quality of the analyzed studies was developed, based on a set of 6 predefined criteria, each assessed in a binary mode (yes/no). These criteria were based on concepts from reviews of quality assessment tools used in systematic reviews of observational studies24. The predefined criteria were the following: (1) Was the recruitment of patients well-defined in the methods section? (2) Was the choice of number of joints to include in the scoring mentioned and justified? (3) Was there a description of the ultrasound scanning technique? (4) Was there a description of attempted blinding of observers? (5) Was there a description of synovitis scoring, and on which source was this scoring based? (6) Was the choice of comparator adequately explained and were results completely given? Quality was reported on a scale of 0–6, with higher results indicating higher quality. Selected articles that scored less than 1 on this scale were excluded from the final analysis.
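The quality rating described above amounts to counting the criteria answered "yes". The sketch below illustrates that arithmetic and the stated exclusion rule; the criterion labels, the function names, and the example article are assumptions made for illustration and are not taken from the review's data.

```python
# The six predefined criteria, each answered yes (True) or no (False).
# The short labels are hypothetical names for criteria (1) to (6).
CRITERIA = (
    "recruitment_defined",
    "joint_choice_justified",
    "scanning_technique_described",
    "blinding_described",
    "synovitis_scoring_described",
    "comparator_explained",
)


def quality_score(answers):
    """Return the 0-6 quality score: one point per criterion answered yes."""
    missing = [name for name in CRITERIA if name not in answers]
    if missing:
        raise ValueError(f"unanswered criteria: {missing}")
    return sum(bool(answers[name]) for name in CRITERIA)


def is_excluded(answers):
    """Apply the stated rule: articles scoring less than 1 are excluded."""
    return quality_score(answers) < 1


# Hypothetical article, not taken from the review's data.
example = dict.fromkeys(CRITERIA, False)
example["scanning_technique_described"] = True
example["blinding_described"] = True
print(quality_score(example), is_excluded(example))
```

Under this rule an article is excluded only when none of the 6 criteria is met.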
Statistical analysis
Descriptive statistics were used to report data. Frequencies and percentages were used for categorical variables.
RESULTS
Figure 1 shows a flowchart of the systematic review process. Of 3004 studies identified initially, 14 articles were selected to be included in the review8,9,14,25,26,27,28,29,30,31,32,33,34,35. Table 1 shows the study design characteristics of the studies. The overwhelming majority featured a blinded design, and sample size ranged between 24 and 278 patients. Only 29% (4 of 14) of articles included control patients. All studies included clinical examination as a comparator for assessing construct validity, with all except one study including laboratory values as well. Imaging modalities (radiography, ultrasound, and MRI) were used in 43% (6 of 14) of the studies as a comparator. In 43% (6 of 14) of articles an arbitrary number of joints was chosen, while 57% (8 of 14) based their ultrasound evaluation on available clinical indices or frequency of involvement of joints in the disease, according to literature or clinical practice.
Table 2 shows characteristics related to ultrasound examination and scoring. Definition and detection of synovitis at joint level was variable within the studies. One article included GS evaluation of global synovitis without differentiation between SH and SF, without PD evaluation, while another article evaluated only PD activity. The majority of articles evaluated both GS and PD, with GS either evaluated globally or in separate components (SH and SF) in addition to PD activity, which was evaluated separately. No study assessed a composite synovitis scoring system consisting of a combination of GS and PD. Quantification of activity also varied, with all studies featuring semiquantitative scores (i.e., 0–3, 0–4, 0–5). A number of studies included binary or quantitative measures (e.g., thickness, resistive index, region of interest) as well. Several studies also included tenosynovitis and bursitis, but without clear definitions of these lesions.
Table 3 shows the metric qualities (reliability, validity, discrimination, and feasibility) studied in the articles as part of the OMERACT filter. Regarding construct validity, correlation with clinical and laboratory findings varied according to the number and size of joints examined. Responsiveness was found to be variable according to the component tested (GS and PD) and the size of the joint. Two weeks seems to be the minimal time for visualizing minimal response (PD) and 24 weeks was found to be the best cutoff. With regard to feasibility, time of evaluation was variable (15–60 min) and increased with the number of joints involved in the examination.
The number of joints assessed by ultrasound varied between 5 and 60 joints among articles. Two joints, the second and third metacarpophalangeal (MCP) joints, were included in the synovitis scoring system of each article. Additionally, the second proximal interphalangeal (PIP) joint and the third and fourth MCP joints were assessed in 86% (12 of 14) of articles. Interphalangeal joints of the feet and the subtalar and midtarsal joints were the least commonly assessed joints, evaluated in only 14% (2 of 14) of articles. In previous studies, propositions on the number and composition of reduced joint counts were based on suggested frequency of involvement25, feasibility8, or representative value of target joints9, or were developed in a logistic regression model26. Two reduced joint counts seemed to present good validity: the 12-joint count proposed by Naredo, et al26 and the 7-joint count by Backhaus, et al25. Joints selected in the proposed 12-joint count are wrist, MCP-2, MCP-3, knee, ankle, and elbow evaluated bilaterally26. Examining the other proposed joint counts, we found that some of them included a minimum of 7 joints in their scoring systems and, in particular, joints featured in the 7-joint count25 combination [i.e., wrist, MCP-2, MCP-3, PIP-2, PIP-3, metatarsophalangeal (MTP)-2, and MTP-5] were included in the global synovitis scores of 50% (7 of 14) of the articles. In order to evaluate the applicability of the 7-joint count in another dataset, we analyzed data from Naredo and colleagues by using the proposed selection of 7 joints from Backhaus25. Comparative results on responsiveness by using the 2 joint counts are presented in Table 4.
The use of the 7-joint count in the new dataset also showed good responsiveness; however, the application of this joint count bilaterally (14 instead of 7 joints) was characterized by higher sensitivity to change, which was closer to that observed with the evaluation of all initially evaluated joints (i.e., 44 joints).
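The composition of the 2 reduced joint counts described above can be compared directly. The following minimal sketch encodes the joint sites as listed in the text and shows which sites the counts share and how bilateral application changes the number of joints; the variable and function names are assumptions made for illustration.

```python
# Joint sites as listed in the text for the two reduced joint counts.
SEVEN_JOINT = {"wrist", "MCP-2", "MCP-3", "PIP-2", "PIP-3", "MTP-2", "MTP-5"}
TWELVE_JOINT_SITES = {"wrist", "MCP-2", "MCP-3", "knee", "ankle", "elbow"}


def bilateral(sites):
    """Expand a set of joint sites into (side, site) pairs for both sides."""
    return {(side, site) for side in ("left", "right") for site in sites}


shared = SEVEN_JOINT & TWELVE_JOINT_SITES
print(sorted(shared))                      # sites common to both counts
print(len(bilateral(TWELVE_JOINT_SITES)))  # 6 sites, both sides: 12 joints
print(len(bilateral(SEVEN_JOINT)))         # 7 sites, both sides: 14 joints
```

The shared sites are the wrist, MCP-2, and MCP-3; the 12-joint count adds large joints, whereas the 7-joint count adds small joints of the hand and foot.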
DISCUSSION
The prospect of developing a global ultrasound joint score is attractive, in that it might potentially be able to more objectively reflect the “real” level of synovitis, and hence disease activity of patients with RA, compared with conventional clinical measures, i.e., disease activity indices. In order to be accepted as an objective tool, ultrasound must demonstrate reliability and sensitivity to change, and the evaluation of several joints must also appear feasible. This review has demonstrated that ultrasound is a worthwhile tool for assessing global joint inflammation in RA.
This review has highlighted that discrepancies were present among studies, relating to the definition and detection of synovitis, and in the composition of joints included in the global evaluation of disease activity. The variability of definition and detection of synovitis at joint level within the studies was the most important weakness raised by this review. This was most apparent in articles published before 2005, where GS definitions of synovitis, including its elementary components (SH, SF), were found to be lacking. After 2005 and the publication of the preliminary OMERACT definitions20, synovitis was defined in all articles. Moreover, the OMERACT ultrasound definitions for synovitis and elementary components were used in most articles. Regarding the evaluation of synovial vascularization, less variability was found. All articles evaluated PD, rather than CD activity. The definition of PD activity proposed by Szkudlarek, et al9 was adopted by almost all articles. The quantification of synovial activity at the single-joint level was also found to be variable. Some authors focused on the quantification of GS only, whereas others quantified PD only. Generally, authors evaluated both components separately. The semiquantitative method was the most frequently used method of quantification for both GS and PD, although the scales varied. For PD, the semiquantitative scale most commonly used was that proposed by Szkudlarek, et al9 (i.e., grade 0: no flow in the synovium; grade 1: single vessel signals; grade 2: confluent vessel signals in less than half the area of the synovium; grade 3: vessel signals in more than half the area of the synovium). This high variability in the evaluation of joint activity made the comparison of studies, as well as the correct evaluation of validity, difficult.
The number of joints evaluated for creating a measure of global activity of RA and the explanation for the inclusion of joints were also found to be highly variable. In addition to providing a valid and reliable measurement, feasibility is of paramount importance with respect to ultrasound-based indices, as the examination of a large number of joints takes a considerable amount of time. Therefore, the number of joints that need to be assessed, and thus incorporated into any global ultrasound scoring system, is an important issue. Propositions on the number and composition of reduced joint counts were based either on suggested frequency of involvement14,25,30 or representative value of target joints9,31,36; or they were developed in a logistic regression model26. Validation of proposed joint scores was quite often omitted, and only 2 papers examined the metric properties of the proposed reduced joint score25,26. Independent of the metric properties of a proposed joint score, validation is still necessary, and the choice and number of joints included remains a crucial issue. Candidate target joints to be included in a global ultrasound score may also be derived from clinical disease activity indices (e.g., Disease Activity Score 28), or based on other predictive studies, for example MRI studies (e.g., wrist)37 and prediction of structural damage, or clinical prediction of severity. Considering the composition of available reduced global ultrasound scores, we found that the second and third MCP joints and the wrist were always included, regardless of how the joint score was developed. Almost all analyzed papers included the evaluation of at least 7 joints, similar to the German ultrasound 7 score25; in addition, the joint scores used in 50% of the articles included the 7 joints present in the German ultrasound 7 score.
In order to test the external validity of this choice, in particular the number and the type of joints, data from the Naredo study26 were reanalyzed using the German ultrasound 7 score (Table 4). This analysis revealed that the 7 joints included were good candidates for evaluating disease activity and responsiveness, even if sensitivity to change was inferior to the 12-joint score used in that database. Responsiveness is increased if the evaluation of joints is done bilaterally. As these are the joints most frequently involved in RA, it is probable that these joints would be included in the GLOSS as well; however, in order to guarantee C-reactive protein or erythrocyte sedimentation rate responsiveness, a certain number of large joints likely need to be included.
Regarding the quantification of global synovitis, a number of important questions remain unanswered. Minimal activity at the single-joint level still needs to be determined. There is currently no agreement on what constitutes a “normal” level of GS and PD findings. What appears clear is that only joint activity (i.e., inflammation of the synovial membrane by GS and PD) should be included in a future score. Indeed, the inclusion of structural damage (i.e., erosions) did not demonstrate responsiveness, as appears from the analysis of the article by Backhaus and colleagues25. This is probably due to the duration of followup (i.e., 6 months), which may be too short for evaluating the effect on damage. A longer followup might have shown better sensitivity to change, as was demonstrated by Loeuille and colleagues38. Another factor could be the choice and number of joints evaluated for erosions (MCP 2, 3, PIP 2, 3, and MTP 2, 5 unilaterally), or the disease duration of the patients (mean 8.3 yrs). Clearly, additional studies on the responsiveness of erosions should be performed for demonstrating effect on damage, and therefore sensitivity to change of erosions detected by ultrasound. The use of GS evaluation alone would probably carry the same lack of sensitivity. In fact, it is sometimes difficult to differentiate between active synovitis (i.e., hypoechoic SH) and inactive or fibrous synovial thickening (i.e., echoic or hyperechoic SH), based only on the evaluation of echogenicity in GS, as such assessment is subjective and extremely dependent on the experience of the operator. This can also explain the greater sensitivity to change of PD signal (easier to detect), even if it is dependent on the quality of equipment. Based on this systematic review of the literature, it is difficult to suggest a minimal number of joints to score and which scoring system to use at joint level.
The mathematical formulation (e.g., add all semiquantitative or quantitative scores up to produce a cumulative score) of the scoring system must also be determined. The validity of this simplified assessment, and that of others, remains to be tested and confirmed.
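As an illustration of the simplest formulation mentioned above, the following sketch adds semiquantitative joint-level grades into a single patient-level score. It assumes integer grades on a 0–3 scale; the function name `cumulative_score` and the example grades are hypothetical, and plain summation is shown only as one candidate formulation, not as a validated score.

```python
def cumulative_score(joint_grades, max_grade=3):
    """Sum per-joint semiquantitative grades into one patient-level score.

    Assumes integer grades from 0 to max_grade; plain summation is only the
    example mentioned in the text, not a validated formula.
    """
    for joint, grade in joint_grades.items():
        if not isinstance(grade, int) or not 0 <= grade <= max_grade:
            raise ValueError(f"invalid grade for {joint}: {grade!r}")
    return sum(joint_grades.values())


# Hypothetical PD grades for one patient, not taken from any study.
pd_grades = {"wrist": 2, "MCP-2": 1, "MCP-3": 0, "PIP-2": 3,
             "PIP-3": 0, "MTP-2": 1, "MTP-5": 0}
total = cumulative_score(pd_grades)
print(total, "of", 3 * len(pd_grades))
```

With 7 joints graded 0–3, such a sum ranges from 0 to 21; the range grows with the number of joints, so totals from different joint counts are not directly comparable.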
The OMERACT Ultrasound group is currently using the developed synovitis scoring system at joint level in an ongoing multicenter European study, in order to propose a standardized and reliable ultrasound synovitis GLOSS (Global OMERACT Scoring System). At the same time, data from this ongoing study will also be tested for responsiveness by using a different number of joints, including the 7- and 12-joint counts. This will probably permit the validation of the proposed joint count. Ultimately, the overall “usefulness and truthfulness” of GLOSS will be determined by its composition. We might well, however, need different indices for diagnosis and therapeutic monitoring.
APPENDIX
List of study collaborators: OMERACT Ultrasound Task Force: Philippe Aegerter, Sibel Aydin, Marina Backhaus, Peter V. Balint, David Bong, George A.W. Bruyn, Isabelle Chary-Valckenaere, Paz Collado, Eugenio De Miguel, Emilio Filippucci, Jane E. Freeston, Frederique Gandjbakhch, Walter Grassi, Marwin Gutierrez, Annamaria Iagnocco, Frederick Joshua, Sandrine Jousse-Joulin, David Kane, Helen I. Keen, Damien Loeuille, Ingrid Moller, Peter Mandl, Carlos Pineda, Lene Terslev, Wolfgang A. Schmidt, Marcin Szkudlarek, and Hans-Rudolf Ziswiler.