Abstract
Objective. As part of the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) ultrasound working group, we performed a systematic review of the literature to assess the evidence and knowledge gaps in scoring instruments of enthesitis in psoriatic arthritis (PsA).
Methods. A systematic search of PubMed, EMBase, and Cochrane databases was performed. The search strategy was constructed to find original publications containing terms related to ultrasound, enthesitis, spondyloarthritis (SpA) or PsA. Data extraction focused on the properties of the sonographic enthesitis instruments used in each study following components of the Outcome Measures in Rheumatology (OMERACT) filter: feasibility, test-retest reliability, construct validity as related to clinical assessment of enthesitis, biomarkers of inflammation and imaging of enthesitis by other modalities, discriminative validity, and responsiveness to treatment.
Results. Fifty-one of 310 identified manuscripts were included. Only 1 scoring instrument of enthesitis was specifically developed and validated in patients with PsA. Only 18 (35%) of the studies involved patients with PsA, while the remaining studies focused on SpA. In PsA, construct validity was assessed using biomarkers and clinical examination in 1 (2%) and 11 (21.5%) of the studies, respectively, whereas no studies used imaging for the same purpose. Only 2 (4%) of the studies assessed discriminative validity in PsA. Responsiveness to treatment was assessed in 7 studies, none of which included patients with PsA.
Conclusion. Although sonographic enthesitis scoring instruments have been developed for SpA, only a few have been validated in PsA. None of them passed the OMERACT filter in patients with PsA. Additional research is required before endorsing a specific instrument for the assessment of enthesitis in patients with PsA.
Enthesitis, the inflammation of the insertion of tendon, ligament, and capsule into the bone, is a prominent feature of spondyloarthritis (SpA), including psoriatic arthritis (PsA). The evaluation of enthesitis is conventionally conducted by clinical examination, a method with significant limitations, including low sensitivity and specificity. Imaging modalities including ultrasound (US) and magnetic resonance imaging (MRI) have gained interest in enthesis evaluation. US can identify abnormalities at the enthesis in high fidelity and may assist with the diagnosis and management of patients with SpA1.
In 2014, the Outcome Measures in Rheumatology (OMERACT) US special interest group reached a consensus regarding the sonographic elementary lesions defining SpA-related enthesitis. The following sonographic lesions at the enthesis were included: hypoechogenicity (loss of fibrillar architecture), thickening (compared to the body of the tendon), calcifications, enthesophytes (step-up of bony prominence), bone erosions (step-down with cortical break), and Doppler signal2. This was an important first step toward ensuring a high degree of consistency across studies using US to assess enthesitis. However, while this exercise defined the concept of sonographic enthesitis at the level of any given enthesis, it did not address the issue of evaluating the extent of enthesitis at the patient level. In other words, it provided standard definitions for evaluating the presence of enthesitis at a specific site, such as Achilles tendon, but it did not provide a tool that can help the physician in quantifying the burden of entheseal involvement in a patient with PsA.
Several sonographic enthesitis instruments have been developed, mostly in patients with axial SpA (axSpA), to quantify the extent of enthesitis at the global patient level. Glasgow Ultrasound Enthesitis Scoring System (GUESS) assesses 5 entheseal sites in the lower extremities. The original GUESS does not include power Doppler vascularization3. The score developed by D’Agostino includes the assessment of sonographic enthesitis at 10 sites in the upper and lower extremity sites as in GUESS4. Sonographic Entheseal Index (SEI) involves the assessment of the same 5 entheseal sites as in GUESS but includes a distinction between chronic entheseal lesions, such as erosions and calcifications, and acute entheseal lesions, such as increased thickening and hypoechogenicity5. The Madrid Sonographic Enthesitis Index (MASEI) is a weighted score that assesses 6 entheseal sites. MASEI assigns higher scores to erosions, larger enthesophytes, and Doppler signal compared with other elementary lesions. The Belgrade Ultrasound Enthesitis Score (BUSES) evaluates 6 sites6. Lastly, the Ultrasound composite scores for the assessment of inflammatory and structural pathologies in PsA (PsASon) score is not exclusively an entheseal score but rather a composite score that also includes joints. This score includes only 2 sites, the common extensor tendon at the lateral epicondyle and the insertion of the distal patellar tendon7.
The instruments described above have been increasingly used in studies evaluating entheseal abnormalities, though they may have some limitations. All but 1 were developed and validated in patients with predominantly axSpA, thus their validity in patients with PsA is unknown. Further, entheseal sites were chosen based on expert opinion and most of the included sites are in the lower extremities, which are more prone to mechanically related enthesopathies, especially in overweight patients8. Currently, there is limited information on the effect of confounding factors that are prevalent in patients with PsA, such as obesity and mechanical stress, on the performance of these scoring systems in psoriatic patients.
US could be used to quantify the extent of enthesitis for diagnostic purposes, patient management, and monitoring treatment response in clinical trials, observational studies, and clinical practice9. Implementing a treat-to-target approach in patients with PsA requires an accurate evaluation of disease activity in all core domains, including enthesitis. However, this purpose requires validated instruments (outcome measures) for patients with PsA. Therefore, the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) US working group performed this systematic literature review (SLR) to evaluate the current evidence and knowledge gaps in instruments for the assessment of enthesitis in PsA.
Our aims in this SLR were, first, to describe the measurement properties of the available sonographic enthesitis instruments, particularly those used in patients with PsA; second, to evaluate the validity of the available scoring systems according to the OMERACT filter measurements10; and third, to critically appraise the quality of the studies on different scoring systems in PsA. The results of this SLR will inform GRAPPA about the validity of existing scoring systems for assessment of sonographic enthesitis in PsA and determine whether a new scoring system for the assessment of enthesitis is warranted.
MATERIALS AND METHODS
Literature review: Data sources and search strategies
We searched Medline, EMBase, and Cochrane Central Register databases from their inception (1966, 1980, and 1982, respectively) to January 3, 2017, using a strategy designed by an experienced medical librarian (MA) to find primary references. The search strategy was constructed to find publications containing at least 1 term from each of 3 search blocks: (1) psoriasis, psoriatic arthritis, spondyloarthritis, spondyloarthropathy, or ankylosing spondylitis; (2) enthesitis, enthesopathy, enthesis, or entheses, together with tendon and its synonyms; (3) ultrasound, ultrasonography, sonography, or Doppler. The search was limited to English-language publications in humans.
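The 3-block logic of the search can be sketched as a Boolean query in which terms within a block are joined with OR and the blocks are joined with AND. The Python sketch below is illustrative only: the names `SEARCH_BLOCKS`, `build_query`, and `matches` are hypothetical, the generic AND/OR syntax is an assumption rather than the database-specific strategy actually run, and the tendon synonyms, which are not listed in the text, are represented here by the single term "tendon".

```python
# Illustrative sketch of the 3-block search logic (not the actual database query).
SEARCH_BLOCKS = [
    # Block 1: disease terms
    ["psoriasis", "psoriatic arthritis", "spondyloarthritis",
     "spondyloarthropathy", "ankylosing spondylitis"],
    # Block 2: enthesis terms; "tendon" stands in for the unlisted tendon synonyms
    ["enthesitis", "enthesopathy", "enthesis", "entheses", "tendon"],
    # Block 3: imaging terms
    ["ultrasound", "ultrasonography", "sonography", "Doppler"],
]


def build_query(blocks):
    """Join terms within each block with OR, then join the blocks with AND."""
    def quote(term):
        # Quote multi-word terms so they are treated as phrases.
        return f'"{term}"' if " " in term else term

    grouped = ["(" + " OR ".join(quote(t) for t in block) + ")" for block in blocks]
    return " AND ".join(grouped)


def matches(text, blocks):
    """Return True if the text contains at least 1 term from every block."""
    lowered = text.lower()
    return all(any(term.lower() in lowered for term in block) for block in blocks)


if __name__ == "__main__":
    print(build_query(SEARCH_BLOCKS))
```

Under these assumptions, a title mentioning ultrasound, enthesitis, and psoriatic arthritis satisfies all 3 blocks, whereas one that replaces ultrasound with MRI fails the third block.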
Study selection
Titles and abstracts of articles were systematically screened by 2 reviewers (SBU and OE) against the inclusion and exclusion criteria. Selected publications were retrieved in full, and 2 reviewers (SBU and OE) independently assessed them for eligibility. The final search was verified by a third author (LE). Additional papers were obtained by scanning the references of the selected articles. To be included in the systematic review, original studies needed to fulfill the following inclusion criteria: study design (case-control, cross-sectional, or cohort); population (studies that assessed patients with SpA, PsA, or psoriasis); outcome (studies that evaluated sonographic enthesitis at the patient level). Studies that evaluated only 1 entheseal site and those that used 3-D US were excluded.
Data extraction
Data were independently extracted by 2 authors (SBU and OE) according to a standardized form and summarized in tables. Discrepancies were resolved by consensus and involvement of a third author if needed (LE). For each study the following information was recorded: year of publication, study design, study population, sample size, the mean age, body mass index, disease duration, sex distribution, US machine, US settings, sonographic entheseal scoring system used, entheseal sites assessed, and sonographic elementary lesions assessed.
Appraisal of measurement properties of included studies
Feasibility was assessed as the time to complete the examination. Reliability (test-retest) was considered positive if common measures of interrater and intrarater reliability were reported and showed moderate to high agreement [κ > 0.4 or intraclass correlation coefficient (ICC) > 0.6]11. Construct validity was achieved when US evaluation of enthesitis significantly correlated with each of the following 3 theoretical concepts of enthesitis: (1) clinical enthesitis as assessed on physical examination using an established clinical enthesitis score (e.g., Leeds Enthesitis Index); (2) laboratory biomarkers of inflammation, such as C-reactive protein (CRP) and erythrocyte sedimentation rate (ESR); (3) other imaging modality assessing enthesitis, such as radiographs or MRI. Responsiveness was evaluated by the ability of the instrument to measure change in response to an intervention (e.g., study drug) when a change has occurred (based on an external construct). Discriminant validity was considered positive if a strict cutoff was found to significantly distinguish disease (e.g., PsA or SpA) from healthy controls.
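The reliability criterion above can be expressed as a simple threshold rule. The following Python sketch is a minimal illustration; the function name `reliability_positive` is hypothetical, and treating either statistic alone as sufficient when a study reports both is an assumption based on the "or" in the stated criterion.

```python
# Thresholds taken from the criterion stated in the text.
KAPPA_THRESHOLD = 0.4  # kappa must exceed this value
ICC_THRESHOLD = 0.6    # ICC must exceed this value


def reliability_positive(kappa=None, icc=None):
    """Return True if a reported kappa or ICC exceeds its threshold.

    A study that reports neither statistic is not considered positive.
    Assumption: when both are reported, either one exceeding its
    threshold is enough.
    """
    kappa_ok = kappa is not None and kappa > KAPPA_THRESHOLD
    icc_ok = icc is not None and icc > ICC_THRESHOLD
    return kappa_ok or icc_ok


if __name__ == "__main__":
    print(reliability_positive(kappa=0.55))  # above the kappa threshold
    print(reliability_positive(icc=0.60))    # not strictly above the ICC threshold
```

Note that the inequalities are strict, so a value exactly at the threshold (κ = 0.4 or ICC = 0.6) does not qualify under this reading.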
Quality assessment of identified studies
The risk of bias and applicability were assessed using the QUADAS-2 tool12. This tool consists of 4 domains: patient selection, index test, reference standard, and flow and timing of the index test. Each domain assesses the risk of bias (e.g., patient selection, risk of bias related to the conduct or interpretation of the results of the external construct, and the index tests). In addition, the applicability regarding patients, external construct, and the index tests is assessed. In each domain, the risk of bias and the concerns regarding applicability are scored independently (low, high, or unclear). We illustrated the process as recommended in the Preferred Reporting Items for the Systematic Reviews and Meta-Analyses (PRISMA) statement13.
RESULTS
Literature search
Figure 1 is a flowchart of the article selection. The initial literature search retrieved 310 abstracts. After an initial screening of abstracts, 118 full text manuscripts were chosen for further review. After reviewing the full text manuscripts, 67 publications were excluded for the following reasons: 33 evaluated only a single entheseal site, 12 were the wrong study type (e.g., review, case report), 9 studies did not provide sufficient data regarding the scoring system used, 6 had irrelevant study populations, 2 did not assess entheses, and for 5, full text was not available. A total of 51 studies were included in the manuscript2,3,4,5,6,7,14–59.
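The selection counts reported above can be checked with simple arithmetic. The Python sketch below is only a consistency check of the numbers stated in the text; the variable and function names are hypothetical.

```python
# Counts as reported in the text for the article selection flow.
IDENTIFIED = 310
FULL_TEXT_REVIEWED = 118

# Reasons for exclusion at the full-text stage, as listed in the text.
EXCLUSIONS = {
    "single entheseal site": 33,
    "wrong study type": 12,
    "insufficient data on scoring system": 9,
    "irrelevant study population": 6,
    "entheses not assessed": 2,
    "full text not available": 5,
}


def included_count(full_text_reviewed, exclusions):
    """Studies remaining after subtracting all full-text exclusions."""
    return full_text_reviewed - sum(exclusions.values())


if __name__ == "__main__":
    print("Excluded at abstract screening:", IDENTIFIED - FULL_TEXT_REVIEWED)
    print("Excluded at full-text review:", sum(EXCLUSIONS.values()))
    print("Included:", included_count(FULL_TEXT_REVIEWED, EXCLUSIONS))
```

The listed exclusion reasons sum to the 67 excluded publications, and 118 minus 67 gives the 51 included studies.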
Study characteristics
Table 1 summarizes the characteristics of the studies3,4,5,6,7,14–59. The studies comprised 38 cross-sectional and 13 prospective cohort studies. The study population was divided as follows: 18 (35%) assessed patients with PsA, 17 (33%) assessed patients with SpA, 10 (19.6%) examined patients with ankylosing spondylitis (AS), 5 (9.8%) examined patients with psoriasis, and 1 assessed juvenile idiopathic arthritis. Some studies used scoring methods that were not previously validated, although often these scoring methods used entheseal sites and elementary lesions similar to those in validated scores. Therefore, we aggregated these studies along with the studies that used the formal validated instruments. The following sonographic entheseal scores or their modifications were used: 14 (27.4%) GUESS score, 9 (17.6%) MASEI, 6 (11.7%) the score used by D’Agostino, 4 (7.8%) BUSES, 3 (5.8%) SEI, and 1 (2%) PsASon-score. The entheseal sites and the elementary lesions evaluated in the studies are presented in Table 2. Positioning of the patient during enthesis scanning, as described in the study protocols, was almost uniform. All scores (MASEI, GUESS, SEI, BUSES, PsASon) assessed the quadriceps tendon and the proximal and distal patellar tendon while the knee is flexed, and the Achilles and plantar fascia while the patient is prone with the foot hanging over the edge of the bed. MASEI assesses the triceps tendon while the arm is flexed and the tendon is stretched. The score by D’Agostino did not specify a standardized positioning. No major differences in limb positioning were documented. The majority of the studies (88.2%) used instruments that included Doppler evaluation.
Assessment of measurement properties of included studies following the algorithm of the OMERACT filter
Feasibility was reported in 8 (15.6%) studies4,6,7,17,26,28,33,48. Five (9.8%) assessed patients with PsA7,26,28,33,48, and the reported time for assessing entheseal involvement with US ranged from 15 to 90 min.
The results of the evaluation of the various components of the OMERACT filter are presented in Table 3. Concerning the OMERACT filter, reliability was assessed in 28 (54%) studies3,4,5,6,7,18,19,22–26,28,30,33,35,39–41,44,45,47–49,51,54,56,57. Ten (35.7%) of them included patients with PsA7,18,26,28,33,44,47,48,54,56. Reliability metrics including κ and ICC were used and showed moderate to excellent agreement, although reliability was assessed mostly for the reading of the US images and not for the acquisition process.
The construct validity of the various sonographic instruments as related to clinical examination of enthesitis was reported in 26 (51%) studies3,4,5,7,16–19,21–23,25–28,30,32,33,36,37,44,53–56. In 9 (34.6%) of them, a statistically significant positive correlation was found17,18,28,30,36,37,53,55,56. Only 11 (21.5%) studies compared sonographic enthesitis findings to clinical examination of enthesitis in patients with PsA29 and only 3 (27%) of these demonstrated a positive correlation28,29,56. The construct validity of the various scoring systems as related to biomarkers of inflammation (CRP and/or ESR) was assessed in 10 (19.6%) studies3,5,7,19,20,22,23,34,43,53, and in only 3 (30%) of them, a statistically significant positive correlation was found19,20,34. Only 1 (2%) study assessed this in PsA, and it did not find a significant association7. The construct validity of the various scoring systems as related to other imaging modalities was evaluated in only 6 (11.7%) studies36,37,38,52,53,55. In 4 (66%) of those studies, a significant positive correlation was found36,38,52,53, and none of the studies evaluated this topic in patients with PsA. Discriminative validity, as defined by the ability of certain cutoff values to distinguish between disease states (e.g., remission vs active disease) or disease status (PsA vs control), was assessed in 6 (11.7%) studies6,24,27,48,50,51. Only 2 (4%) of them were done in patients with PsA27,48. The responsiveness of the various sonographic scores to treatment, defined as a statistically significant change in the score in response to an intervention, was evaluated in 7 studies (13.7%)5,19,23,25,34,42,59; however, none of them was conducted in patients with PsA. In 5 (71%) studies, responsiveness was found5,19,23,25,34.
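The coverage of each OMERACT filter component in patients with PsA, as reported in this section, can be tabulated to show where evidence is missing. The Python sketch below uses only the counts stated in the text; the names are hypothetical, and the percentages are computed with conventional rounding against the 51 included studies, so they may differ in the last digit from the values printed in the text.

```python
TOTAL_STUDIES = 51

# Number of included studies assessing each property in patients with PsA,
# as reported in the text.
PSA_STUDIES_BY_PROPERTY = {
    "feasibility": 5,
    "reliability": 10,
    "construct validity vs clinical examination": 11,
    "construct validity vs biomarkers": 1,
    "construct validity vs other imaging": 0,
    "discriminative validity": 2,
    "responsiveness to treatment": 0,
}


def percent_of_total(count, total=TOTAL_STUDIES):
    """Percentage of all included studies, rounded to 1 decimal place."""
    return round(100 * count / total, 1)


def unassessed(properties):
    """Properties for which no study in patients with PsA was identified."""
    return [name for name, count in properties.items() if count == 0]


if __name__ == "__main__":
    for name, count in PSA_STUDIES_BY_PROPERTY.items():
        print(f"{name}: {count} ({percent_of_total(count)}%)")
    print("No PsA evidence:", ", ".join(unassessed(PSA_STUDIES_BY_PROPERTY)))
```

Under this tabulation, construct validity against other imaging and responsiveness to treatment are the 2 components with no studies in patients with PsA.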
Quality assessment
The QUADAS-2 tool items are summarized in Table 4. In 21 studies, there was a low risk of bias and applicability concern3,4,6,7,14,16–20,22,28,31,33,34,41,48,50,51,52,56. In 22 studies there was unclear risk of bias mostly due to lack of details related to the recruitment method and limited description of the flow and timing of patient recruitment21,23,24,27,30,32,35–40,42,44,45,47,49,53,55,57,58,59; in 3 studies there was unclear risk of applicability concern due to comparison of the US results to an uncommon reference standard36,60,61. High risk of bias due to unblinding of the sonographer to the clinical results was reported in 2 studies15,26. High risk of bias in patient selection was present in 1 study that included a highly selective study population29. In 6 studies, high risk in applicability concern was assumed because of a highly selective population or inclusion of a less relevant population for this review5,15,25,43,46,54. Because of the descriptive character of this review, studies identified as having high risk of bias were not excluded.
DISCUSSION
Enthesitis is a key clinical and pathophysiologic feature in PsA and is included in the OMERACT PsA core domain set, which warrants the evaluation of enthesitis in every clinical trial and observational study. The inherent limitations in clinical evaluation of enthesitis led to a growing interest in the use of musculoskeletal US to improve the precision of enthesitis evaluation. This SLR represents a critical examination of the published data regarding the state of validation of the most commonly used sonographic enthesitis instruments in PsA and SpA. The study identified significant limitations related to the lack of standardization of existing instruments and major gaps in knowledge about their validity as outcome measures for assessment of enthesitis in patients with PsA.
Several sonographic instruments have been developed to provide a global estimation of the extent of enthesitis at the patient level. The present SLR critically evaluated the properties and the validity of available sonographic enthesitis scoring systems. We highlight several important limitations and gaps in knowledge related to the validity of these outcome measures in patients with PsA. One of the important issues is that only about a third of the studies included in this SLR focused on patients with PsA, so we decided to extend the study population to also include patients with SpA. All of the instruments except 1 were originally developed and validated in patients with axSpA and were subsequently applied to PsA. This is an important limitation, because the distribution of enthesitis in patients with PsA may be different from that in patients with axSpA. Additional important limitations are the lack of standardization regarding the number and location of entheseal sites included in each score, the variation in the elementary lesions, and their weight in the total score. These issues complicate the direct comparison of the performance of available instruments. Additionally, the development of existing instruments was based primarily on expert opinion rather than on data, and the initial validation was based on a small sample of patients (< 50 patients in the 2 most commonly used instruments).
One of the important issues noted in the SLR is the wide variation in the entheseal sites included in each instrument. In fact, apart from the studies that used the 6 established sonographic scores, we included in this SLR 14 studies that used ad hoc enthesitis scoring systems. These studies used various combinations of entheseal sites that were different from those included in previously validated methods. The selection process of entheseal sites included in each instrument was primarily based on expert opinion. To date, no study has investigated the optimal combination of entheseal sites to represent the construct of “enthesitis” in PsA.
The majority of the entheseal sites included in the sonographic scores are located in the lower extremities, an area that is heavily affected by biomechanical stress and thus could be confounded by aging, physical activity, and obesity. Two scores (GUESS and SEI) include only sites in the lower extremities (quadriceps, patella, Achilles, and plantar fascia) while others (MASEI and BUSES) include only a single upper extremity site (triceps and common extensors respectively). The score developed by D’Agostino is the only one that uses 2 upper extremity sites (common extensors and common flexors). Entheseal sites around the shoulders and unconventional entheseal sites, such as those around the fingers or functional enthesis sites (e.g., tibialis posterior around the medial malleolus), are not included in any score.
There is no consensus on which elementary lesions define acute/active enthesitis and which represent chronic/irreversible entheseal damage. However, previous studies have largely considered the presence of power Doppler signal at the enthesis as an indicator of active enthesitis, while irreversible lesions such as enthesophytes, erosions, and calcification represent damage from previously active enthesitis or enthesopathy due to noninflammatory reasons. Most of the instruments do not differentiate between acute and chronic lesions but instead summarize the scores of all lesions together to a general score. This limits the ability of the instruments to distinguish between active versus inactive disease states and to assess treatment response.
None of the instruments graded the degree of Doppler vascularization. With the availability of US machines with highly sensitive Doppler, semiquantitative grading of the degree of Doppler vascularization may be more appropriate. Because the scanning position (e.g., relaxed or stretched tendon) may affect the ability to detect Doppler signal, standardization of positioning is important to reduce variability in results. The optimal positioning of the enthesis for Doppler evaluation is when the tendon is in a relaxed position.
The issue of the borders of the enthesis was defined by the OMERACT US group as up to 2 mm from the enthesis2. However, many of the studies, particularly the validation work for the commonly used enthesitis scores, were published prior to the publication of this definition. Regarding the thickening of the enthesis, although the OMERACT definition does not consider specific cutoff points for each entheseal site, 3 of the commonly used scoring systems (GUESS, MASEI, and PsASon3,7,51) used the same cutoff points to define thickened entheses. The remaining scoring systems did not define what was considered a thickened enthesis. The lack of a clear and accepted sonographic definition of the borders and dimensions of the normal enthesis adds to the variability between scoring systems.
Considering the validity of the existing instruments in PsA according to the OMERACT filter, significant gaps in knowledge are highlighted. First, only a minority of studies assessed only patients with PsA rather than the general SpA population. Construct validity (correlation between sonographic enthesitis and theoretical concepts of enthesitis) was evaluated primarily in relation to clinical assessment of enthesitis. As expected, there was a relatively poor correlation between sonographic and clinical enthesitis, reflecting the higher sensitivity of US as well as the mixed active and inactive sonographic lesions included in the scoring systems. Limited information exists about the construct validity of existing instruments against laboratory markers of inflammation and other imaging modalities, especially in patients with PsA.
The responsiveness of existing sonographic scores to treatment is an area with sparse data, especially in PsA, where no studies assessed it. Out of 7 studies (in patients with ankylosing spondylitis or SpA), 5 showed good correlation between treatment and global improvement of the sonographic score. It is possible that similar responsiveness exists in PsA as well; however, as mentioned, no study to date has evaluated this aspect. Data are also scant regarding the ability of the sonographic scores to discriminate between sick and healthy populations, with only 2 studies assessing this issue in PsA. In 1 study, US was found to be a useful tool in differentiating between PsA and fibromyalgia27, and the second study found US a valuable tool in discriminating between PsA, psoriasis, and healthy controls48. A question that often arises in clinical practice is whether a patient with small joint involvement of the hands has rheumatoid arthritis, PsA, or osteoarthritis; unfortunately, the current evidence does not support the use of any sonographic entheseal score to assist in this dilemma.
It is worth noting that each study is not expected to assess all measurements of the OMERACT filter. For instance, in a study assessing responsiveness to treatment, construct validity would probably not be evaluated. However, one would expect an instrument used to assess a given variable, such as responsiveness to treatment, to be supported by a proper validation process.
Sonographic entheseal instruments that assess the extent of enthesitis at the global patient level have progressed in recent years. Some of these instruments have been validated in patients with SpA; however, the validity of these tools in PsA is largely unknown. There is a need for a well-validated instrument for assessment of sonographic enthesitis in PsA that includes the unique features of PsA and will assist in diagnosis, disease burden quantification, clinical decisions, and prognosis.
Footnotes
Dr. Eder is supported by a New Investigator Salary Grant from the Arthritis Society and the Canadian Association of Psoriasis Patients.
- Accepted for publication May 7, 2018.