Abstract
Objective. Intra- and interreader reliability, construct validity, and responsiveness of the Spondyloarthritis Research Consortium of Canada (SPARCC) magnetic resonance imaging (MRI) scoring system were investigated for scoring sacroiliitis in patients with juvenile spondyloarthritis (JSpA)/enthesitis-related arthritis (ERA) who have received biologic and/or nonbiologic treatment.
Methods. Ninety whole-body MRI examinations with dedicated oblique coronal planes of the sacroiliac joints in 46 patients were independently reviewed and scored by 2 pediatric musculoskeletal radiologists, blinded to clinical details, using the SPARCC system. Intra- and interreader reliability was assessed by intraclass correlation coefficients (ICC). Construct validity testing was done by (1) correlating the SPARCC MRI scores of sacroiliitis with clinical disease activity indicators (cross-sectional validity), and (2) correlating the change in the MRI score with the change in clinical indicators before and after treatment (longitudinal validity). Responsiveness of the MRI and clinical indicators was also evaluated, grouped by biologic and nonbiologic treatment.
Results. When applied in children with JSpA/ERA, the SPARCC showed almost perfect intra- and interreader reliability (ICC 0.79–1.00). There was poor cross-sectional and longitudinal correlation between clinical assessment indicators and MRI scoring. SPARCC scores showed higher responsiveness to treatment-related change than most clinical outcome measures. Two clinical outcome measures correlated longitudinally with SPARCC score in nonbiologic treatment: active joint count (r = 0.72, p < 0.01) and FABER (Flexion, Abduction, External Rotation) test (r = 0.58, p = 0.01).
Conclusion. The SPARCC MRI scoring system is a reliable tool with relatively higher responsiveness than clinical indicators and is suitable for objective quantification of sacroiliitis when applied to pediatric patients with JSpA/ERA.
- SPONDYLOARTHRITIS RESEARCH CONSORTIUM OF CANADA
- SACROILIITIS
- MAGNETIC RESONANCE IMAGING
- SCORING SYSTEM
- JUVENILE SPONDYLOARTHRITIS
- ENTHESITIS-RELATED ARTHRITIS
Juvenile spondyloarthritis (JSpA) affects the peripheral and axial skeleton and has a strong genetic association with HLA-B27. It is most commonly referred to as enthesitis-related arthritis (ERA), a subtype of juvenile idiopathic arthritis (JIA) as defined by the International League of Associations for Rheumatology (ILAR) classification for childhood arthritis1. ERA accounts for about 15–20% of JIA cases2,3,4, with a peak age of disease onset at 11.7 years5 and predilection for male sex6. Patients with JSpA/ERA often have a positive family history of HLA-B27–associated disease including reactive arthritis, ankylosing spondylitis (AS), inflammatory bowel disease, psoriasis or psoriatic arthritis, and acute iritis or uveitis7.
Peripheral arthritis and enthesitis usually affect the lower extremities and are the most common presentations in JSpA/ERA8. Axial involvement of the sacroiliac (SI) joints and spine is uncommon at disease onset9,10 but may develop within 5 to 10 years in up to 40% of JSpA/ERA patients6,11 and is associated with higher morbidity, pain scores, and progression to AS12,13. Clinical symptoms of axial involvement are nonspecific but can include lumbar or buttock pain. Clinical sacroiliitis is elicited by pain with direct palpation or through provocative maneuvers, such as the FABER (Flexion, Abduction, External Rotation) or Gaenslen’s tests. Magnetic resonance imaging (MRI) is far more sensitive than radiographs in the detection of SI inflammation and structural damage14,15,16. Early detection of sacroiliitis is important for diagnosis and institution of appropriate management, especially with biologic therapies [anti-tumor necrosis factor (TNF) agents], to ensure improved patient outcomes and quality of life17.
MRI offers a more comprehensive evaluation of anatomical structures in the SI joints and spine, along with early detection of asymptomatic inflammatory lesions, which is unattainable with physical examination alone. This has been demonstrated in patients with JSpA/ERA and has been helpful in providing evidence of sacroiliitis in adolescents with a negative history of back or buttock pain and a normal axial physical examination18. Various MRI scoring methods exist to quantitatively evaluate the inflammation and bone marrow edema (BME) in patients with sacroiliitis19,20,21,22. These scoring systems differ regarding the MRI planes and sequences used to detect inflammation, the unit of interest used for quantification (SI joint divided into halves or quadrants), the number of slices used to score, global versus more extensive grading, and the site of the inflammatory lesion to be scored23. Among these, the Spondyloarthritis Research Consortium of Canada (SPARCC) scoring system is the most advantageous because it is both comprehensive and objectively scored according to a standardized measurement protocol21. While other scoring systems account only for the presence or absence of BME, the SPARCC scoring system incorporates 2 other MRI indices of potential clinical significance: signal intensity and the 3-D extent of inflammation24.
The validity and reliability of the SPARCC scoring system have been shown in adult SpA and AS populations with sacroiliitis, and it is thus one of the most widely accepted and used systems worldwide. Unfortunately, no similar studies have yet been conducted in the pediatric population to provide objective measurements for diagnostic purposes and disease activity monitoring. In our study, we investigated the intra- and interreader reliability of the SPARCC scoring system in children and adolescents with sacroiliitis in the setting of JSpA/ERA. This study also included exploratory analysis to determine trends in correlation between the SPARCC scores and clinical variables of disease activity.
MATERIALS AND METHODS
This cross-sectional retrospective study was undertaken at a large pediatric quaternary referral center and approved by the institutional research ethics board (no. 1000044486). The requirement for patient consent was waived owing to the retrospective nature of the study.
Patient selection
All patients included were under the age of 16 years, had a confirmed or suspected clinical diagnosis of JSpA (European Spondyloarthropathy Study Group criteria) and/or ERA (ILAR criteria), and had a whole-body (WB) MRI performed between January 2008 and December 2016. A search of electronic medical records identified 70 eligible patients. Of these, 24 patients were excluded because of inadequate or incomplete MR examinations attributed to motion artifacts or the absence of the dedicated oblique coronal images of the SI joints necessary for SPARCC scoring. A total of 90 MRI studies were analyzed for the 46 patients included in the study; 15 patients underwent more than 1 MRI study while receiving treatment. Demographic, clinical, and laboratory data were also collected for the 46 patients enrolled. For the 15 patients with more than 1 MRI study, pretreatment and posttreatment clinical and laboratory data were collected.
MRI protocol
All MRI studies were done as part of a WB-MRI protocol that we use for the evaluation of JSpA/ERA. All examinations were performed on a 1.5-T MRI system (Magnetom Avanto, Siemens), using a dedicated multichannel surface coil system with the patient supine. Integrated head, neck, spine, body, and peripheral angiography surface coils were used for contiguous scanning. Images were acquired at multiple stations with the patient free-breathing and were subsequently reconstructed using the vendor-specific software package (Siemens Composing, Siemens) to form a WB image. All examinations included oblique coronal short-tau inversion recovery (STIR) imaging (repetition time msec/echo time msec of 2250/69 with a field of view of 25 cm) of the SI joints as a part of the WB-MRI. Scans were obtained in an oblique coronal plane parallel to the long axis of the sacral vertebrae (S1–S2) using 4 mm of slice thickness with a slice gap of 4.5 mm.
MRI analysis
The SPARCC scoring system quantifies BME within the iliac and sacral bones along the SI joint by using the corresponding increased MRI signal detected on STIR images. Quantification is performed on 6 consecutive oblique coronal slices through the SI joint, encompassing most of the cartilaginous and synovial portion of the joint that has at least 1 cm of visible vertical height. Each SI joint is divided into iliac and sacral halves. When the vertical height of the SI joint measures > 3 cm, each half is further divided into 2 quadrants, upper and lower (Figure 1). As a result, a maximum of 48 quadrants of SI joints (4 quadrants per joint, 2 joints per slice, 6 slices) are created within 6 MRI slices. This rule is applied to the anterior and posterior aspects of the SI joints. However, at the posterior aspect, the joint is naturally divided into upper and lower quadrants by intervening fat and fibrous tissue (Figure 1). Hence, BME within the ligamentous portion was not scored (Figure 2 and Figure 3). Three MRI indices were evaluated. First, the increased STIR signal (BME) was evaluated: the presence of bone marrow edema was identified based on comparison with signal from the center of the sacrum at the same craniocaudal level, because this site is less prone to fatty change or inflammation (Figure 2). If the center of the sacrum was not visible on the image, then the adjacent section that provided a reference for comparison at that level was used. One point was granted per quadrant with increased STIR signal, for a maximum score of 48. Second, lesions exhibiting intense signal (intense edema) were evaluated; the adjacent presacral vein signal was used as a reference (Figure 2). One point was granted per SI joint per slice for a maximum score of 12. Finally, lesion depth (extensive edema) was evaluated, defined as a lesion demonstrating continuous high signal with a depth of 1 cm or greater from the articular surface, measured perpendicular to the vertical axis of the joint (Figure 2). One point was granted per SI joint per slice for a maximum score of 12. The maximum total score was 72 as demonstrated in the scoring sheet (Supplementary Figure 1, available with the online version of this article)21,23.
The MRI studies were independently scored using the SPARCC system by 2 pediatric musculoskeletal radiologists, who were blinded to the clinical information of the study patients. Each radiologist performed the task twice with a 6-week interval between the first and second round of readings. A total SPARCC score of 1 was interpreted as nonspecific BME. Only a score of 2 or greater was considered radiologic evidence of sacroiliitis (Figure 3).
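The scoring arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the names `JointSlice` and `sparcc_score`, and the choice to represent a reading as a flat list of 12 records (6 slices, 2 SI joints per slice), are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

N_SLICES = 6          # consecutive oblique coronal STIR slices
JOINTS_PER_SLICE = 2  # left and right SI joint
QUADRANTS = 4         # upper/lower iliac and upper/lower sacral


@dataclass
class JointSlice:
    """Hypothetical record of one reader's findings for one SI joint on one slice."""
    bme_quadrants: Tuple[bool, ...]  # increased STIR signal, one flag per quadrant
    intense: bool = False            # lesion as bright as the presacral vein signal
    deep: bool = False               # lesion depth >= 1 cm from the articular surface

    def __post_init__(self) -> None:
        if len(self.bme_quadrants) != QUADRANTS:
            raise ValueError("expected exactly 4 quadrant flags per SI joint")


def sparcc_score(joint_slices: List[JointSlice]) -> Dict[str, object]:
    """Sum the three indices: BME (max 48), intensity (max 12), depth (max 12)."""
    if len(joint_slices) != N_SLICES * JOINTS_PER_SLICE:
        raise ValueError("expected 12 joint-slice records (6 slices x 2 joints)")
    bme = sum(sum(js.bme_quadrants) for js in joint_slices)  # 1 point per quadrant
    intensity = sum(js.intense for js in joint_slices)       # 1 point per joint per slice
    depth = sum(js.deep for js in joint_slices)              # 1 point per joint per slice
    total = bme + intensity + depth                          # maximum 72
    return {
        "bme": bme,
        "intensity": intensity,
        "depth": depth,
        "total": total,
        # Threshold from the text: a total of 1 is nonspecific, >= 2 counts as sacroiliitis.
        "sacroiliitis": total >= 2,
    }
```

A reading in which every quadrant is positive and every joint on every slice shows intense and deep edema reaches the maximum of 72; a reading with a single positive quadrant scores 1 and falls below the threshold for sacroiliitis.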
Clinical data acquisition
There are no clinical diagnostic scoring systems or clinical maneuvers that are highly sensitive and specific to represent sacroiliitis. However, in addition to palpating SI joint tenderness, focused physical examination measures that target clinical assessment of the SI joints include the Schober and FABER tests25,26,27. Further, surrogate assessments of global disease activity in JSpA/ERA can be obtained by 2 commonly used methods. The first method is the physician’s global assessment of disease activity (PGA), which represents both peripheral and axial arthritis. It uses a visual analog scale (VAS) with a range from 0 to 10, where 0 is no disease activity and 10 reflects severe disease activity. The VAS is filled out and based on the expert clinical opinion of the physician after considering all aspects of the patient’s arthritis including joint symptoms, physical examination, laboratory investigations, physical function questionnaires, and imaging as available. The second method is the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) method, which is a validated patient self-administered disease activity questionnaire. It is a nonspecific instrument but is a responsive clinical tool in SpA. The questionnaire studies disease activity domains including fatigue, spinal pain, joint pain or swelling, point tenderness (enthesitis), and duration/severity of morning stiffness. All available surrogate clinical outcome measures were retrieved and analyzed, which also included active joint count, swollen joint count (SJC), clinical evidence of sacroiliitis (SI joint tenderness on palpation, Schober test, or FABER test), enthesitis count, SpA hip measurements (abduction and internal rotation by intermalleolar distance), and SpA questionnaires [BASDAI and the Bath Ankylosing Spondylitis Functional Index (BASFI)].
Statistical analysis
The one-way random-effects, absolute-agreement, single-rater form of the intraclass correlation coefficient (ICC) was calculated to assess both intra- and interobserver agreement in SPARCC radiologic scoring of sacroiliitis. Spearman correlation coefficients were used to assess the cross-sectional and longitudinal (change score) agreement between radiologic (i.e., SPARCC) and clinical (e.g., PGA and BASDAI) scores of disease activity. ICC and Spearman correlation coefficients were interpreted in accordance with common guidelines: ≤ 0.40 indicating poor, > 0.40 to ≤ 0.60 moderate, > 0.60 to ≤ 0.80 substantial, and > 0.80 excellent28,29. Responsiveness to treatment, as captured by SPARCC and clinical outcome measures, was quantified by the standardized response mean (SRM). SRM is the ratio of the observed change to the SD reflecting the variability of the change values (SRM = mean of differences between the baseline and followup values/SD of these differences)30. Change scores were calculated by subtracting the final score from the initial score to produce positive treatment-related change, as per convention. All analyses (descriptive, ICC, and Spearman) were performed using SAS 9.4 (SAS Institute).
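For readers who wish to reproduce these statistics outside SAS, the two less familiar quantities can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the ICC follows the standard one-way random-effects single-rater formula, ICC(1,1) = (MSB − MSW) / (MSB + (k − 1) MSW), and the SRM uses the sample SD (n − 1 denominator), which the text does not specify.

```python
from statistics import mean, stdev
from typing import Sequence


def icc_oneway_single(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(1,1): one row per MRI study, one column per reading (rater or round)."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need the same number (>= 2) of ratings per subject")
    grand_mean = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    msw = sum((v - m) ** 2 for row, m in zip(ratings, row_means) for v in row) / (n * (k - 1))
    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msb - msw) / denominator


def standardized_response_mean(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """SRM = mean change / SD of change, with change = initial score - final score."""
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need at least 2 paired observations")
    change = [b - f for b, f in zip(baseline, followup)]  # positive = improvement
    return mean(change) / stdev(change)                   # sample SD: an assumption


# Hypothetical scores for illustration only; these are not study data.
reader_scores = [[1, 2], [3, 4], [5, 6]]
print(icc_oneway_single(reader_scores))
print(standardized_response_mean([10, 8, 6, 12], [4, 5, 2, 6]))
```

Because the one-way model treats each reading as interchangeable, the same function applies to intrareader data (two rounds by one radiologist) and interreader data (one round by each radiologist).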
RESULTS
Study sample
Forty-six patients with JSpA, 89.1% of whom had ERA, were included in our study (Supplementary Table 1, available with the online version of this article). The patients’ mean age at disease onset was 12.7 years; 47.8% of patients were positive for HLA-B27. MRI evidence of sacroiliitis defined as a score of ≥ 2 was noted in 24 of the 46 (52.2%) baseline (46 = first available MRI) MRI studies analyzed.
Reliability of the SPARCC system
Radiologic SPARCC scoring of sacroiliitis (overall and MR index–specific scores) demonstrated almost perfect agreement (as interpreted by ICC) within and between readers (results summarized in Table 1). Inter- and intrareader reliability of the intensity of SI joint marrow signal was lower than that of the presence and extent of BME. The range of item scores given for the presence of BME was much larger than the range of item scores given for the presence of intense edema and the extent of edema; hence, these were not directly comparable.
Construct validity between SPARCC and clinical scores
Construct validity is the extent to which a particular measure relates to other measures in a manner that is consistent with theoretically derived hypotheses about the constructs being measured31. In the retrospective chart review, there were no definitive clinical outcome measures that could confirm a diagnosis of sacroiliitis; hence, an exploratory analysis was performed to test for correlations between the SPARCC scores and several surrogate clinical outcome measures used for sacroiliitis. Cross-sectional single timepoint assessment showed that for targeted binary outcome measures of sacroiliitis, including the FABER and Schober tests as well as SI joint tenderness, SPARCC scores were not significantly different depending on the presence or absence of these outcomes (Table 2). None of the clinical measures other than hip abduction measurement and C-reactive protein correlated significantly with SPARCC MRI scores (Table 2). Other tested clinical outcome measures including active joint count, SJC, PGA, BASDAI score, BASFI score, and Childhood Health Assessment Questionnaire (CHAQ) score did not correlate with SPARCC MRI scores, as shown in Table 2.
Responsiveness of patients’ joints to treatment as measured by SPARCC scores and clinical outcome measures
Two or more treatment-interval MRI examinations of the SI joint were available in 15 of the 46 patients (32.6%) included in the study, of whom 5 were treated only with biologic agents (TNF-α inhibitors), 6 only with nonbiologics (naproxen, methotrexate, sulfasalazine, etc.), and 4 with both nonbiologic and biologic therapies. In total, up to 17 intervals of nonbiologic treatments and 18 intervals of biologic treatments were analyzed pairwise, excluding the cases with missing clinical outcome measures. The SPARCC MRI scores indicated a greater response of patients’ joints to treatment than did most clinical outcome measures, including active joint count, SJC, Schober and FABER tests, SI joint tenderness, and PGA scores (as shown in Table 3). Hip abduction showed similar or slightly better responsiveness (by SRM) than did the SPARCC MRI scoring system in both treatment cohorts, but the change scores of these 2 measures did not correlate. Enthesitis count, BASDAI, and BASFI showed varying SRM compared to SPARCC MRI score by type of treatment, also with no correlation of change. CHAQ, parent’s assessment, and laboratory measures were not available close to MRI in the majority of cases and therefore could not be assessed. There were no significant correlations between the change in MRI scores and the change in clinical measures, except for active joint count (r = 0.72, p < 0.01) and the FABER test (r = 0.58, p = 0.01) in the nonbiologic group, suggesting suboptimal MRI-to-clinical correlation in the magnitude and/or direction of change. Among the 3 indices, the BME index generally showed higher responsiveness compared to the intensity and extent indices (Table 3).
For the 15 selected patients with followup MRI scans, there was a median period of 10.3 months (0.0–24.8 mos) from the time of baseline MRI to the start of biologic treatment, and a median period of 10.3 months (3.7–14.3 mos) from the time of MRI to the time of biologic treatment change. Clinical data about biologic treatment dates were not available for 1 patient who was followed at a local community clinic. For nonbiologic treatments, there was a median period of 14.8 months (0.0–78.1 mos) from the time of baseline MRI to the start of nonbiologic treatment, and a median period of 22.6 months (15.5–37.0 mos) from the time of MRI to the time of nonbiologic treatment change. Because of the limited sample size, we cannot conclude any differences between the biologics and nonbiologic treatment groups.
DISCUSSION
The SPARCC scoring method was chosen for the evaluation of sacroiliitis in our study, because it has been validated for use in adults with AS or SpA and is more comprehensive than other scoring methods. This comprehensiveness is important in the evaluation of enthesitis; the anatomical concept of the “enthesis organ” encompasses the functionally associated structures near the enthesis itself, such as bone, insertional fibrocartilage, bursae, and synovial-covered fat pads. Beyond BME, MRI can record the intensity and 3-D extent of inflammation, as well as reflect the inflammatory changes near the entheseal insertion points.
The results of our study have confirmed the reproducibility of the SPARCC scoring method for evaluating sacroiliitis in the setting of JSpA/ERA. Our results agree with available literature evaluating SPARCC’s reproducibility in the adult population and further support that the SPARCC scoring method is feasible to be used in the pediatric population23,32. A reliable and objective scoring method is of great benefit for quantification of SI joint inflammation, implementation of appropriate therapies (especially anti-TNF agents), and monitoring of disease activity in children.
Most of our patients had a WB-MRI ERA protocol done, using a dedicated oblique coronal STIR sequence of SI joints adequate to detect the presence and extent of SI joint inflammation. We did not exclude any patients based on the MRI protocol used, because all patients’ MRI scans had more than 6 slices through the SI joints (one of the criteria for inclusion). Our results showed high intra- and interreader reliability in scores when using a single noncontrast STIR sequence. We suggest that dedicated oblique coronal sequences of the SI joints be acquired as part of the WB-MRI protocol whenever the SPARCC scoring method is to be applied.
Our analysis did not detect any correlations or trends between the SPARCC and clinical scores, which included the PGA, BASDAI, BASFI, and CHAQ scores. This can be largely attributed to the uncommon occurrence of clinical SI involvement early in the disease course in patients with JSpA/ERA, which may often be asymptomatic and thus not recorded by physical examination. Additionally, the clinical scoring methods include but are not specific for SI involvement alone; both peripheral and axial disease activity contribute to the final scores. Peripheral involvement of joints and entheses is more frequent in JSpA/ERA patients and may have accounted for high clinical scores in patients with low SPARCC scores (little to no BME seen on the MRI). Moreover, there may be variability between clinicians in how the PGA is scored. There is also no current validated clinical score for assessing sacroiliitis alone. Future prospective studies that include standardization of the clinical scoring methods, as well as routine physical examination maneuvers for SI joint inflammation (palpation of SI joint, modified Schober test, FABER test, Gaenslen’s test) even in asymptomatic patients, might yield a better clinico-radiological correlation and validate the SPARCC scoring system.
The longitudinal significance of the SPARCC method could not be established in this study because of the limited number of patients who had followup MRI examinations. Importantly, just over half of the baseline MRI studies performed on patients with JSpA/ERA (24 of 46; 52.2%) demonstrated radiological evidence of sacroiliitis. Further investigation with a larger cohort and longer followup is needed to confirm the prevalence of SI involvement observed in our study, as well as to assess the diagnostic sensitivity and specificity of the SPARCC method in the early phase of the disease.
Our study had several limitations. First, no gold standard is currently available against which the validity of MRI in detecting sacroiliitis can be compared. Synovial biopsies of the SI joint can provide pathological confirmation of the disease activity but are neither practical nor ethical to obtain in a pediatric population when less invasive means of assessing disease activity are available. Second, our present study did not record chronic radiographic disease features such as erosions, postinflammatory fatty changes, and joint space narrowing33. It is therefore likely that some cases of chronic sacroiliitis were missed. Current clinical assessments for sacroiliitis, including physical examination maneuvers as well as the BASDAI score, have poor sensitivity and specificity. Moreover, many of the clinical outcome measures are limited in scope or driven primarily by these late changes, reflecting chronic damage and functional impairments; hence there is a mismatch in the measured construct, which in turn lowers the correlation of the 2 measurement methods.
Further, the comparison of SPARCC and clinical outcome measures in detecting the response of patients’ joints to treatment was limited by missing data for some of the clinical measures, owing to the retrospective nature of the treatment-related outcome assessment. Although the responsiveness of the SPARCC and clinical outcome scores was calculated separately for nonbiologic and biologic treatments, the treatment assignment was not random and the duration and specific drug regimen could not be stratified in this small, retrospectively examined study sample. The responsiveness of patients’ joints to treatment according to clinical and MRI outcome measures in Table 3 is only comparable within each treatment group, and not between biologic versus nonbiologic treatment groups. In addition, detailed information about sports and physical activities was not assessed and documented in a standardized fashion in all patients. We could not completely rule out an association between vigorous activity and isolated BME. We also acknowledge that the accessibility and cost of MRI may affect its feasibility and use as a clinical tool in some centers.
Overall, our study has demonstrated high intra- and interreader reliability of the SPARCC scores in quantifying the level of severity of sacroiliitis in the setting of JSpA/ERA in the pediatric population. Longitudinal change in SPARCC scores consistently showed higher responsiveness to treatment-related change than most clinical outcome measures, in both the biologic and nonbiologic treatment groups. The application of this MRI scoring system in clinical practice is feasible. If the results of this pilot study are reproduced in larger series, this scoring system may serve as a reliable quantitative method to assess the degree of SI joint inflammation even in the subclinical stage and potentially to monitor disease activity and response to therapy.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
- Accepted for publication September 12, 2018.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.