Abstract
Objective. To develop the Outcome Measures in Rheumatology (OMERACT) thumb base osteoarthritis (OA) magnetic resonance imaging (MRI) scoring system (TOMS) for the assessment of inflammatory and structural abnormalities in this hand OA subset, and to test its cross-sectional reliability.
Methods. Included features and their scaling were agreed upon by members of the OMERACT MRI Task Force using the Hand OA MRI scoring system as a template. A reliability exercise was performed in which 3 readers participated, using a preliminary atlas with examples to facilitate reading. Each reader independently scored a set of 20 MRI (coronal and axial T1- and T2-weighted fat-suppressed images, of which 5 included T1-weighted fat-suppressed post-gadolinium images). Intrareader and interreader reliability were assessed using intraclass correlation coefficients (ICC), percentage exact agreement (PEA), and percentage close agreement (PCA).
Results. The TOMS assessed the first carpometacarpal (CMC-1) and scaphotrapeziotrapezoid (STT) joints for synovitis, subchondral bone defects (including erosions, cysts, and bone attrition), osteophytes, cartilage, and bone marrow lesions on a 0–3 scale (normal to severe). Subluxation was evaluated only in the CMC-1 joint (absent/present). Reliability of scoring for both joints was comparable. Interreader ICC were good for all features (0.77–0.99 and 0.74–0.96 for CMC-1 and STT joints, respectively). Intrareader reliability analyses gave similar results. PCA was ≥ 65% for all features. PEA was low to moderate, with better performance for subchondral bone defects, subluxation, and bone marrow lesions.
Conclusion. A thumb base OA MRI scoring system has been developed. The OMERACT TOMS demonstrated good intrareader and interreader reliability. Longitudinal studies are warranted to investigate reliability of change scores and responsiveness.
Hand osteoarthritis (OA) affects the interphalangeal (IP) joints and the thumb base, including the first carpometacarpal (CMC-1) and scaphotrapeziotrapezoid (STT) joints1. Thumb base OA may constitute a separate hand OA subset, with distinct risk factors1. However, much is unknown about the pathophysiology and disease course of hand OA subsets. New imaging modalities including magnetic resonance imaging (MRI), which visualizes all affected joint compartments, may provide greater insight into this disease.
Previously, the Hand OA MRI scoring system (HOAMRIS) for IP OA was developed with good cross-sectional and moderate longitudinal reliability2,3. However, although the thumb base is commonly affected in patients with hand OA4, no MRI scoring systems assessing these joints exist to date. MRI studies of the thumb base of patients with hand OA can contribute to the understanding of this disease subset, including its differences from and similarities with IP OA.
The aim was to develop the Outcome Measures in Rheumatology (OMERACT) thumb base OA MRI scoring system (TOMS) for the assessment of inflammatory and structural abnormalities in thumb base OA, and to test its cross-sectional reliability using the OMERACT methodology5,6.
MATERIALS AND METHODS
Development of the OMERACT TOMS
Using HOAMRIS as a template, members of the OMERACT MRI Task Force iteratively discussed in several Web-based meetings the joints and features (including definitions and scaling) to be included, as well as a list of preferred sequences and planes, and agreed by consensus.
Table 1 provides an overview of the proposed MRI features. Each feature was evaluated on 0–3 scales in the CMC-1 and STT joints, except subluxation, which was scored absent/present in the CMC-1 joint only. The proximal and distal joint parts were scored separately for subchondral bone defects, osteophytes, and bone marrow lesions (BML). For CMC-1, the proximal part of the first metacarpal bone (from the articular surface to a 1-cm depth) and distal half of the trapezium were evaluated (range 0–6). For STT, the proximal half of the trapezium and trapezoid and the distal half of the scaphoid were scored (range 0–9). Increments of 0.5 were introduced for synovitis, subchondral bone defects, and BML to increase potential responsiveness of the score.
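The scoring layout described above can be written down as a small lookup structure. The following Python sketch is illustrative only and is not part of the TOMS definition: the names (BONES, PER_PART_FEATURES, HALF_STEP_FEATURES, valid_grade, sum_score) are hypothetical, and the per-bone breakdown is an assumption read from the text, chosen because it reproduces the stated ranges of 0–6 for CMC-1 and 0–9 for STT.

```python
# Hypothetical encoding of the scoring layout described in the text.
# Joint parts (bones) scored separately for the per-part features.
BONES = {
    "CMC-1": ("first metacarpal, proximal part", "trapezium, distal half"),
    "STT": ("trapezium, proximal half", "trapezoid, proximal half",
            "scaphoid, distal half"),
}

# Features scored separately for the proximal and distal joint parts.
PER_PART_FEATURES = {"subchondral bone defects", "osteophytes",
                     "bone marrow lesions"}

# Features for which increments of 0.5 were introduced.
HALF_STEP_FEATURES = {"synovitis", "subchondral bone defects",
                      "bone marrow lesions"}

MAX_GRADE = 3.0  # every graded feature uses a 0-3 scale


def max_sum_score(joint):
    """Upper bound of the sum score for a per-part feature in one joint."""
    return MAX_GRADE * len(BONES[joint])


def valid_grade(feature, grade):
    """True if the grade lies on the 0-3 scale with the allowed step size."""
    step = 0.5 if feature in HALF_STEP_FEATURES else 1.0
    return 0 <= grade <= MAX_GRADE and float(grade / step).is_integer()


def sum_score(joint, feature, grades):
    """Combine per-part grades into a single sum score for one joint."""
    if feature not in PER_PART_FEATURES:
        raise ValueError(f"{feature!r} is not scored per joint part")
    if len(grades) != len(BONES[joint]):
        raise ValueError(f"expected {len(BONES[joint])} grades for {joint}")
    if not all(valid_grade(feature, g) for g in grades):
        raise ValueError("grade outside the allowed scale or step size")
    return float(sum(grades))
```

Under these assumptions, max_sum_score("CMC-1") gives 6.0 and max_sum_score("STT") gives 9.0, matching the ranges stated above, and a grade of 1.5 is accepted for synovitis but rejected for osteophytes.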
Table 1. Definitions and scaling of features in the proposed Outcome Measures in Rheumatology thumb base osteoarthritis MRI scoring system.
Reliability exercise
A reliability exercise was conducted by 2 rheumatologists (VF, FG) and 1 radiologist (CP) with extensive experience in assessing hand/wrist MRI. Two readers (VF, FG) repeated the exercise after 1 month, after recoding and rearranging the MRI in a different order. A preliminary atlas with examples of most grades of each feature was developed prior to the exercise, approved by the members of the task force and distributed among readers to facilitate scoring. Each reader scored 20 MRI: 15 MRI were acquired on a 1.5T extremity MRI unit (ONI, GE) in patients with hand OA from the Hand Osteoarthritis in Secondary Care (HOSTAS) study at Leiden University Medical Center (Leiden, the Netherlands), and 5 MRI were acquired on a 3.0T MRI unit (Philips Ingenia) in patients with hand OA from Sheba Medical Center (Tel Aviv, Israel). MRI were selected by a nonreader to include a wide range of severity of pathology in the thumb base (based on Kellgren-Lawrence arthritis grading scale scores). MRI from HOSTAS included coronal and axial T1-weighted (T1w) fast spin echo (FSE), and T2w FSE images with fat-saturation (fs; Supplementary File 1 is available with the online version of this article). MRI from the Sheba Medical Center additionally included coronal and axial T1w-fs post-gadolinium (Gd) images. A general wrist acquisition was used. Data collection in both centers was approved by the local ethics committee. All HOSTAS participants signed informed consent; written consent was waived for the use of MRI from the Sheba Medical Center.
Statistical analysis
Each MRI feature was analyzed separately for the CMC-1 and STT joints. Separate scores for the distal and proximal joint parts were combined into a single sum score per joint where appropriate. Median and interquartile range were calculated for each feature based on the mean value of the 3 readers. Reliability was assessed by calculating intraclass correlation coefficients (ICC), percentage exact agreement (PEA), and percentage close agreement (PCA). Single and average measure ICC (mixed-effect models, absolute agreement) were calculated to assess intrareader and interreader reliability, respectively. ICC values ≤ 0.20 were considered to indicate poor, > 0.20 to < 0.40 fair, ≥ 0.40 to < 0.60 moderate, ≥ 0.60 to < 0.80 good, and ≥ 0.80 very good reliability7. PEA was defined as a difference of 0 between minimum and maximum scores across readers, and PCA as a difference of ≤ 1 between minimum and maximum scores.
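The agreement measures defined above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used in the exercise. The function names and the input layout (one row per MRI, one column per reader) are hypothetical. The ICC is computed here with the standard 2-way ANOVA mean-square formulas for absolute agreement, which is an assumption about how "mixed-effect models, absolute agreement" was implemented.

```python
def agreement_percentages(scores):
    """Return (PEA, PCA) in percent.

    scores: list of rows, one row per MRI, one score per reader.
    PEA counts rows where max - min == 0; PCA counts rows where max - min <= 1.
    """
    spreads = [max(row) - min(row) for row in scores]
    n = len(spreads)
    pea = 100.0 * sum(s == 0 for s in spreads) / n
    pca = 100.0 * sum(s <= 1 for s in spreads) / n
    return pea, pca


def icc_absolute_agreement(scores):
    """Return (single-measure ICC, average-measure ICC), absolute agreement.

    Assumes a complete n x k table (n MRI, k readers or reading sessions)
    and uses the 2-way ANOVA mean squares; undefined if all scores are equal.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between MRI
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between readers
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    numerator = ms_rows - ms_err
    single = numerator / (ms_rows + (k - 1) * ms_err
                          + k * (ms_cols - ms_err) / n)
    average = numerator / (ms_rows + (ms_cols - ms_err) / n)
    return single, average


def icc_band(icc):
    """Map an ICC value to the reliability categories used in the text."""
    if icc <= 0.20:
        return "poor"
    if icc < 0.40:
        return "fair"
    if icc < 0.60:
        return "moderate"
    if icc < 0.80:
        return "good"
    return "very good"
```

For a toy input of four MRI scored by 3 readers, [[1, 1, 1], [2, 2.5, 3], [0, 0, 2], [3, 3, 3]], the spreads are 0, 1, 2, and 0, so agreement_percentages returns a PEA of 50% and a PCA of 75%. Following the text, the single-measure value would be the one reported for intrareader reliability and the average-measure value for interreader reliability.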
RESULTS
Supplementary Table 1 (available with the online version of this article) shows characteristics of the 15 HOSTAS patients. Most MRI features were present in the majority of patients (Table 2). STT joint scores were lower overall than CMC-1 scores, despite a higher possible score range for certain features. Time required to perform TOMS was comparable to that required to score 2 joints with HOAMRIS.
Table 2. Median (IQR) scores of each MRI feature and n (%) with each feature present for the CMC-1 and STT joints (n = 20). Separate scores for the distal and proximal part of the joint were combined into a single sum score per joint. The n (%) is of patients with each feature present according to at least 1 of 3 readers.
All features demonstrated good to very good interreader ICC values (Table 3). PCA was ≥ 65% for all features. PEA was low to moderate, with better performance for subchondral bone defects, subluxation, and BML. Similar results were found for intrareader reliability (Supplementary Table 2 is available with the online version of this article). Reliability for the CMC-1 and STT joints was generally comparable.
Table 3. Interreader reliability of MRI features for the CMC-1 and STT joints (3 readers). Separate scores for the distal and proximal part of the joint were combined into a single sum score per joint to calculate ICC.
When analyzing the reliability of subchondral bone defects, osteophytes, and BML for the distal and proximal joint parts separately, we generally saw ICC comparable to the aggregated scores. However, for subchondral bone defects in the trapezoid and osteophytes in the trapezoid and the proximal side of the trapezium, ICC were moderate (data not shown).
Readers gave slightly higher scores when assessing synovitis on post-Gd images as compared with the T2w-fs images (data not shown), whereas reliability was comparable (CMC-1: ICC 0.75, 95% CI 0.05–0.97 vs ICC 0.83, 95% CI 0.59–0.94; and STT: ICC 0.68, 95% CI −0.37 to 0.96 vs ICC 0.78, 95% CI 0.47–0.92 for images with vs without Gd).
DISCUSSION
In our study, the OMERACT MRI Task Force proposed the first thumb base MRI scoring system, TOMS, and evaluated its cross-sectional reliability. The score was feasible and had good to very good reliability for the assessment of structural and inflammatory features in the CMC-1 and STT joints.
The previously published OMERACT HOAMRIS for the IP joints was used as a prototype in the development of the TOMS3. Two major differences between the scoring systems can be noted. First, erosive damage and cysts were combined into 1 score (subchondral bone defects) because it was judged that the distinction could not be made reliably in the thumb base joints. Second, because of larger joint size, it was reasoned that direct cartilage assessment is feasible in the thumb base when using appropriate MRI sequences, and should be prioritized over indirect cartilage assessment. Further, it was decided to score distal and proximal joint parts separately for some features, similar to the first Oslo MRI scoring system for IP OA8. Because only 2 joints are evaluated, this addition provides more detailed information without decreasing feasibility. In future studies of pharmacological and nonpharmacological interventions, the HOAMRIS and TOMS can be used as complementary scoring systems, because both assess similar features. Combined assessment with MRI of the fingers and thumb base of patients with hand OA in future trials can provide information about hypothesized differences in the pathophysiology of these OA subtypes1.
Assessment of the scaphotrapezoid articulation was also included in the scoring system. Previous cadaver studies have shown frequent degenerative changes of the scaphotrapezoid joint, although its relative contribution to STT joint OA complaints is unclear, partly because of poor visualization with traditional radiography9,10.
All included MRI were performed using a standard wrist acquisition technique. Although dedicated thumb base acquisitions do exist, these are not widely used in clinical practice. It is unclear whether the use of a dedicated thumb base acquisition would yield different results, and this should be evaluated in future studies.
Only 5 MRI included post-Gd imaging. No previous studies have compared the reliability and validity of MRI-defined synovitis with and without contrast in patients with hand OA. In knee OA, synovitis is commonly assessed without contrast, although contrast-enhanced MRI appears to be a more reliable and valid measure of synovial inflammation, with the ability to differentiate inflamed synovium from effusion11,12. Østergaard, et al found that omitting contrast from MRI examination of synovitis in the metacarpophalangeal and wrist joints in patients with rheumatoid arthritis decreased reliability13. In our sample, reliability was good using both contrast and noncontrast images. This warrants more detailed analysis, preferably comparing synovitis scores between different sequences within the same patient in a larger sample.
Before TOMS can be recommended as a core instrument according to the OMERACT filter6, assessment of the reliability of change scores and its responsiveness in longitudinal studies is needed. Future studies should establish whether the reliability of TOMS is similar when it is used by other trained readers rather than by the expert readers who developed the scoring system; for the HOAMRIS, reliability among other readers was shown to be either better or worse14,15. Further, readers used a preliminary atlas during the exercises, which has likely increased agreement across readers, as was previously shown for the HOAMRIS3. A comprehensive atlas including all grades of all features in both joints would facilitate scoring and increase reliability of the TOMS. Validity of the scoring system should be investigated in future studies by assessing correlations with signs and symptoms, and with other imaging modalities including traditional radiography and ultrasound.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Acknowledgment
We are indebted to I. Eshed (Department of Diagnostic Imaging of the Sheba Medical Center, Tel Aviv, Israel) and the Department of Radiology of the Leiden University Medical Center (Leiden, the Netherlands) for providing the magnetic resonance images for the scoring exercise.
Accepted for publication January 13, 2017.