Abstract
Objective. Foot osteoarthritis (OA) is a very common but underinvestigated musculoskeletal condition, and there is little consensus as to common magnetic resonance imaging (MRI) features. The aim of this study was to develop a preliminary foot OA MRI score (FOAMRIS) and evaluate its reliability.
Methods. This preliminary semiquantitative score included the hindfoot, midfoot, and metatarsophalangeal joints. Joints were scored for joint space narrowing (JSN; 0–3), osteophytes (0–3), joint effusion/synovitis, and bone cysts (present/absent). Erosions and bone marrow lesions (BML) were scored (0–3) and BML were evaluated adjacent to entheses and at sub-tendon sites (present/absent). Additionally, tenosynovitis (0–3) and midfoot ligament pathology (present/absent) were scored. Reliability was evaluated in 15 people with foot pain and MRI-detected OA using 3.0T MRI multi-sequence protocols, and assessed using intraclass correlation coefficients (ICC) as an overall score and per anatomical site.
Results. Intrareader agreement (ICC) was generally good to excellent across the foot in joint features (JSN 0.90, osteophytes 0.90, effusion/synovitis 0.46, cysts 0.87), bone features (BML 0.83, erosion 0.66, BML entheses 0.66, BML sub-tendon 0.60) and soft tissue features (tenosynovitis 0.83, ligaments 0.77). Interreader agreement was lower for joint features (JSN 0.43, osteophytes 0.27, effusion/synovitis 0.02, cysts 0.48), bone features (BML 0.68, erosion 0.00, BML entheses 0.34, BML sub-tendon 0.13), and soft tissue features (tenosynovitis 0.35, ligaments 0.33).
Conclusion. This preliminary FOAMRIS demonstrated good intrareader reliability and fair interreader reliability when assessing the total feature scores. Further development is required in cohorts with a range of pathologies and to assess the psychometric measurement properties.
Osteoarthritis (OA) of the foot is a common cause of pain and disability1,2,3. Radiographic studies suggest OA is much more common in the foot than previously suspected3,4,5. The prevalence was reported to be between 60.7% and 94.6% for the foot joints in those aged 62–94 years2. Magnetic resonance imaging (MRI) has been used in describing and defining knee OA pathology; however, its use in foot OA is limited, possibly because of the complexity of foot anatomy and image acquisition. Further, while semiquantitative scores have been developed for the knee, hip, and hand6,7,8,9,10,11,12,13, none exist for OA of the foot. The aim of our study was to develop a foot OA MRI score (FOAMRIS) for assessing pathological features of OA and soft tissue features that may be commonly associated with foot pain.
MATERIALS AND METHODS
Development of the FOAMRIS
Following a review of MRI scoring systems8,13,14,15, a consensus process was undertaken involving 2 musculoskeletal radiologists, 2 rheumatologists, and 3 podiatrists. A preliminary scoring system was developed to identify and grade typical pathological features of OA in the joints, bones, and soft tissue features associated with foot pain.
The new system included 16 joints: first to fifth metatarsophalangeal (MTP) joints and tarsometatarsal joints, navicular-medial-cuneiform, navicular-intermediate-cuneiform, navicular-lateral-cuneiform, talonavicular, calcaneal-cuboid, and subtalar. Twelve bones were included: first to fifth metatarsals (divided into the distal, central and proximal regions), lateral cuneiform, intermediate cuneiform, medial cuneiform, navicular, cuboid, calcaneus, and talus. The interphalangeal joints and toes were not included in this assessment score because these are often not in the field of view in a foot and ankle MRI coil.
Tendons and ligaments of the foot were included, in 8 sites of tenosynovitis: tibialis anterior, extensor hallucis longus, extensor digitorum longus, peroneus brevis, peroneus longus, tibialis posterior, flexor hallucis longus, and flexor digitorum longus. The Lisfranc ligament complex and intertarsal ligaments were included, although not every ligament in the Lisfranc (midfoot) region was individually scored because of the large degree of anatomical variation16. These sites were included because of the association of soft tissue disorders in OA17, which has been shown for Lisfranc injuries and tendon damage18,19.
Five sub-tendon sites of the foot (bone regions adjacent to overlying tendons) were also included: lateral calcaneus under long peroneal tendon, lateral cuboid under long peroneal tendon, medial calcaneus under posterior tibial tendon, medial navicular under posterior tibial tendon, and medial cuneiform under anterior tibial tendon. These sub-tendon sites, where tendons wrap around the bones, have been described as “functional entheses” and are sites associated with pain in mechanical foot disorders20,21,22,23. On MRI, these regions can be associated with abnormal signal in the tendon and at the adjacent bone of the ankle23, and it is unclear whether this may be the case in the foot.
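As a cross-check on the anatomy listed so far, the scored sites can be written out as plain lists and counted. This is only an illustrative sketch: the variable names are hypothetical, the site names are transcribed from the text, and the enthesis sites are left out because their grouping is described separately.

```python
# Illustrative inventory of FOAMRIS sites, transcribed from the text.
# All identifiers here are hypothetical and not part of the published score sheet.
RAYS = ["first", "second", "third", "fourth", "fifth"]

JOINTS = (
    [f"{ray} metatarsophalangeal" for ray in RAYS]
    + [f"{ray} tarsometatarsal" for ray in RAYS]
    + [
        "navicular-medial-cuneiform",
        "navicular-intermediate-cuneiform",
        "navicular-lateral-cuneiform",
        "talonavicular",
        "calcaneal-cuboid",
        "subtalar",
    ]
)

BONES = [f"{ray} metatarsal" for ray in RAYS] + [
    "lateral cuneiform",
    "intermediate cuneiform",
    "medial cuneiform",
    "navicular",
    "cuboid",
    "calcaneus",
    "talus",
]

# Each metatarsal is divided into three regions for scoring.
METATARSAL_REGIONS = ["distal", "central", "proximal"]

TENOSYNOVITIS_SITES = [
    "tibialis anterior",
    "extensor hallucis longus",
    "extensor digitorum longus",
    "peroneus brevis",
    "peroneus longus",
    "tibialis posterior",
    "flexor hallucis longus",
    "flexor digitorum longus",
]

# (bone region, overlying tendon) pairs for the sub-tendon sites.
SUB_TENDON_SITES = [
    ("lateral calcaneus", "long peroneal tendon"),
    ("lateral cuboid", "long peroneal tendon"),
    ("medial calcaneus", "posterior tibial tendon"),
    ("medial navicular", "posterior tibial tendon"),
    ("medial cuneiform", "anterior tibial tendon"),
]

SITE_COUNTS = {
    "joints": len(JOINTS),
    "bones": len(BONES),
    "tenosynovitis": len(TENOSYNOVITIS_SITES),
    "sub_tendon": len(SUB_TENDON_SITES),
}
```

The counts obtained this way match the totals stated in the text (16 joints, 12 bones, 8 tendon sites, and 5 sub-tendon sites).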
Enthesopathy has been shown to be somewhat associated with OA in the hands13,24,25,26. It is as yet unclear whether there may be an association in the foot, given the weight-bearing design of the structures; therefore enthesopathy was scored at 9 sites in the foot: the tibialis anterior tendon at the plantar distal medial base of the first metatarsal bone and plantar distal medial cuneiform bone; peroneus longus tendon at the plantar base of the first metatarsal bone and plantar distal base of the medial cuneiform bone; tibialis posterior tendon at the plantar insertion at the base of the second, third, or fourth metatarsal bones, the plantar proximal medial cuneiform bone, the plantar medial aspect of the lateral cuneiform bone, and plantar medial navicular bone; and finally the peroneus brevis tendon at the dorsal lateral base of the fifth metatarsal.
A set of MRI features was determined, and semiquantitative scores for each feature were then developed. The term “bone marrow lesion” (BML) was adopted in this system, rather than bone marrow edema, because bone signal in OA may not be attributed solely to fluid27. During the consensus process, it became apparent that the development of a cartilage score posed challenges because of the small cross-sectional surfaces and complexity of the anatomy. Therefore, a pragmatic approach was taken and a joint space narrowing (JSN) definition was agreed. To provide a score that could be applied in the absence of contrast agent, we did not include multiple severity categories for scoring or differentiate between synovitis and effusion [previously adopted in rheumatoid arthritis (RA) of the foot and OA of the hand12,28], but pragmatically scored for the presence or absence of joint effusion/synovitis. The final definitions of each MRI feature, anatomical locations, and semiquantitative scores are summarized in Table 1.
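The feature scales can be illustrated with a small lookup that rejects out-of-range entries. This is a minimal sketch, assuming the scale ranges given in the Methods summary of the abstract; the feature keys, the 0/1 encoding of absent/present, and the function name are hypothetical, and Table 1 remains the authoritative definition.

```python
# Hypothetical encoding of the FOAMRIS feature scales.
# Graded features use 0-3; present/absent features are encoded here as 0/1.
FEATURE_SCALES = {
    "joint_space_narrowing": range(0, 4),
    "osteophytes": range(0, 4),
    "effusion_synovitis": range(0, 2),
    "cysts": range(0, 2),
    "erosions": range(0, 4),
    "bml": range(0, 4),
    "bml_enthesis": range(0, 2),
    "bml_sub_tendon": range(0, 2),
    "tenosynovitis": range(0, 4),
    "ligament_pathology": range(0, 2),
}


def validate_score(feature: str, value: int) -> int:
    """Return value unchanged if it is a legal score for feature, else raise."""
    if feature not in FEATURE_SCALES:
        raise KeyError(f"unknown feature: {feature}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("scores must be plain integers")
    if value not in FEATURE_SCALES[feature]:
        allowed = list(FEATURE_SCALES[feature])
        raise ValueError(f"{feature} must be one of {allowed}, got {value}")
    return value
```

A check of this kind could be applied to each cell of a score sheet before any agreement statistics are computed.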
Image acquisition
Fifteen participants were recruited as part of a larger study. In accordance with the Declaration of Helsinki, ethical approval was provided (Leeds West Ethics Committee 09/H1305/10). Participants were included if they reported foot pain on weight-bearing and the musculoskeletal radiologist judged there to be MRI features of OA, which were based on knee MRI and foot radiographic criteria in at least 1 foot joint4,29. Inclusion was based, therefore, on the presence of osteophytes judged to be at least moderate in size (≥ grade 2) or, where the osteophytes were graded “small,” this was accompanied by JSN (partial to full thickness, grade ≥ 2) and subchondral BML with cysts.
Participants were scanned using a Siemens Magnetom Verio (3T) large-bore MRI scanner (Siemens Medical Solutions). All scans were acquired using an 8-channel foot and ankle coil, with the foot placed perpendicular to the ankle and magnetic field (B0) and centered over the navicular bone. The following protocol was used: T2-weighted fat-saturated sequence variables were TR: 3000–3600 ms, TE: 69, flip angle: 155–160°, echo train length 8, 2-mm slices, and 0.4-mm inter-slice gap, matrix 256 × 256, and field of view (FOV) 150 × 150 mm in 3 planes. Short-tau inversion recovery sequence (STIR) variables were TR: 4500 ms, TE: 31, NEX 2, TI 200, flip angle 150°, echo train length 11, 3-mm slices and 0.6-mm inter-slice gap, matrix 320 × 256, and FOV 150 × 150 mm in 3 planes. T1-weighted high-resolution spin echo sequence variables were TR: 700 ms, TE: 10, FS 3, flip angle: 90°, 1.2-mm slices and 1.32-mm inter-slice gap, matrix 512 × 512, and FOV 150 × 150 mm in the sagittal plane. Gradient recalled echo sequence variables were TR: 450, TE: 2.5, flip angle 30°, echo train length 1, 3-mm slices, 0.6-mm inter-slice gap, matrix 336 × 448, and FOV 250 × 250 mm in the sagittal plane.
FOAMRIS reliability
Anonymized scans were analyzed using OsiriX 64-bit Version 5.6 (OsiriX Foundation). All images were scored using the standardized score sheet (Supplementary Data 1, available with the online version of this article) and the FOAMRIS system (Table 1, Figure 1, Figure 2, and Figure 3). Intrareader reliability was undertaken by an experienced musculoskeletal radiologist who read the same images twice in a random order more than 1 week apart. An interreader reliability exercise was undertaken by a second reader. Both readers undertook a consensus exercise together using 5 separate foot images prior to second reader scoring.
Features were scored for each joint, bone, and soft tissue site, with all sites grouped. Reliability scores were evaluated using descriptive statistics: percentage of exact agreement (PEA) and Chamberlain percent positive agreement (PPA), which is the proportion of the total number of ratings made in a given category during the 2 readings (either intra- or interreader pairs) that were in agreement. Additionally, ICC were calculated using generalizability theory; the Brennan method was used to account for negative variance components30. The individual joint or bone was considered the facet of differentiation. Joints or bones were considered to be nested within patients. Patient, occasion (for intrareader reliability), and reader (for interreader reliability) were considered random facets of generalization. Occasionally a negative ICC was obtained; when this occurred, we reported that the result was negative (indicating poor agreement), but did not report the actual value. ICC could not be calculated when all joints or bones scored 0.
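The two descriptive statistics can be sketched in a few lines. This is a minimal illustration that follows the wording of the definition above (ratings in a category that were in agreement, divided by all ratings made in that category over both readings); the function names and the toy data are hypothetical, and the exact computation used in the study is not reproduced here.

```python
def percent_exact_agreement(read1, read2):
    """Percentage of sites given the identical score on both readings."""
    if len(read1) != len(read2) or len(read1) == 0:
        raise ValueError("readings must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(read1, read2))
    return 100.0 * agree / len(read1)


def category_specific_agreement(read1, read2, category):
    """Percentage of all ratings of one category that were in agreement.

    Each site where both readings gave `category` contributes two agreeing
    ratings; the denominator counts every rating of `category` across both
    readings. Returns None if the category was never used.
    """
    if len(read1) != len(read2):
        raise ValueError("readings must be of equal length")
    both = sum(a == category and b == category for a, b in zip(read1, read2))
    total = sum(a == category for a in read1) + sum(b == category for b in read2)
    if total == 0:
        return None
    return 100.0 * (2 * both) / total


# Toy example (invented scores for six sites, not study data).
first_read = [0, 0, 1, 2, 1, 0]
second_read = [0, 1, 1, 2, 0, 0]
pea = percent_exact_agreement(first_read, second_read)             # 4 of 6 sites agree
ppa_grade1 = category_specific_agreement(first_read, second_read, 1)  # 50.0
ppa_grade3 = category_specific_agreement(first_read, second_read, 3)  # None (never assigned)
```

Returning None for an unused category mirrors the situation reported in the Results, where agreement for a grade that was never assigned cannot be determined.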
The reliability results were evaluated according to the Cicchetti criteria as < 0.40 poor, 0.40–0.59 fair, 0.60–0.74 good, and 0.75–1.00 excellent31. Analysis was undertaken using Stata 13.1 (StataCorp) and G_STRING IV (a wrapper for urGENOVA, University of Iowa).
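The Cicchetti bands can be expressed as a small helper for labeling ICC values. This is a sketch only: the function name is hypothetical, the treatment of values that fall between the printed two-decimal band edges (for example 0.595) as belonging to the lower band is an assumption, and negative values are labeled poor in line with the reporting rule described above.

```python
def cicchetti_category(icc):
    """Label an ICC using the bands quoted in the text.

    < 0.40 poor, 0.40-0.59 fair, 0.60-0.74 good, 0.75-1.00 excellent.
    Assumption: bands are treated as half-open intervals, so a value such as
    0.595 is labeled "fair". None stands for an ICC that could not be
    calculated (for example, when every site scored 0).
    """
    if icc is None:
        return "not calculable"
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    if icc < 0.40:
        return "poor"  # includes negative estimates
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Intrareader totals quoted in the abstract, labeled with the helper.
intrareader_totals = {"JSN": 0.90, "effusion/synovitis": 0.46, "erosion": 0.66}
labels = {name: cicchetti_category(value) for name, value in intrareader_totals.items()}
```

Applied to the three abstract values above, the helper gives excellent for JSN, fair for effusion/synovitis, and good for erosion.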
RESULTS
The musculoskeletal radiologist read 61 sequential MRI, of which 35 were classified as having foot OA and deemed eligible for the study. Fifteen participants’ scans were chosen at random for the reliability study. The participants were aged between 41 and 66 years [median 51 yrs, interquartile range (IQR) 46–60], included 10 women, and had a median body mass index (BMI) of 31.5 (IQR 26.3–34.5, range 23.5–40.1). OA was present in a single talonavicular joint in 5 participants, in 1–2 joints in the tarsi in 6 participants, and in 2 joints (MTP and tarsal joints) in 5 participants. An experienced radiologist performed the full FOAMRIS in 30 min per foot and reported the presence of the following conditions: JSN in 12 participants (total 31 sites), osteophytes in all participants (total 77 sites), effusion/synovitis in all participants (total 182 sites), cysts in 13 participants (total 28 sites), BML in all participants (total 74 sites), erosion in 5 participants (total 10 sites), enthesopathy in 7 participants (total 9 sites), tenosynovitis in the entire group (total 47 sites), and ligament abnormalities in 6 participants (total 7 sites).
The intrareader reliability was summarized per imaging pathology (amalgamating anatomical locations; Table 2 and Table 3), and the range across the anatomical locations (Supplementary Tables 1–7, available with the online version of this article). It should be noted that ICC represent a ratio of between-object variability to total variability and can therefore be low if there is little variation in scores between different joints/bones, which was an issue when assessing agreement in specific sites.
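In its simplest form, the ratio described above can be written as follows. The symbols are illustrative assumptions rather than the study's notation: \sigma^2_{s} stands for the variance between sites (joints or bones) and \sigma^2_{e} for all remaining variance, whereas the generalizability-theory design used here (sites nested within patients, with occasion or reader as a random facet) involves more variance components than this two-term sketch.

```latex
\mathrm{ICC} = \frac{\sigma^2_{s}}{\sigma^2_{s} + \sigma^2_{e}}
```

When nearly all sites receive the same score, \sigma^2_{s} approaches 0 and the ICC approaches 0 even if the two readings rarely disagree, which is why site-level ICC can be low despite a high percentage of exact agreement.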
Combining all joints, the results showed excellent agreement for the presence of JSN (ICC total = 0.90, range across joints = 0.65–1) and osteophytes (ICC total = 0.90, range across joints = 0.00–1), although there was a low proportion of severe scores in this sample and for some individual sites, the ICC was poor. There were very few JSN grade 3 scores and no scores for osteophytes grade 3 (the majority were grades 1–2), therefore the reliability in this category remains to be determined; however, for grades 0–2 the category-specific agreement was generally substantial (range 60%–100%). The presence of effusion/synovitis was the least reliably scored (ICC total 0.46, range across joints = negative to 1). Lower reliability in scoring effusion/synovitis was due to poor agreement over the absence of effusion/synovitis at the MTP joints. The repeatability for the scoring of presence of cysts was excellent when all joints were combined, although ICC was low for some individual sites (ICC total = 0.87, range across joints = 0.00–1).
The intrareader reliability for combined sites was excellent for BML (ICC total = 0.83, range across bones = 0.49–1) and good for erosions (ICC total = 0.66, range across bones = 0.00–1). As was observed for the joints, in the bony features there was a relatively low prevalence of more severe scores. Scores for severity of BML suggest similar repeatability for the range of scores 1 to 3, although only 3 bones across the sample scored a grade 3. The agreement results for erosions were not equal across the severity scale, with a lower level of agreement for a score of 1; however, only 20 erosion scores were assigned grade 1, 2 were assigned grade 2, and none were assigned grade 3. The reliability in this category therefore remains to be determined.
The intrareader reliability results for BML associated with tendon enthesopathy (ICC total = 0.66, range across the locations 0.44–1) and for BML at the sub-tendon regions (ICC total = 0.60, range across the locations 0.00–1) were similar, with good agreement when all sites were combined. Reliability of scores for tenosynovitis was excellent (ICC total = 0.83, range = 0.43–1). The repeatability of scoring tenosynovitis was stable across scores ranging from 0 to 2. Score category 3 was not assigned during either of the repeated reads in our study; therefore, the repeatability in this category remains to be determined. The agreement scores for all ligament abnormalities were excellent (ICC total = 0.77, range across the 2 sites = 0.65–0.74), with greater scores for the Lisfranc ligament.
The interreader reliability scores are summarized in Supplementary Tables 7–12 (available with the online version of this article), and as might be expected, the intrareader scoring showed greater reliability than interreader. The results demonstrated fair agreement for the presence of JSN (ICC total = 0.43, range across joints = negative to 1) and poor agreement for osteophytes (ICC total = 0.27, range across joints = 0.00–1). The interreader reliability scores for the presence of effusion/synovitis were poor across the joints of the foot (ICC total = 0.02, range across joints = negative to 0.13). The repeatability for the scoring of presence of cysts was fair (ICC total = 0.48, range across joints = negative to 1).
The interreader reliability was good for sites of BML (ICC total = 0.68, range across bones = 0.00–1), but was poor for erosion scores in bones with erosions present (ICC total = 0.00, values for all bones 0.00 where calculable). There were several sites for which both scorers agreed on the absence of any erosions, but ICC could not be calculated if there were no scores above 0. The interreader reliability of scores for BML associated with tendon enthesopathy was poor (ICC total = 0.34, range across the locations 0.00–1), and scores were even less reliable at the sub-tendon BML regions (ICC total = 0.13, range across the locations negative to 0.65). Interreader reliability scores for tenosynovitis were poor (ICC total = 0.35, range = 0.00–0.61). The interreader reliability scores for all ligament abnormalities were also poor (ICC total = 0.33, range across the 2 sites = 0.00–0.18), with higher scores for the Lisfranc ligaments.
DISCUSSION
To our knowledge, there are no MRI scoring systems for OA foot pathology, although a previous study has defined some MRI features in foot OA32. This new scoring system was deliberately inclusive of not only “traditional” OA features, but also included features that may inform studies investigating the broader construct of foot pain.
In our study, intrareader reliability of the total MRI features was shown to be generally excellent when assessed at a whole foot level, while interreader reliability was more variable. The best intra- and interreader reliability was seen for joint-specific features (JSN, osteophytes, and cysts), and compared well to scores such as those evaluating small joints in hand OA12. The presence of joint effusion/synovitis showed worse intra- and interreader reliability because of poor agreement, particularly at the MTP joints, where it may or may not be considered a normal finding. The reliability scores may have been affected by the size of the joint because joint effusion/synovitis scores have been shown previously to be more variable in small joints of the hands12. In a later reliability study of joint effusion/synovitis in the hand, the agreement improved once an atlas was developed13. In addition, administration of a contrast agent may have aided precision in estimating the volume of joint fluid, particularly in differentiating fluid from synovial hypertrophy. Further studies with contrast administration may be needed to refine the scoring and better characterize OA-related pathology.
Bony features demonstrated excellent intrareader agreement across the foot as a whole. Descriptively, the erosion scores were highly reliable across nearly all sites; however, this may have been influenced by the low number of lesions present. ICC values (where calculable) were variable, which may reflect both limitations in agreement over the presence of erosion and limitations in the amount of “true” variation between bones. The BML scores were also variable, and lower agreement was shown in the cuboid and the proximal metatarsals. Where patterns of BML were associated with the tendon enthesis, intra- and interreader reliability was good, but at the sub-tendon region, reliability was lower. No reliability studies of these MRI features have been previously reported, and there is likely to be difficulty in scoring these regions where planar anatomy is subject to partial volume artifact; in these regions, an atlas would be beneficial.
The intra- and interreader reliability of scoring of soft tissue features was similar across ligament abnormalities and tenosynovitis. Similar levels of agreement have been reported for scores of hand tenosynovitis in RA33 and hand OA12. A limited number of ligaments of the midfoot were included in this score, which have been well described16,34. Other foot ligaments were not included because of potential issues with poor visualization and requirement for specialist views and sequences, e.g., calcaneocuboid and calcaneonavicular ligaments35.
The results of our study should be considered in light of the following limitations. The sample for our preliminary study included a group with relatively mild structural OA, and more severe damage was limited to few joint regions. In addition, the definition of OA on MRI as applied in our study, while based on consensus approaches developed for other joints, requires further work and validation, and this has implications for the results presented. Definitions of the individual features are difficult because of the variation in presentation in the various anatomical sites and the technical aspects of acquiring MRI. For example, we did not use contrast-enhanced imaging in our study and so have not differentiated between synovitis and effusion. A detailed definition of osteophyte grading was not provided in this score, given the widely varying presentation of periarticular bone change in sites such as the first MTP joint versus the small joints of the hindfoot or midfoot. Future work is required to refine the FOAMRIS approach and analyze validity in larger and more diverse samples. In addition, 8 participants were obese (BMI ≥ 30), which may influence the frequency of the tendon and ligament pathology because greater occurrence has been shown in obese people at the ankle36.
The foot poses unique challenges when using MRI because of the complexity of the anatomy and inherent variability in the shape and size. This manifests as problems with coil positioning, homogeneous fat saturation, imaging wrap, and magic-angle effect37. In our study, a foot and ankle coil was used, which was beneficial for maintaining a consistent position within the magnet; however, this can be limited with larger and longer feet. Using a larger coil may allow for imaging of the entire foot, although the positioning might be difficult because of flexibility and foot type. In future studies, it may be appropriate to reposition the target for the hindfoot, midfoot, and forefoot, although this will increase acquisition times and may not be desirable.
The issue of how many planes and sequences to acquire is a complex one. In our study, both T2-weighted water-sensitive and STIR sequences in 3 planes were included to account for possible failure of the fat saturation. T1-weighted sequences included high-resolution spin echo and gradient recalled echo in a single plane, which may have affected the scoring of erosions and osteophytosis. In practice, where acquisition time is of primary importance, a T2 fat-saturated sequence may suffice.
A minimum of 2 planes for each T1-weighted sequence could improve scoring; however, defining the optimum plane for each foot joint requires further work and a 3-D sequence may provide a compromise. Gradient recalled echo sequences are sensitive in delineating subchondral cysts and were helpful in the verification in our study. These sequences, however, are insensitive to diffuse marrow abnormalities because of trabecular magnetic susceptibility and will not show the full extent of these lesions, so in our study, spin echo sequences were also used for better BML detection8. Further consensus regarding sequence choice is recommended.
Across most scores, interreader reliability scores were lower than intrareader. We have identified training (the second reader was less experienced) and case definitions as likely contributors to these findings. Improved description of certain scoring features, accompanied by an atlas, would be a natural next step because this process has improved interreader reliability in other MRI scores13.
Finally, it is recognized that ICC can be affected by the degree of “true” variability in the sample, which in this relatively mild group was limited for some features. Further validation in more diverse samples should give a more accurate assessment of inter- and intrareader reliability.
We have proposed a set of definitions and scoring criteria for a semiquantitative MRI investigation of multiple foot pathology: FOAMRIS. This preliminary scoring system generally showed acceptable reliability for a broad range of pathologies except for effusion/synovitis, and for some features at anatomical sites where visualization may be particularly influenced by acquisition plane. Iterative development is now needed, and will include application in other cohorts, expert consensus on acquisition protocol, use of contrast, and the development of an atlas to aid scoring.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Acknowledgment
We acknowledge the contribution of Dr. Eiji Fukuba, musculoskeletal radiologist, who was involved in the consensus exercise in the development of the magnetic resonance imaging (MRI) scores. We also acknowledge the expertise of Dr. Richard Hodgson, Rob Evans, and Dr. Carole Burnett of the NIHR Leeds Musculoskeletal Biomedical Research Unit in the acquisition of the MRI scans used in the project.
Footnotes
Authors A.C. Redmond, P.G. Conaghan, A.M. Keenan, and E.M. Hensor are funded in part by the NIHR Leeds Biomedical Research Centre. The work was directly supported by an Arthritis Research UK grant (no. 18256) and the Leeds Experimental Osteoarthritis Treatment Centre, supported by Arthritis Research UK grant (no. 20083) and the Arthritis Research UK Sports, Exercise and Osteoarthritis Centre grant (no. 20194).
Accepted for publication April 11, 2017.