Abstract
Objective. To evaluate the validity (accuracy) and reliability of 2 commonly used clinical methods, 1 indirect (lifts) and 1 direct (tape measure), for assessment of leg length discrepancy (LLD) in comparison to radiograph.
Methods. Twenty subjects suspected of having LLD participated in this study. Two clinical methods, 1 direct using a tape measure and 1 indirect using lifts, were standardized and carried out by 4 examiners. Difference in height of the femoral heads on standing pelvic radiograph was measured and served as the gold standard.
Results. The intraclass correlation coefficient assessing interobserver reliability was 0.737 for lifts and 0.477 for tape measure. The remainder of the analysis is based on the average of the measurements by the 4 examiners. Pearson correlation coefficients were 0.93 for the lifts and 0.75 for the tape measure method. Paired sample t tests showed difference in means of −2 mm (p = 0.051) for lifts and −5 mm (p = 0.007) for tape measure compared with radiograph. Sensitivity and specificity were 55% and 89% for lifts and 45% and 56% for tape measure, respectively, using > 5 mm as the definition for LLD. The wrong leg was identified as being shorter in 1 out of 20 subjects using lifts versus 7 out of 20 using tape measure.
Conclusion. When judged against radiograph, the indirect standing method of LLD measurement using lifts had superior validity, interobserver reliability, and specificity compared with the direct supine method using tape measure. Both clinical methods underestimated LLD compared with radiograph.
- LEG LENGTH DISCREPANCY
- CLINICAL METHODS
- RADIOGRAPH
Leg length discrepancy (LLD) is common. In the general population, the prevalence of LLD of ≥ 10 mm has been reported at 2% to 24%1,2,3,4,5,6,7,8. In patients with a history of low back pain, the reported prevalence of LLD has ranged from 7% to 30%1,2,4,5,6. In addition to low back pain, LLD has been implicated in the etiology of scoliosis9,10,11, osteoarthritis5,12,13, plantar fasciitis14, stress fractures15,16, aseptic loosening of hip prosthesis17, and other musculoskeletal injuries or complaints7,18,19,20,21.
A number of techniques have been developed for determining leg length difference clinically, with varying reports of accuracy and reliability20,22,23,24,25,26. All measurement techniques fall into 1 of 2 general categories: direct or indirect23. With direct methods the objective is to determine the anatomical length of each limb first and then calculate the difference between 2 sides. Indirect methods, on the other hand, aim to determine the difference without ascertaining the length of each leg individually.
Techniques also differ according to whether the subject is standing or lying down. Weight-bearing (standing) techniques may have the advantage of taking into account the effects of gravity on compressible tissues27. Non-weight-bearing (supine or prone) methods may be more reliable in ascertaining “true” (vs “functional”) LLD in the presence of lower limb angular deformities28. Sabharwal and Kumar provide an excellent in-depth systematic review of various clinical and radiological assessment tools devised for measuring LLD23.
A number of studies have examined the reliability of individual measurement techniques for assessing LLD; however, there have been only a few head-to-head studies comparing direct and indirect methods. The strongest of those studies was performed by Lampe, et al29 in a pediatric population presenting to a limb length clinic. Results of the study demonstrated that there was greater agreement between wooden lift measurements and orthoradiography than between tape measure and radiography. That study had several limitations. It was conducted in a pediatric population, where landmarks are typically easier to identify than in adults. Also, clinical measurements were performed by a single observer; the results are therefore less generalizable and do not allow determination of interrater reliability30. Finally, the focus of the Lampe study was LLD of 2.0 cm or more, whereas the American Medical Association guidelines suggest an LLD of 0.5 cm or more as clinically significant.
The most reliable methods for measuring LLD at present involve the use of radiography or computed tomography (CT)22,23,27,30,31. However, cost and exposure to radiation preclude the routine use of radiology in all patients with suspected LLD. For this reason, it is important to have clinical methods for assessing LLD that are accurate relative to the traditional gold standard, radiography.
The objectives of our study were to compare the validity and interobserver reliability of 2 commonly used clinical methods of LLD measurement: a direct supine method using tape measure and an indirect standing method using lifts in comparison with radiography. The 2 methods that we chose are the most commonly used23 and validated32,33 clinical tools for measuring LLD. The focus of our study was to validate the Lampe, et al29 findings in an adult population. We chose to focus on smaller LLD (i.e., closer to 0.5 cm), which are seen much more commonly in day-to-day rheumatology practice than the 2.0 cm that was the focus of Lampe, et al. Finally, we wanted to demonstrate generalizability and be able to determine interobserver reliability of measurements done by different observers.
MATERIALS AND METHODS
Subjects
The study included 20 subjects (10 male, 10 female) ranging in age from 23 to 85 years who were suspected of having LLD. These patients either reported symptoms associated with LLD such as back or lower extremity pain with signs of asymmetry of the pelvis, shoulders, or spine, or they had been told by a healthcare professional that they had LLD. The subjects were all recruited from the practices of 2 rheumatologists at the Pacific Arthritis Center in Vancouver, British Columbia, Canada. Subjects with lower limb angular deformities or contractures of the hips or knees were excluded. Patients with varus or valgus malalignment were excluded to eliminate bias between supine and standing methods. All participants gave informed consent to take part in the study. The protocol was approved by the Clinical Research Ethics Board of the University of British Columbia.
Examiners
Clinical measurements were conducted independently by 4 experienced examiners, consisting of a rheumatologist, an occupational therapist, a physical therapist, and a senior rheumatology fellow. All 4 examiners were familiar with the tape and lifts methods of LLD measurement and used them regularly in clinical practice. The examiners met in advance to standardize and practice the methods of direct and indirect measurement used in our study.
The standing lift method
Indirect measurement of LLD was performed by a method similar to that described by Gofton and Trueman13. Each subject was examined separately in an examination room. The subject stood barefoot on the floor (levelness of the floor was tested in advance with an aluminum spirit level). The feet (first metatarsals) were positioned 15 cm apart on 2 footmarks previously taped to the floor. The subject was instructed to stand erect with feet parallel, knees straight, and weight equally distributed over the 2 heels. The examiner sat in a chair directly behind the subject and placed his/her hands on the iliac crests. If 1 side was deemed to be lower, plastic blocks of varying thicknesses (1/16”, 1/8”, 1/4”, 1/2”, 3/4”, 1”) were placed under the shorter leg until the pelvis was felt to be level. To ensure the pelvis was level, the examiners had been instructed to use 3 anatomical reference points described by Aspegren, et al32: (1) iliac crest symmetry, (2) vertical appraisal of the spine from the sacral base (the spine should be perpendicular to the sacral base), and (3) symmetry of the posterosuperior iliac spine dimples. Once the examiner was satisfied that the pelvis was level, the size of blocks that were used was added up and recorded. Values were later converted from inches to millimeters and represented the LLD as measured by the lift (indirect) method.
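The conversion of stacked block thicknesses to millimeters can be illustrated with a minimal sketch. The function name `lift_lld_mm` and the example block combination are hypothetical and are not taken from the study data; only the block sizes and the inch-to-millimeter factor (25.4 mm per inch) are fixed.

```python
from fractions import Fraction

MM_PER_INCH = 25.4  # definition of the inch in millimeters


def lift_lld_mm(blocks_in):
    """Sum block thicknesses given as inch fractions (e.g. "1/4") and return millimeters."""
    total_in = sum(Fraction(b) for b in blocks_in)
    return float(total_in) * MM_PER_INCH


# Hypothetical example: a 1/4" block plus a 1/8" block under the shorter leg.
example_mm = lift_lld_mm(["1/4", "1/8"])
print(f"{example_mm:.3f} mm")
```

Because the thinnest block is 1/16" (about 1.6 mm), lift readings obtained this way can only change in steps of roughly that size.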
The supine tape method
Direct measurement of LLD was performed using tape measures with patients in supine position. The anterior superior iliac spine (ASIS) and lateral malleoli were used as landmarks because this technique has been shown to be the most valid and reliable of 4 different methods of direct measurement tested33. The subjects were instructed to lie flat on the examination table and “bridge” the pelvis (by flexing the knees, raising the buttocks off the table, placing the buttocks back on the table, and extending the knees again). The ankles were then gently pulled to straighten out the legs. The subject was then asked to remain relatively motionless until all 4 examiners had made their assessments. Blank paper tape measures, marked “right” or “left” leg on each side, were supplied to each examiner for every subject. The top of the tape measure was placed on the inferior aspect of the right ASIS. The inferior aspect of the right lateral malleolus was then identified and marked on the tape. The tape was then turned around and the procedure repeated on the left leg. Once the tape measure had been turned around, the examiner was not able to see where the mark had been placed on the other side. This was done to prevent the examiners from being influenced by knowing the length of 1 leg when measuring the length of the other leg. All tapes were collected, and the LLD were later determined (measured from the tapes) by a research assistant who was blinded to the radiographic and indirect measurement results.
Radiography
Erect-posture (standing) anteroposterior (AP) radiographic measurements served as the standard for comparison of the 2 clinical methods. The technique used was based on the methods described by Giles and Taylor, who validated their radiographic method and found mean error of 1.12 mm (± 0.92)4. Radiographs of the pelvis were obtained with the subject standing barefoot on the floor. The floor was tested in advance with an aluminum spirit level to ensure it was level. The feet (first metatarsals) were positioned 15 cm apart on 2 footmarks previously taped to the floor. The subject was instructed to stand erect with feet parallel, knees straight, and weight equally distributed over the 2 heels. The back was firmly applied to the cassette holder to reduce rotation. The radiographic beam was centered at the level of the femoral heads, at right angles to the film. The vertical direction of the films was ascertained through the use of a plumb line. All radiological LLD measurements were made independently by a radiologist using a standard protocol for leg length measurements. The radiologist was blinded to the results of the clinical measurements.
Statistical analysis
Data analysis was performed using SPSS statistical software (version 20.0). Variables evaluated included test validity, systematic bias, interobserver reliability, sensitivity, and specificity. Validity (accuracy) of clinical methods was estimated by calculating Pearson correlation coefficient between clinical measurements (lift and tape measure) and radiography (Table 1). Systematic bias with clinical methods was assessed using paired samples t tests, looking for significant differences between mean clinical measurements and radiography (Table 1; i.e., whether the clinical methods were significantly underestimating or overestimating LLD in comparison with radiograph). Interobserver reliability (precision) was assessed by calculating intraclass correlation coefficient [ICC; random effect, single measure: ICC(2,1)]34 between the 4 examiners (Table 2). Sensitivity and specificity were calculated by setting a cutoff value of > 5 mm for absence or presence of LLD (Table 3).
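For readers who wish to reproduce the reliability statistic outside SPSS, the following is a minimal sketch of a two-way random effects, single measure ICC(2,1) computed from a subjects-by-examiners table via the usual analysis of variance mean squares. The function name `icc_2_1` and the small ratings matrix are illustrative assumptions, not the study data, and the sketch does not compute confidence intervals.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one row per subject, one column per examiner.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of examiners
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, examiners, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-examiners mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative 6-subject x 4-examiner matrix (not study data).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example_ratings), 3))
```

In this form, systematic differences between examiners (the between-examiners mean square) lower the coefficient, which is why the absolute agreement version suits a comparison of interchangeable observers.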
RESULTS
Measurements obtained by lifts, tape measure, and radiographic techniques were recorded and analyzed for the 20 subjects (Table 4). For both clinical measurements, the average values from the 4 examiners were displayed and compared with radiography (Table 5).
Pearson correlation coefficients for clinical versus radiographic measurements were 0.93 and 0.75 for the lift and tape techniques, respectively (Table 1). Paired sample t tests showed a difference in means of −2 mm (p = 0.051) for lift and radiograph and −5 mm (p = 0.007) for tape and radiograph.
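The two statistics above answer different questions, which a short sketch may make concrete: correlation reflects how closely clinical values track radiograph, while the paired difference reflects systematic over- or underestimation. The function names, the sign convention (clinical minus radiograph, so a negative mean indicates underestimation), and the 5 paired values below are assumptions for illustration only and are not study data; the p value lookup is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Return (mean difference, t statistic, degrees of freedom) for x - y."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    sd = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    return mean_d, mean_d / (sd / math.sqrt(n)), n - 1


# Hypothetical paired LLD values in mm (not study data).
clinical = [4, 8, 2, 10, 6]
radiograph = [6, 9, 5, 13, 7]

r = pearson_r(clinical, radiograph)
mean_diff, t_stat, df = paired_t(clinical, radiograph)
print(r, mean_diff, t_stat, df)
```

In this made-up example the correlation is high even though every clinical value is below its radiographic counterpart, showing that a strong correlation does not by itself exclude systematic bias.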
The occurrence of measurement errors > 5 mm compared to radiograph was greatest in the clinical method of tape measure (9/20 subjects). Only 2/20 subjects measured with lifts had > 5 mm “error” compared to radiograph. Errors of > 10 mm compared to radiograph were present in 5/20 versus 0/20 tape measure vs lifts, respectively. In 7/20 cases, the wrong leg was identified as being shorter by tape measure vs 1/20 by lift method, when compared with radiograph (Table 2).
Interobserver reliability was measured using the ICC and was found to be higher for lifts. Lifts had an ICC of 0.737 (95% CI 0.565–0.870) compared to tape measure, which had an ICC of 0.477 (95% CI 0.253–0.706). Sensitivity and specificity of lifts were found to be 55% and 89%, respectively, while those for tape measure were 45% and 56% (Table 3).
DISCUSSION
Our study found that the average indirect lift measurements had a closer correlation with radiograph than did the tape measure (Pearson correlation coefficient: 0.93 vs 0.75). We observed that when examined separately, each of the 4 observers performed better using lifts compared to tape measure. Interestingly, the tape measure method identified the wrong leg as being shorter in 7/20 subjects compared to 1/20 with the lift method. An error of > 5 mm was seen in 9/20 subjects measured with tape compared to 2/20 with lifts. An error of > 10 mm was present in 5/20 tape and 0/20 lift measurements.
ICC was very low for tape measure (0.477) and moderate for the lift method (0.737), indicating greater interobserver reliability with the lift method (Table 4). The combination of a closer correlation of lift measurement with radiograph and greater interobserver reliability supports lift measurement over tape measure. However, a number of factors have been identified that when present can adversely influence the reliability of the tape measure method in general. These factors include difficulties in precisely locating bony landmarks through palpation4,5,35, variations in the long axes of the lower extremities (e.g., genu valgum or varum)20,35, soft tissue contractures across the hip or knee joints33, differences in the circumference of the legs33, differences at the level of ankles or feet (i.e., below the lower measurement landmarks, most commonly the medial malleoli)7,23, and presence of pelvic asymmetry confounding LLD measurement36. These are inherent limitations to the tape measure method that cannot be completely eliminated from the process and demonstrate why the tape measure method may be the less reliable of the 2 methods.
Beattie, et al reported that taking the average of 2 tape measure determinations, rather than a single determination, can improve validity and reliability37. In our study, the average of 4 measurements (by 4 examiners) was determined. The combination of the 4 measurements may therefore provide a more valid and reliable measure than that of previous studies that involved only 1 examiner.
Interestingly, our study found that both clinical methods underestimated the LLD in comparison with radiography [lifts: −2 mm (p = 0.051); tape measure: −5 mm (p = 0.007); Table 5]. The difference was statistically significant for tape measure and of borderline significance for lifts; it is unclear whether differences of this size are clinically significant5,38,39. Woerman and Binder-Macleod made the same observation: although the indirect method was the most reliable, it underestimated LLD compared to radiography33. This observed underestimation should be considered when using clinical methods for assessment of LLD.
Difference in height of femoral heads on standing AP radiograph of the pelvis was used as the gold standard in our study. This is based on the technique’s reliability demonstrated in experimental models13, its use in previous studies3,4,5,26,32,40,41, and high agreement with supine radiograph methods42.
There are a number of small studies that inadequately address the issue of LLD clinical assessment. Woerman and Binder-Macleod33 examined 4 direct [ASIS to medial malleolus (MM), ASIS to lateral malleolus, umbilicus to MM, and xiphosternum to MM] and 1 indirect (standing iliac crest palpation with lifts) method of clinical LLD assessment. They found the indirect method to be more accurate and reliable than any of the direct methods.
Friberg, et al compared direct (ASIS to MM) and indirect (standing) methods to standing radiographic measurements and concluded that the reliability of all clinical determinations was questionable22. Clarke compared tape measure, iliac crest palpation, and standing radiograph and reached a similar conclusion with regard to the clinical methods tested40. Conversely, Aspegren, et al reported that the standing indirect method was strongly comparable to standing radiograph when 3 anatomical points of reference were taken into account during visual correction of pelvis using lifts32.
There are some limitations to our study. First, because the standing radiograph and the clinical lift method are both indirect (weight-bearing) techniques, this may in part account for the higher correlation of lifts with radiograph in our study. However, other investigators have also found higher validity of indirect methods for measurement of LLD, using supine radiograph38.
The second limitation to our study is that to calculate sensitivity and specificity, we defined LLD as a > 5 mm difference in femoral head height on standing radiograph. The cutoff value of 5 mm is supported by the literature5,39,42. However, it was also chosen for the purpose of dividing the cohort into 2 nearly equal halves of 11/20 with and 9/20 without radiographic LLD, the split implied by the reported sensitivity and specificity values. To our knowledge, this is the first study on clinical LLD measurements that reports on sensitivity and specificity. Further studies could expand upon our findings by testing sensitivity and specificity at multiple cutoff values.
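Testing multiple cutoff values could proceed along the lines of the following minimal sketch. It assumes that the same cutoff is applied to the magnitude of both the clinical and the radiographic measurement; the function name `sens_spec` and the 6 paired values are hypothetical and are not study data.

```python
def sens_spec(clinical_mm, radiograph_mm, cutoff_mm=5.0):
    """Sensitivity and specificity of a clinical method against radiograph.

    LLD is called present when the absolute discrepancy exceeds cutoff_mm.
    Returns None for a measure whose denominator is zero.
    """
    tp = fp = tn = fn = 0
    for c, r in zip(clinical_mm, radiograph_mm):
        truth = abs(r) > cutoff_mm   # radiographic LLD present
        test = abs(c) > cutoff_mm    # clinical method calls LLD present
        if truth and test:
            tp += 1
        elif truth and not test:
            fn += 1
        elif not truth and test:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec


# Hypothetical paired LLD magnitudes in mm (not study data).
radiograph = [2, 4, 6, 8, 12, 3]
clinical = [1, 6, 4, 9, 10, 2]

for cutoff in (3, 5, 10):
    print(cutoff, sens_spec(clinical, radiograph, cutoff))
```

With a cohort as small as 20 subjects, each cutoff leaves only a handful of subjects in one of the two groups, so estimates obtained this way would carry wide uncertainty.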
We recognize that this is a preliminary study and that further studies will have the benefit of calculating a needed sample size based on the results of our study.
A further limitation is that all but 1 of our study participants had LLD < 15 mm. It is not clear whether our findings can be generalized to subjects with LLD > 15 mm. While mild to moderate LLD (e.g., under 20 mm) is much more common in the general population and therefore encountered more frequently7, LLD > 20 mm may be associated with greater symptoms clinically7,10.
Our findings support previous research indicating that the standing lift technique may have greater validity and reliability than the supine tape measure technique for assessment of LLD. This is a preliminary study and further studies are needed to determine the validity and reliability of clinical methods in assessing LLD > 15 mm in comparison with standing radiographs as well as with available imaging techniques such as ultrasound, CT, or magnetic resonance imaging.
- Accepted for publication April 4, 2014.