Abstract
Objective. Conventional measures of spinal mobility used in the assessment of patients with axial spondyloarthritis (axSpA), such as the Bath Ankylosing Spondylitis Metrology Index and its components, are subject to interobserver variability. The University of Córdoba Ankylosing Spondylitis Metrology Index (UCOASMI) is a validated composite index based on a motion video-capture system, UCOTrack. Our objective was to assess its reproducibility in clinical practice settings.
Methods. We carried out an observational study of repeated measures in 3 centers. Video-capture systems were installed and adapted to clinical rooms. Patients with axSpA and stable disease were selected by consecutive stratified sampling [disease duration, sex, and the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI)]. Intraobserver reliability of the UCOASMI and of conventional measures was tested 3–5 days apart. For interobserver reliability, 3 patients from each center were evaluated in the other centers, within 3–7 days. The intraclass correlation coefficients (ICC) were calculated.
Results. Thirty patients were included (73% men, mean age 53 yrs, mean BASDAI 3.0). The interobserver and intraobserver ICC of the UCOASMI were both 0.98. Conventional measurements also showed adequate, though lower, reproducibility, except for interobserver reliability of lateral flexion (0.41), cervical rotation (0.61), and Schöber test (0.07), and intraobserver reliability of tragus-to-wall distance (0.30).
Conclusion. The reproducibility of the UCOASMI appears very high, and the index seems more reliable than conventional measures of mobility.
Axial spondyloarthritis (axSpA) is characterized by structural damage1, which depends on the inflammatory process and manifests as bone erosions, resorption, and new bone formation2,3. The result is pain and a significant reduction in joint mobility, which in turn lead to disability and deterioration of quality of life4,5,6. The objective of therapy in axSpA is 2-fold: halting the current inflammatory process, and preventing future structural damage and disability while improving patients’ quality of life. The field of measurement in axSpA has evolved in parallel with the development of new drugs, and measures of disease activity and structural damage are reasonably well accepted for use in clinical trials and clinical practice. The measurement of impaired mobility, by contrast, remains problematic. Spinal mobility can be assessed by 2 types of instruments: (1) standardized measures of range of motion and distance, generally called metrology7, and (2) questionnaires that collect the patient’s opinion about health and mobility when performing daily activities8. However, unlike the instruments used to evaluate disease activity, those designed to evaluate changes in physical function, capacity, and mobility lack objectivity; they also show high intra- and interobserver variability and low responsiveness7,9,10.
Various technical advances, such as motion video-capture systems11, have made it possible to measure human mobility with great precision. The University of Córdoba (UCO) developed an innovative 3-D, image-based motion capture system called UCOTrack, which has been integrated into the Rheumatology Service of the hospital. The system consists of reflective markers placed on anatomical landmarks of the subject, 4 cameras, and motion capture software that interprets the information from the videos and generates a large number of mathematical outputs, which a properly trained analyst transforms into the measurements needed to assess mobility. An index based on the measurements provided by UCOTrack, the University of Córdoba Ankylosing Spondylitis Metrology Index (UCOASMI), has been developed and is valid for use in patients with axSpA12. The UCOASMI is obtained from 5 summary measures provided by the system and has been compared with the reference measure, the Bath Ankylosing Spondylitis Metrology Index (BASMI). The validity of the UCOASMI was tested in a cross-sectional study (n = 40)12. The UCOASMI showed a correlation of r = 0.88 with BASMI and of r = 0.78 with a standard measure of structural damage, the modified Stoke Ankylosing Spondylitis Spine Score (mSASSS)13; in the same study, BASMI showed a correlation of r = 0.62 with mSASSS12. In addition, the UCOASMI showed an area under the curve of 0.74 for discriminating worse from better Bath Ankylosing Spondylitis Functional Index (BASFI) scores. Besides adequate construct validity, the UCOASMI shows high reliability. In a longitudinal study with repeat measurement after 2 weeks (n = 40), its intraclass correlation coefficient (ICC) was 0.996 (close to that of BASMI, 0.956), and its coefficient of variation was very low, 2.80% (compared with 13.71% for BASMI).
The index also showed sensitivity to change in a 24-week clinical trial of anti-tumor necrosis factor (anti-TNF) therapy (n = 15), with a Cohen d of 0.48 (compared with 0.23 for BASMI)12.
Despite having shown greater reliability and sensitivity to change than BASMI, the UCOASMI has so far been used only at the Reina Sofía University Hospital, Córdoba (UCO), and all studies to date have been conducted in the UCO laboratory. A limited number of hospitals are beginning to implement similar systems. Given the limitations of BASMI7,8,9,10, wider availability of these systems could contribute to a better evaluation of patients in the clinical setting. Specifically, a measure of mobility with such good validity and reliability would add quality to clinical research, especially in studies with spinal mobility as an endpoint. With this in mind, the objective of the present study was to evaluate the reproducibility of the UCOTrack system and the UCOASMI in the first 3 centers where this system has been installed. As a secondary objective, we aimed to evaluate, for comparison, the reproducibility of measurements commonly used in the monitoring of patients with axSpA.
MATERIALS AND METHODS
An observational study of repeated measurements was carried out in which patients were evaluated by different observers in different centers. The protocol and materials, including informed consent forms, were approved by the Ethics Review Boards of the centers [Hospital Fundación Alcorcón (N. 16/45), Hospital Reina Sofía (Ref. 3160), and Hospital Puerta de Hierro (Code 142/16)].
Patients
The target population was patients diagnosed with axSpA, the population in which the mobility measurement instrument is intended to be used.
Patients were selected from the 3 participating centers (Hospital Puerta de Hierro and Hospital Fundación Alcorcón in Madrid, and Hospital Reina Sofía in Córdoba) through non-probabilistic stratified sampling. The stratification aimed to include 50% patients with ankylosing spondylitis, 20% women, and 30% patients diagnosed within the last 2 years. All patients signed an informed consent form prior to their participation in the study.
The inclusion criteria were (1) minimum age of 18 years; (2) a diagnosis of axSpA, at any stage, as recorded in the medical record; and (3) clinical stability in the opinion of the treating rheumatologist, with no treatment modifications in the last 3 months. There were no restrictions on the type of treatment used. Patients were excluded if (1) they had high disease activity, defined as a Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) > 6; (2) the treating physician considered it necessary to modify the treatment during the course of the study; or (3) their clinical condition worsened (in the opinion of the investigator) during the study period. The disease activity limit was necessary because patients had to be able to travel to hospitals in other cities for the interobserver analysis.
Instruments and metric properties tested
We determined the reproducibility (inter- and intraobserver reliability) of the UCOASMI and of BASMI, BASDAI, BASFI, and the Schöber test.
The UCOASMI is a composite index that generates a cervical and vertebral mobility score from serial kinematic determinations12,14. It is calculated as a weighted average of individual measures selected on the basis of their metric properties. The score ranges from 0 to 10 (from better to worse mobility). The motion video-capture system consists of 11 reflective markers placed on anatomical landmarks, 4 cameras, and specific software, UCOTrack. Markers are attached in less than 2 min. The patient then performs specific movements, such as flexion, extension, and rotation. The software interprets the images and generates the summary measures that are included in the index: cervical frontal flexion, cervical rotation, frontal spinal flexion, shoulder-hip lateral angle, and trunk rotation (see Supplementary Data, available from the authors on request, for a description of marker placement and calculations).
BASMI8 and its versions15,16 consist of several measures that generate a score of 0 to 10. For this study, BASMI was determined in each center by a qualified observer (rheumatologist, physiotherapist, or nurse). In addition, the following measurements, all widely used by rheumatologists, were carried out and recorded separately: lateral flexion, tragus-to-wall distance, cervical rotation, intermalleolar distance, and Schöber test9.
BASDAI is a patient-reported questionnaire covering fatigue, spinal pain, peripheral joint pain and swelling, localized tenderness, and morning stiffness, with scores ranging from 0 to 1017,18. BASFI assesses the patient’s ability to perform activities of daily living, also on a 0–10 scale19,20.
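The construction described above (several component measures brought to a common 0–10 scale and combined as a weighted average) can be sketched as follows. This is a minimal illustration under stated assumptions: the linear rescaling, the `best`/`worst` anchor values, the equal weights, and all function names are hypothetical placeholders, not the published UCOASMI coefficients, which are not reproduced in this article.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MeasureSpec:
    """HYPOTHETICAL scoring anchors and weight for one summary measure."""
    best: float    # raw value scored 0 (best mobility); placeholder
    worst: float   # raw value scored 10 (worst mobility); placeholder
    weight: float  # weight in the composite; placeholder


def rescale(value: float, spec: MeasureSpec) -> float:
    """Linearly map a raw measure onto 0-10 (0 = better, 10 = worse), clamped."""
    score = 10.0 * (spec.best - value) / (spec.best - spec.worst)
    return min(10.0, max(0.0, score))


def composite_index(values: Mapping[str, float],
                    specs: Mapping[str, MeasureSpec]) -> float:
    """Weighted average of the rescaled component scores."""
    total_weight = sum(spec.weight for spec in specs.values())
    weighted_sum = sum(rescale(values[name], spec) * spec.weight
                       for name, spec in specs.items())
    return weighted_sum / total_weight


# The 5 summary measures named in the text. All numbers are HYPOTHETICAL
# (angles in degrees; a larger angle means better mobility).
SPECS = {
    "cervical_frontal_flexion": MeasureSpec(best=60, worst=0, weight=1.0),
    "cervical_rotation": MeasureSpec(best=80, worst=0, weight=1.0),
    "frontal_spinal_flexion": MeasureSpec(best=100, worst=0, weight=1.0),
    "shoulder_hip_lateral_angle": MeasureSpec(best=40, worst=0, weight=1.0),
    "trunk_rotation": MeasureSpec(best=50, worst=0, weight=1.0),
}

if __name__ == "__main__":
    # A hypothetical patient at half of every "best" value scores 5 on each component.
    patient = {name: spec.best / 2 for name, spec in SPECS.items()}
    print(composite_index(patient, SPECS))  # 5.0
```

Because the rescaling divides by `best - worst`, the same function also handles measures for which a smaller raw value means better mobility (set `best < worst`).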
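The article does not restate how the 2 questionnaires are scored. For reference, the standard scoring rules can be sketched as below: BASDAI combines questions 1 to 4 with the mean of the 2 morning-stiffness questions (5 and 6), and BASFI is the mean of its 10 items. Questionnaire wording and administration (VAS or numeric scale) are not modeled, and the function names are illustrative.

```python
from statistics import fmean


def _check_range(scores):
    """All BASDAI/BASFI items are scored on a 0-10 scale."""
    for s in scores:
        if not 0 <= s <= 10:
            raise ValueError(f"item score {s} outside 0-10")


def basdai(fatigue, spinal_pain, peripheral_joints, enthesitis,
           stiffness_severity, stiffness_duration):
    """BASDAI = 0.2 * (Q1 + Q2 + Q3 + Q4 + (Q5 + Q6) / 2), range 0-10."""
    items = (fatigue, spinal_pain, peripheral_joints, enthesitis,
             stiffness_severity, stiffness_duration)
    _check_range(items)
    # The 2 morning-stiffness questions are averaged so they count as one item.
    stiffness = (stiffness_severity + stiffness_duration) / 2
    return 0.2 * (fatigue + spinal_pain + peripheral_joints + enthesitis + stiffness)


def basfi(items):
    """BASFI = mean of the 10 item scores, range 0-10."""
    if len(items) != 10:
        raise ValueError("BASFI has 10 items")
    _check_range(items)
    return fmean(items)
```

Under this rule, the study's exclusion criterion (BASDAI > 6) is a simple threshold on the returned value.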
In addition, the following demographic and disease-related variables were collected: age, sex, disease duration, overall disease assessment by the physician and the patient collected using visual analog scales (VAS) of 0–10 (0 = good, 10 = very poor), and pain assessed by the patient using a 0–10 VAS (0 = no pain, 10 = worst pain imaginable).
Study procedures
The motion video-capture system was installed in clinical offices at the 2 Madrid centers (the Córdoba system had already been installed when it was developed and validated). These rooms differed in height and floor area from the original laboratory in Córdoba. The systems were installed in spaces where patients were seen regularly, not in dedicated rooms. A caliper was used to standardize the video capture; the caliper is a structure of crossing bars with reflective balls at the ends. At each session, the caliper was placed in the middle of the room and the distances between reflectors were measured with the system. The software has a specific module that corrects the measurements if the measured distances deviate from the true ones.
Prior to patient recruitment, all observers were trained to standardize procedures: every measure, conventional and automated, was repeated by all observers on the same model (JLG) until they understood the correct method and performed it correctly.
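The correction module is not described further. As a minimal sketch, assuming (hypothetically) that a room-dependent error behaves like a single uniform scale factor, the known caliper distances can be used to estimate that factor by least squares and to apply it to later measurements. Real motion-capture calibration typically re-estimates camera parameters, so this illustrates the idea only; all names and numbers are hypothetical.

```python
def scale_factor(measured, true):
    """Least-squares scale k minimizing sum((k*m - t)**2) over caliper distances."""
    if len(measured) != len(true) or not measured:
        raise ValueError("need matching, non-empty distance lists")
    num = sum(m * t for m, t in zip(measured, true))
    den = sum(m * m for m in measured)
    return num / den


def max_relative_error(measured, true):
    """Largest relative deviation of the measured caliper distances from the true ones."""
    return max(abs(m - t) / t for m, t in zip(measured, true))


def correct_distances(distances, k):
    """Apply the calibration scale factor to subsequent patient measurements."""
    return [k * d for d in distances]


if __name__ == "__main__":
    true_cm = [50.0, 100.0, 150.0]      # hypothetical caliper bar lengths
    measured_cm = [49.0, 98.0, 147.0]   # hypothetical readings in a new room
    k = scale_factor(measured_cm, true_cm)
    print(round(k, 4), round(max_relative_error(measured_cm, true_cm), 3))
```

Checking `max_relative_error` before and after correction is one simple way to confirm, at each session, that a room of different size does not distort the measurements.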
To determine intraobserver reliability, observers, i.e., the rheumatologist or technician appointed at each center, performed all physical measures on 10 patients (e.g., patients A1 to A10 in Center A, etc.) and repeated them after 3–5 days. In the first visit, sociodemographic and other clinical descriptive variables were collected. BASDAI and BASFI questionnaires were filled in at all visits. To determine interobserver reliability, a total of 9 patients (3 from each center, i.e., A1 to A3, B1 to B3, and C1 to C3) were evaluated in each of the 3 participating centers (Figure 1). These rotating patients were selected based on their availability. Therefore, the total number of visits/measurements at each center was 26 [20 to evaluate intraobserver reliability (10 patients × 2 determinations) + 6 to assess interobserver reliability (6 patients from other centers × 1 determination)].
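The visit arithmetic in this paragraph can be checked by enumerating the design. The sketch assumes 3 centers labeled A to C, 10 patients per center, and that patients 1 to 3 of each center are the rotating patients, following the A1 to A3 example in the text; all identifiers are illustrative.

```python
from collections import Counter
from itertools import product

CENTERS = ["A", "B", "C"]
PATIENTS_PER_CENTER = 10
ROTATING_PER_CENTER = 3


def build_schedule():
    """Return (center, patient_id, purpose) tuples for every visit."""
    visits = []
    # Intraobserver arm: each center measures its own 10 patients twice.
    for center in CENTERS:
        for i in range(1, PATIENTS_PER_CENTER + 1):
            pid = f"{center}{i}"
            visits.append((center, pid, "intra-1"))
            visits.append((center, pid, "intra-2"))
    # Interobserver arm: rotating patients visit each of the other centers once.
    for home, host in product(CENTERS, CENTERS):
        if home == host:
            continue
        for i in range(1, ROTATING_PER_CENTER + 1):
            visits.append((host, f"{home}{i}", "inter"))
    return visits


schedule = build_schedule()
visits_per_center = Counter(center for center, _, _ in schedule)
print(dict(visits_per_center))  # {'A': 26, 'B': 26, 'C': 26}
```

Each rotating patient is thus measured in all 3 centers (once at each host center, plus the visits at the home center), which yields the 3 observers per patient used for the interobserver analysis.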
To minimize environmental and internal factors as sources of variation, all measurements of the same patient were performed at the same time of day, preferably in the afternoon, partly to avoid morning stiffness.
Statistical analysis
The sample was described using nonparametric statistics (medians, intervals, and frequencies). The intra- and interobserver reliability of the UCOASMI and of the other measures was assessed by ICC (consistency model). These coefficients were obtained from ANOVA, with each measure (UCOASMI, BASMI, etc.) as the dependent variable and the patient as the classification variable. Pairwise correlations between observers were not calculated.
The reproducibility results of the UCOASMI were compared with those of conventional metrology (BASMI and its components), BASDAI, and BASFI. Because these results cannot be compared directly by means of a statistical test, all estimates are reported with 95% CI.
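A consistency-type ICC is usually computed from a two-way layout (patients × sessions or observers). The sketch below assumes the single-measure consistency form, ICC(C,1) in McGraw and Wong's notation (ICC(3,1) in Shrout and Fleiss's); the text does not state whether single- or average-measure ICC was reported, so treat this as an illustration of the technique rather than the authors' exact computation.

```python
def icc_consistency(ratings):
    """Single-measure consistency ICC, ICC(C,1), from a two-way ANOVA.

    ratings[i][j] is the score of patient i on occasion (or observer) j.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 patients and >= 2 occasions")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares: between patients, between occasions, total, residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Occasion variance (ss_cols) is left out of the denominator: a systematic
    # shift between sessions does not lower a consistency ICC.
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


if __name__ == "__main__":
    # Hypothetical scores: session 2 is exactly 1 point higher for every patient.
    print(icc_consistency([[1, 2], [3, 4], [5, 6]]))  # 1.0
```

The example shows the practical meaning of "consistency": a uniform offset between sessions leaves the ICC at 1, whereas an absolute-agreement ICC would penalize it.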
In addition, we calculated the minimal detectable change (MDC) for conventional metrology and UCOASMI by means of the standard error of measurement according to the formula:
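The article does not spell out the formula at this point. A commonly used form, which we assume in the sketch below, is SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM; SEM can alternatively be estimated as the square root of the ANOVA error mean square. Treat this as an illustration of the calculation, not as the authors' exact procedure; the function names are illustrative.

```python
import math

Z_95 = 1.96  # two-sided 95% standard normal quantile


def sem_from_icc(sd, icc):
    """Standard error of measurement: SEM = SD * sqrt(1 - ICC)."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie in [0, 1]")
    return sd * math.sqrt(1 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM.

    The sqrt(2) term reflects that a change score involves 2 measurements,
    each carrying its own measurement error.
    """
    return Z_95 * math.sqrt(2) * sem
```

For instance, with a hypothetical between-patient SD of 1.0 and an ICC of 0.75, SEM is 0.5 and the MDC95 is about 1.39 points; a change smaller than this in an individual patient cannot be distinguished from measurement error.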
When calculating the sample size for reliability studies, both the number of patients and the number of determinations (or observers) per patient should be taken into account, especially for interobserver reliability. A low coefficient of variation was assumed, as reported in previous studies12. To estimate an ICC of 0.80 in an interobserver reliability analysis with a CI amplitude of 0.2 and 3 observers (i.e., 3 repeated measurements), 9 patients were needed21.
RESULTS
The sample consisted of 30 patients, mostly men, with a mean age of 53 years and long disease duration (Table 1). At the baseline visit, 2 patients had a BASDAI of 6.16, slightly above the exclusion threshold of 6; their inclusion in the intraobserver study was approved.
The overall assessment of disease was moderate but variable, with most indices showing a wide range (Table 1). The mean value of the UCOASMI in the first visit was 5.2 (SD 1.7), with a minimum value of 2.9 and a maximum of 8.8. The mean values for the conventional metrology indices are shown in Table 2.
Table 3 presents the interobserver reliability of conventional metrology and of the UCOASMI (total score and individual components acquired with the UCOTrack system). The reproducibility of the UCOASMI was very high, with an interobserver ICC of 0.98. Tragus-to-wall distance and intermalleolar distance also showed high ICC. The Schöber test showed the lowest reliability of all measures, although reliability was higher when it was measured with the UCOTrack system; one rater performed the test incorrectly despite training during the standardization session.
Table 4 presents the intraobserver reliability of conventional metrology and the UCOASMI, by observer (center). The test-retest reproducibility of the UCOASMI was again very high, with intraobserver ICC of 0.96, 0.99, and 0.97, higher than that of most conventional measurements, which showed variable intraobserver ICC.
The MDC of BASMI was 0.82 points, and the MDC of the UCOASMI was 0.74 points. For the other conventional metrology measures, the MDC were as follows: lateral flexion 3.01 cm, tragus-to-wall distance 3.62 cm, cervical rotation 8.57°, intermalleolar distance 10.17 cm, and Schöber 1.58 cm.
DISCUSSION
Until now, it was unknown to what extent the UCOASMI, and thus the UCOTrack system, was reproducible and transferable to other settings and to other professionals, which is essential to ensure the validity of the system in clinical practice and clinical research.
Our hypothesis that the system had sufficient reproducibility was confirmed. The UCOASMI showed acceptable reliability on repeated testing, regardless of who applied the motion video-capture system (technical observer) and where it was applied (motion analysis laboratory or hospital clinics). In addition, the hypothesis that the UCOASMI had better reproducibility than BASMI and its individual components could not be rejected.
Regarding the UCOASMI, very high reliability is evident both for the repeated measurements of each observer in the 2 sessions and for interobserver reliability. Compared with BASMI measured in the conventional way, automated measurements of mobility showed high metrological stability. However, reproducibility in 3 different settings may not fully generalize to every new setting in which the system is implemented. It is thus important to start using it in other settings.
As with any measurement that involves a body to be measured, an observer, and an instrument, it is important to acknowledge the different sources of variability. In our present study, differences in evaluators, days, and even times of day could have hindered the assessment of reproducibility; this was partly overcome by adjusting the logistics. As with imaging, the reliability of the measurements can be improved by intensive training of the evaluators or by standardization of the procedure. In our present study, observers were told to use the same text to instruct patients, and to do so with similar enthusiasm with all patients. Observers were trained as thoroughly in the motion analysis as in the conventional measures, and yet reliability remained higher for the automated measures than for the conventional ones. As reported, one of the observers performed the Schöber test incorrectly despite proper training and an instruction manual explaining the procedure. In our experience with other observational studies, the Schöber test has as many variants as there are centers, and under time constraints (this exercise was demanding, with many patients and measures in a short time frame), habit may override the correct procedure. It may, in turn, be easier to learn a new measure than to correct one that was learned wrongly.
This study has some limitations. Variability persists despite automation: although UCOTrack reduces many observer-related factors, other factors, such as height, center, time of day, pain, BASDAI, or temperature, may influence the measurement. The system cannot eliminate these factors; therefore, training remains an important step in standardization. Interestingly, the dimensions of the offices where UCOTrack was installed were very different, and yet no variation was seen when the caliper was tested.
We should bear in mind the long-term consequences of reduced mobility and the importance of both measuring it periodically and targeting it when choosing therapy. Our group has shown a correlation between mobility and radiographic progression10. Although this remains to be demonstrated, targeting mobility might help delay structural damage. To date, mobility has not been a major outcome in clinical trials, especially in anti-TNF trials, probably because of the difficulty of standardization and the limited reliability of measurement9,22. Nor has BASMI been widely used in most published articles, whether clinical trials or observational studies, a situation that could be interpreted as a hidden devaluation of the measurement.
The variability of the terminology used in the validation of measurements can give rise to errors or misconceptions. We have attempted to use the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) terminology throughout the methodology and report, as proposed by the COSMIN group23 and de Vet, et al21.
The UCOASMI reflects vertebral mobility better than BASMI8,12. BASMI includes the intermalleolar distance, a measure that depends solely on hip mobility; because it is 1 of the 5 BASMI components, 20% of the mobility information generated by BASMI is not related to vertebral mobility. In addition, BASMI includes the tragus-to-wall distance, which remains constant in long-term disease, whereas the UCOASMI evaluates the neck and lumbar regions in all 3 planes, anatomical areas that are very sensitive to movement restrictions caused by inflammation. Further, the UCOASMI includes rotation, a movement clearly impaired in patients with active disease and damage, and very difficult to measure with a tape or a goniometer.
The reproducibility of the UCOASMI across the 3 centers was very high, in contrast to the lower reproducibility of BASMI, the Schöber test, and cervical rotation. These results suggest that UCOTrack movement analysis is an advance in the functional assessment of axial spondyloarthritides and open the door to using this technology in the monitoring of these patients and in future experimental or observational studies.
Acknowledgment
The authors acknowledge the patience and collaboration of the patients, who became true partners in research.
Footnotes
Study funded by Merck Sharp & Dohme of Spain. Dr. L. Cea-Calvo and Dr. M.J. Arteaga are full-time employees at Merck Sharp & Dohme of Spain. UCOTrack is owned by the University of Córdoba and the Andalusian Health Service.
- Accepted for publication February 28, 2018.