Abstract
Objective. The Bath Ankylosing Spondylitis Functional Index (BASFI) is, to our knowledge, the most popular method to assess activity capacity in axial spondyloarthritis (axSpA). It is endorsed by the Assessment of Spondyloarthritis international Society, but it may be subject to recall bias or aberrant self-judgments in individual patients. Therefore, we aimed to (1) develop the instrumented BASFI (iBASFI) by adding a body-worn accelerometer with automated algorithms to performance-based measurements (PBM), (2) study the iBASFI’s core psychometric properties, and (3) reduce the number of iBASFI items.
Methods. Twenty-eight patients with axSpA wore a 2-axial accelerometer while completing 12 PBM derived from the BASFI. A chronometer and both manual and “automated algorithm-based” acceleration segmentation identified movement time. Test-retest trials and methods (algorithm vs manual segmentation/chronometer/BASFI) were compared with ICC, standard error of measurement [percentage of movement time (SEM%)], and Spearman ρ correlation coefficients. Linear regression identified the optimal set of reliable iBASFI PBM.
Results. Good to excellent test-retest reliability was found for 8/12 iBASFI items (ICC range 0.812–0.997, SEM range 0.4–30.4%), typically with repeated and fast movements. Automated algorithms excellently mimicked manual segmentation (ICC range 0.900–0.998) and the chronometer (ICC range 0.878–0.998) for 10/12 iBASFI items. Construct validity compared with the BASFI was confirmed for 7/12 iBASFI items (ρ range 0.504–0.755). Together, sit-to-stand speed test (stBeta 0.483), cervical rotation (stBeta −0.392), and height (stBeta −0.375) explained 59% of the variance in the BASFI (p < 0.01).
Conclusion. The proof-of-concept iBASFI showed promising reliability and validity in measuring activity capacity. The number of the iBASFI’s PBM may be minimized, but further validation in larger axSpA cohorts is needed before its clinical use.
- ANKYLOSING SPONDYLITIS
- PHYSICAL ACTIVITY
- MOBILITY LIMITATIONS
- QUESTIONNAIRE
- VALIDITY AND RELIABILITY
- TECHNOLOGY ASSESSMENT
Typical features of axial spondyloarthritis (axSpA) include inflammation of the spinal and sacroiliac joints, often accompanied by syndesmophytes, bony entheseal spurs, and joint ankylosis1. Axial inflammation and bone formation manifest clinically as inflammatory back pain, loss of spinal mobility, and spinal stiffness2. Peripheral arthritis and enthesitis, as well as extraskeletal features such as psoriasis, uveitis, and gut inflammation, may add to the systemic character of the disease3.
In 1999, the Assessment of SpondyloArthritis international Society (ASAS) expert group developed a “minimal core set” of 9 health-related domains to monitor all aspects of disease outcome in patients with axSpA in trial and clinical record-keeping settings4. Strong consensus was reached for the inclusion of the physical function domain across settings and the contemporary Bath Ankylosing Spondylitis Functional Index (BASFI) was recommended5,6. Over the following decade, the importance of physical function was further evidenced by its pivotal involvement in the ASAS20, ASAS40, and ASAS5/6 improvement criteria7,8.
Lacking any operational definition, the content of the physical function domain in axSpA remained surprisingly ill-defined until ASAS attempted in 2010 to also integrate the effect of disease into the outcome assessment by tailoring the World Health Organization’s (WHO) International Classification of Functioning, Disability and Health (ICF) to develop “core sets” for the evaluation of axSpA9. Physical function is now largely reflected in the ICF components “activities” and “participation,” where “activities” are defined as “the execution of tasks or actions by an individual” and “participation” as “involvement in life situations”10,11.
Applying the ICF clinically, ASAS/WHO has proposed a patient-reported “capacity and performance qualifier” (ranging from 0 = no difficulty to 4 = complete difficulty to execute a task) to evaluate “activity limitations and participation restrictions”12. Similarly, the recently developed ASAS Health Index adopted patient-reported dichotomous (I agree/I do not agree) response options to evaluate the broader construct of “functioning”13. Unfortunately, this first implementation of the WHO/ASAS/ICF axSpA core sets9,12 has failed to recognize the crucial distinction between what a person can do in a standardized environment (activities: capacity qualifier) versus in a real-life situation (participation: performance qualifier)10. This is problematic for valid outcome assessment in clinical care and research. Indeed, to improve an individual’s intrinsic activity capacity, a rehabilitation approach focusing more on body functions and structures is needed, while contextual rehabilitation goals targeting environmental (e.g., ergonomic adaptations) and personal (e.g., motivation) factors may add to performance in the real-life environment. Similar to the popular BASFI, the WHO/ASAS/ICF axSpA core sets still rely on the patient’s self-report and target the difficulty experienced with the execution of activities in daily life. A large body of evidence suggests that the cognitive process of mapping this experience into the construct of activity capacity or performance is patient-specific and can be distorted by psychological factors such as pain14, differences in reference frame15, cognitive impairments, motivation, anxiety, and depression16,17.
Performance-based measures (PBM) deliver a more direct and standardized observation of activity capacity and are less influenced by psychological or environmental factors than patient-reported outcomes14,18. Typically, a trained observer translates the visual inspection of the activity of interest into an activity capacity metric such as time (using a chronometer), number of repetitions, or distance19. The 1-week test-retest reliability of a series of common PBM reflecting each BASFI item was established in patients with axSpA, minimizing the involvement of fluctuating activity capacity20. Interestingly, PBM seem to also identify small improvements in activity capacity in patients with axSpA not responding to antitumor necrosis factor (anti-TNF) therapy21. Similarly, in osteoarthritis (OA), PBM show different recovery pathways after joint replacement and do not have ceiling effects in longterm followup of activity capacity22. However, especially for large clinical trials, feasibility is of major concern because trained observers, facilities, and presence of the patient are needed19.
Automated identification of movement data using sensors may have the potential to speed up data collection and downsize observer training of PBM in routine practice. Semiautomated accelerometer algorithms with proven psychometric properties exist in patients with parkinsonism and stroke, but are limited to selected tests such as walking, standing, and sit-to-stand (STS) activities23,24,25. In patients with axSpA, the use of body-worn sensors is still in its infancy and focuses on accelerometer algorithms for physical activity assessment26. Therefore, our proof-of-concept study aimed to (1) present the development of the instrumented BASFI (iBASFI), adding fully automated algorithms during PBM to obtain activity capacity outcomes in axSpA, (2) evaluate the test-retest reliability of the automated algorithms and their criterion validity in comparison with manual feature selection in axSpA, (3) compare these novel activity capacity outcomes for construct validity to chronometer-based PBM and the BASFI, and (4) create a preliminary optimal set of PBM that best reflected the BASFI.
MATERIALS AND METHODS
Participants
Our psychometric study randomly included 28 subjects with a diagnosis of axSpA according to the ASAS classification criteria2, verified by an ASAS expert rheumatologist (KDV), from the outpatient axSpA clinic at the University Hospitals of Leuven, Belgium. Exclusion criteria were (1) a history of spinal fractures or other fractures within 12 months, lower quadrant musculoskeletal injuries not related to SpA, discitis, and spondylolisthesis, (2) current symptoms of severe health conditions (e.g., heart failure), and (3) not being able to stand or walk without an aid. All subjects provided written informed consent prior to participation according to the Declaration of Helsinki. The study protocol fulfilling the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) criteria27 (completed COSMIN form available from the authors on request) was approved by the Medical Ethics Committee of the University Hospitals Leuven (ML 5236).
Anthropometrics and demographics
Anthropometric measures were taken by the same observer prior to other assessments. Height was measured with a stadiometer (Holtain Ltd.) to the nearest 0.1 cm and weight was measured with a digital scale (SECA) to the nearest 0.1 kg. Work status was assessed with the work productivity survey28.
Patient-reported activity capacity
Patient-reported activity capacity was assessed with the ASAS-endorsed and widely accepted BASFI questionnaire29. This instrument asks the respondent to rate the perceived difficulty in executing commonly limited activities in axSpA on an 11-point numerical rating scale with a range of 0 (easy) to 10 (impossible)30. A total score is calculated by dividing the sum of all items by 10, producing a score between 0 and 10. The psychometric properties of the BASFI in axSpA are well established31,32,33,34.
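The scoring rule described above can be illustrated with a minimal sketch. It assumes the standard 10-item BASFI with each item rated 0 to 10; the function name and the example ratings are hypothetical and are not taken from the study data.

```python
def basfi_total(item_scores):
    """Return the BASFI total: the sum of the 10 item ratings divided by 10."""
    if len(item_scores) != 10:
        raise ValueError("expected ratings for 10 BASFI items")
    for score in item_scores:
        if not 0 <= score <= 10:
            raise ValueError("each item must be rated between 0 and 10")
    return sum(item_scores) / 10


# Hypothetical ratings for one respondent (illustration only).
example_items = [3, 4, 2, 5, 1, 0, 6, 3, 2, 4]
print(basfi_total(example_items))  # 3.0
```

Because every item shares the same 0 to 10 range, the total is simply the item mean and stays on the same scale as the individual items.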
Performance-based activity capacity
Performance-based activity capacity was evaluated by performance-based tests (Figure 1) mimicking the BASFI20 and the WHO/ICF/ASAS core sets9 activities to ensure content validity. Each performance was timed with a chronometer (Geonaute) in seconds (to the nearest 0.01 s). The observer received a written manual and was trained to standardize the onset and end of movement. Concurrently, a body-worn, 2-axial accelerometer (Sensewear Pro 3 Armband, Bodymedia Inc.) was mounted on the triceps region (midway between the humeral head and olecranon) of the dominant arm to automatically record acceleration vectors (m/s²) at 32 Hz along the longitudinal and transverse axes. For the maximal reach test, movement duration was complemented with distance using a stadiometer (Janssen-Fritzen), while for the looking-over-the-shoulder test, only range of motion collected with a mounted goniometer (ORTEC) was considered relevant. All PBM were repeated twice with 30 s in between (test-retest reliability) and most tests consisted of both a “self-selected pace” and “maximal speed” trial. The PBM were randomized and counterbalanced for side. Supplementary Data 1 (available online at jrheum.org) has a detailed description of each PBM test reflecting the candidate iBASFI items (iBASFI development set; Supplementary Data 2, available online at jrheum.org).
Data reduction and statistical analysis
Descriptive data were presented as mean, median, and 25th and 75th percentile. Normal distribution of all variables was evaluated with the Shapiro-Wilk test (p < 0.05). Movement time was extracted from low-pass filtered accelerometer signals using custom-written automated algorithms (accelerometerauto) in MATLAB (MathWorks) and by calculating the mean of manual signal segmentations (accelerometermanual) by 2 blinded evaluators (bachelor-level physical therapists not involved in any part of our study). The test-independent automated algorithm entailed a set of heuristic rules obtained from a pilot set of acceleration data (n = 3), applied wavelet-based filtering, and took into account the SD of signal variables in a sliding window for both the transverse and longitudinal axes to detect the start and end of each movement.
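To make the sliding-window idea concrete, the following is a simplified sketch and not the study's algorithm. It assumes a single axis, omits the wavelet-based filtering and the pilot-derived heuristic rules, and uses hypothetical parameters (a 0.5 s window, a threshold of 3 times the resting SD, and a 1 s resting baseline at the start of the recording). Only the 32 Hz sampling rate is taken from the text.

```python
import math
import random
from statistics import mean, pstdev


def detect_movement(signal, fs=32, window_s=0.5, k=3.0, baseline_s=1.0):
    """Estimate movement start, end, and duration (s) from one acceleration axis.

    A window counts as "active" when its SD exceeds k times the mean SD of
    the windows lying inside the initial resting baseline (assumed values).
    Returns None when no window is active.
    """
    win = max(2, round(window_s * fs))
    if len(signal) < win:
        return None
    # Rolling SD: entry i describes samples i .. i + win - 1.
    rolling_sd = [pstdev(signal[i:i + win]) for i in range(len(signal) - win + 1)]
    n_base = max(1, round(baseline_s * fs) - win + 1)
    threshold = k * max(mean(rolling_sd[:n_base]), 1e-6)
    active = [i for i, sd in enumerate(rolling_sd) if sd > threshold]
    if not active:
        return None
    # The first active window has movement at its trailing edge,
    # the last active window has movement at its leading edge.
    start_idx = active[0] + win - 1
    end_idx = max(active[-1], start_idx)
    return start_idx / fs, end_idx / fs, (end_idx - start_idx) / fs


def synthetic_trial(fs=32):
    """Hypothetical test signal: 2 s rest, 3 s of 1 Hz movement, 2 s rest."""
    rest_a = [random.gauss(0, 0.01) for _ in range(2 * fs)]
    move = [math.sin(2 * math.pi * i / fs) + random.gauss(0, 0.01)
            for i in range(3 * fs)]
    rest_b = [random.gauss(0, 0.01) for _ in range(2 * fs)]
    return rest_a + move + rest_b


random.seed(0)
start_s, end_s, duration_s = detect_movement(synthetic_trial())
print(f"start {start_s:.2f} s, end {end_s:.2f} s, duration {duration_s:.2f} s")
```

A two-axis variant could, for example, mark a window as active when either axis exceeds its own threshold; how the study combined the axes is not specified here, so that choice would also be an assumption.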
Trials (within-session attempt 1 vs 2) and methods (accelerometermanual vs accelerometerauto, chronometer vs accelerometerauto, accelerometerauto vs corresponding BASFI-item, accelerometerauto vs corresponding BASFI total) were compared with the ICC [test-retest trials: 2-way mixed; intermethod (see comparisons above): 2-way random model], the 95% CI for the standard error of measurement [SEM; also clinically expressed in percent of total movement time (SEM%)], and the Spearman ρ correlation coefficient. Criteria to evaluate ICC were < 0.70 (poor), 0.70–0.79 (adequate), 0.80–0.89 (good), and ≥ 0.90 (excellent)20. We hypothesized a significant correlation of 0.50 or more to confirm convergent construct validity between PBM and the BASFI questionnaire and its items35. A stepwise linear regression was used to model the BASFI from an optimal set of reliable (ICC > 0.80) performance-based tests, obtaining a pilot version of the short iBASFI. Sex, height, weight, and body mass index were also included among the regression’s independent variables to exclude anthropometric effects. The criteria to include or exclude variables were a probability of F ≤ 0.05 (in) and ≥ 0.10 (out), respectively. To avoid collinearity, the variance inflation factor was set at < 3, and the tolerance level at > 0.10.
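As an illustration of the reliability statistics, the sketch below computes a test-retest ICC together with the SEM and SEM% for two trials. Several choices are assumptions rather than details given in the text: the single-measure, consistency form of the 2-way mixed ICC (often written ICC(3,1)), the common formula SEM = SD × √(1 − ICC) using the pooled SD of both trials, and SEM% as SEM divided by the pooled mean movement time. The movement times are hypothetical.

```python
import math
from statistics import mean, stdev


def icc_3_1(trial1, trial2):
    """Two-way mixed, consistency, single-measure ICC for two trials."""
    trial1, trial2 = list(trial1), list(trial2)
    n, k = len(trial1), 2
    rows = list(zip(trial1, trial2))
    grand = mean(trial1 + trial2)
    # Partition the total sum of squares into subjects, trials, and error.
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (trial1, trial2))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def sem_and_percent(trial1, trial2, icc):
    """Return (SEM, SEM%) assuming SEM = pooled SD * sqrt(1 - ICC)."""
    pooled = list(trial1) + list(trial2)
    sem = stdev(pooled) * math.sqrt(max(0.0, 1.0 - icc))
    return sem, 100.0 * sem / mean(pooled)


# Hypothetical movement times (s) for 4 subjects on 2 trials.
first = [10.0, 12.0, 14.0, 16.0]
second = [11.0, 11.0, 15.0, 17.0]
icc = icc_3_1(first, second)
sem, sem_pct = sem_and_percent(first, second, icc)
print(f"ICC {icc:.3f}, SEM {sem:.2f} s, SEM% {sem_pct:.1f}")
```

The 2-way random model used for the intermethod comparisons would additionally count systematic differences between methods as error; that variant is not shown.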
RESULTS
Demographics and disease-related characteristics are presented in Table 1 and indicate a typical outpatient sample of patients with axSpA in terms of age, disease-related variables, and medication use36. However, the severity of thoracic kyphosis (tragus-to-wall distance) and hip involvement (intermalleolar distance) was rather limited with median values of 1 and 2 out of 10, respectively. Of the 29 patients invited, only 1 male subject (3.4%) refused to participate in our study because of a lack of time.
Test-retest reliability of the instrumented performance-based (iBASFI) tests
Full data are shown in Table 2. Test-retest reliability of the instrumented PBM was lowest for the sock, reach, shoulder speed, and STS tests (ICC range 0.528–0.785) and good to excellent for the pen, pen speed, reach height, STS speed, lying down, getting up, stair climbing, and cervical rotation tests (ICC range 0.812–0.997). Overall, single and/or self-selected pace movements were less reproducible than repeated and/or fast-paced movements (e.g., pen ICC 0.812 vs pen speed ICC 0.974). SEM% values ranged from 0.4% to 23.9%, excluding the pen (30.4%) and reach tests (32.0%).
Automated algorithm versus manual segmentation and versus a chronometer
Automated segmentation of acceleration signals to obtain movement duration excellently mimicked the mean value of manual segmentation (ICC range 0.900–0.998; Table 3), except for the reach (ICC 0.727) and STS tests (ICC 0.599). SEM% values ranged from 3.8% to 30.4%. Thus, criterion validity of the automated algorithm in comparison to manual segmentation was confirmed for almost all tests.
In comparison with the chronometer, the automated algorithm validly assessed movement duration (ICC range 0.878–0.998) apart from the maximal reach (ICC 0.532) and STS tests (ICC 0.770). SEM% values ranged from 4.5% to 23.0%, ignoring the reach test (31.9%). Together, convergent construct validity of the automated algorithm was established for almost all tests.
Construct validity and item reduction of the iBASFI
Good to excellent convergent construct validity was found for the instrumented pen, pen speed, STS speed, lying down, getting up, stair climbing, and cervical rotation tests, as evidenced by significant and good associations with the BASFI scale (Spearman ρ range 0.504–0.638; Table 4). All these tests, except for lying down, showed similar correlations with their corresponding BASFI item. Surprisingly, none of the timed arm tests reached significance, and none showed relevant correlations with the BASFI or its item on reaching toward a shelf, whereas the maximal distance reached showed a moderate correlation with the BASFI that approached significance, but not with its corresponding item.
Based on the test-retest reliability and construct validity in comparison with chronometer results for the automated algorithm, the sock, pen speed, shoulder speed, STS speed, lying down, getting up, stair, and cervical rotation tests were entered in the model in addition to anthropometric variables. The core set of the iBASFI items consisted of the STS speed test, the cervical rotation test, and height (Figure 2). This core set explained 59% of the variance in the BASFI, confirming convergent construct validity of the composite and successful reduction of items. No collinearity issues were detected.
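The collinearity check against the stated cutoffs (variance inflation factor < 3, tolerance > 0.10) can be sketched as follows. This is an illustrative implementation under assumptions, not the statistical software routine used in the study: it relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix, with tolerance defined as 1/VIF. The predictor names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(predictors):
    """Map each predictor name to (VIF, tolerance)."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: (inverse[i][i], 1.0 / inverse[i][i])
            for i, name in enumerate(names)}


# Hypothetical predictor values for 5 subjects (illustration only).
example = {
    "sts_speed_s": [9.1, 12.4, 10.2, 15.0, 11.3],
    "cervical_rotation_deg": [70.0, 55.0, 62.0, 40.0, 66.0],
    "height_cm": [172.0, 181.0, 165.0, 177.0, 169.0],
}
for name, (vif, tol) in vif_and_tolerance(example).items():
    print(f"{name}: VIF {vif:.2f}, tolerance {tol:.2f}, ok={vif < 3 and tol > 0.10}")
```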
DISCUSSION
To our knowledge, our proof-of-concept study was the first to describe the development of an iBASFI and to confirm its psychometric properties. This novel methodology included the careful translation of individual BASFI items into instrumented performance-based tests that reflect key activity limitations in patients with axSpA29,37.
The concept of “activity limitations” is a relatively new term in the field of SpA and largely replaces the ill-defined “physical function” concept, a key domain in the 1999 ASAS core set to monitor patients with ankylosing spondylitis (AS)9,11,38. “Activity limitations” or positively formulated “activity capacity” refers to the difficulty or ability to execute a task in a standardized environment without assistance10.
Over the past decades, the assessment of activity limitations in patients with axSpA evolved from self-reported questionnaires29,39,40,41 toward direct observation/judgment37 and performance-based tests by an operator. The latter PBM typically quantify movement duration by a hand-held chronometer or repetition counts within a standardized time frame20,42 reflecting the idea of direct and objective assessment of activity limitation. Our study investigated and innovated these PBM by adding a combination of data from a body-worn, 2-axial accelerometer and fully automated algorithms to detect movement duration. This methodology may have clear advantages over the existing methods.
First, self-reports of activity limitation rely on processes of comprehension, frame of reference (standardized task environment such as chair height), memory retrieval, motivation, emotional status43, and response mapping. Although highly feasible and popular in patients with axSpA, they are considered valid at the group level only or to assess change through responder criteria7. Similarly, observational methods are also still subject to interpretation bias44. Performance-based tests may be less prone to personal and environmental influences18, although direct evidence in axSpA is currently not available. Our correlation data with the BASFI largely confirm shared information between techniques (i.e., concurrent construct validity), except for the sock, reach, reach height, and shoulder speed tests. For the sock test, this mismatch may be explained by variable strategies a patient may use (e.g., put foot on a chair) during daily living, while during the performance-based test, the mode of execution was restricted to sitting for standardization. The latter method ensured close operationalization of the ICF definition of activity capacity (ability in a standardized environment) that contrasts with activity performance (ability in the subject’s real-life environment). For all reaching tasks, the relationship with the BASFI total or corresponding BASFI item is apparently not dependent on arm function, but more on actual body height as seen in the regression analysis. Arm activities may be important for some patients with local shoulder problems, but do not cover the content of the BASFI well.
Second, questionnaires such as the BASFI tend to have floor or ceiling effects and/or just do not pick up longterm changes in activity limitation. For example, several longterm followup studies on anti-TNF now exist45 and uniformly show a mean BASFI of about 7/10 at baseline which, after initial therapy response, remains stable at about 3/10 during the 8-year followup. In contrast, performance-based tests in AS may be more sensitive to change because they are able to detect improvement even in nonresponders21 during anti-TNF therapy at 3 months of followup, although learning effects could not be excluded. Similarly, in patients scheduled for hip and knee arthroplasty, Stratford, et al46 showed the superiority of performance-based measures to detect increased activity limitation prior to and recovery after surgery in comparison to self-reports. Arguing in favor of our accelerometry-based approach, direct measurement of activities may reveal discriminatory features not included in questionnaires or PBM using chronometers. This was illustrated by the added value of accelerometry in the assessment of turning in patients with Parkinson disease with different stages of disease47.
Third, our study found preliminary but overall excellent psychometric properties for all fast-paced and repeated or complex tests. In a study by van Weely, et al20, 1-week test-retest reliability of chronometer- and performance-based tests ranged between 0.73 and 0.96, indicating a slightly lower range of reliability compared with our results. The accelerometer used in our study may have limited the variance induced by a physical therapist operating the chronometer. Also, our fully automated feature selection with proven validity in comparison to manual segmentation of signals contrasts with previously reported semiautomated algorithms24 and favors feasibility. In addition, pilot regression analysis revealed the possibility to obtain a core set of iBASFI items more suitable for clinical practice. The repeated STS movements added most information on activity capacity to the iBASFI. Remarkably, this gross motor task has turned out to be a key-limited, performance-based measure in other disease populations.
There are some limitations to our study. Our proof-of-concept study has a limited sample size that could have affected power. However, in the direct comparison of instrumented performance-based tests with each BASFI item and the total scale to confirm concurrent construct validity, none of the correlation coefficients of sufficient magnitude to reach the validity criterion was nonsignificant. Also, only 4 out of 24 correlation coefficients of insufficient magnitude turned out to be nonsignificant (r BASFI scale: reach height, STS; r BASFI item: sock, lying down). Sample size did not affect regression analysis quality; however, our item reduction is preliminary because cross-validation, responsiveness, and feasibility were not included in the selection procedure19. Although we randomly selected a typical sample of outpatients with axSpA, sample size may have affected the generalizability of results. Ongoing research in a larger sample of patients with axSpA will tackle these issues and is needed to prepare the iBASFI for use in clinical practice.
Another limitation may be the reliance on the BASFI to develop the PBM. Although the content validity of this scale is excellent11, one is not able to tailor the automated algorithms to patient-specific activities at this point. The preliminary core set of 3 PBM is, however, a good starting point for future research.
Finally, a further limitation may be that we did not evaluate all aspects of the feasibility of the iBASFI. Performing the extensive protocol took < 20 min on average and we estimate the short iBASFI (height, STS speed test, and cervical rotation) to take < 5 min. This included positioning and loading of the sensor, done by a physical therapist who instructed the patient. Future studies may investigate video instruction and automated detection of sensor location to make the assessment procedure fully automatic48. Also, a distinction between a clinical iBASFI and research iBASFI may be considered in future research. An animated activity questionnaire was developed that asked patients with OA to map their activity limitation to videos showing different abilities49,50. Because the animated questionnaire correlated highly with PBM (but unexpectedly also with self-reports49) in these validity experiments, this technique may overcome frame-of-reference issues in self-reports and may reduce the time and resources inherent to performance-based testing. Future research should compare these techniques head-to-head and elucidate their unique involvement in axSpA outcome assessment.
Our proof-of-concept iBASFI showed promising test-retest reliability, construct validity in comparison with chronometer- and performance-based testing, and construct validity in comparison with the BASFI questionnaire. Future studies should elucidate the added value of these technology-based measures and further validate the iBASFI for use in clinical practice and research.
ONLINE SUPPLEMENT
Supplementary data for this article are available online at jrheum.org.
Acknowledgment
The authors thank Frederik Adams (physical therapist), all participants, and all staff members who operationally contributed to this study.
- Accepted for publication April 19, 2016.