Abstract
Objective Physical function in patients with axial spondyloarthritis (axSpA) is currently evaluated through questionnaires. The Ankylosing Spondylitis Performance Index (ASPI) is a performance-based measure for physical functioning, which has been validated in Dutch patients with radiographic (r-) axSpA. The interrater reliability has not yet been determined. To our knowledge, this study is the first to evaluate the validity, reliability, and feasibility of the ASPI in another patient population, including both r- and nonradiographic (nr-) axSpA patients.
Methods Patients with axSpA were recruited from rheumatology clinics in Santiago, Chile. Dutch instructions were translated to Spanish by a forward-backward procedure. Study visits were performed at baseline and 1–4 weeks later. Four ASPI observers were involved, measuring the performance times of the 3 ASPI tests. Validity was assessed through a patient questionnaire (numeric rating scale 0–10: ≥ 6 sufficient). For reliability, intraclass correlation coefficients (ICC) were calculated (with 95% CI). Correlations between the ASPI and disease variables were tested with regression analyses.
Results Sixty-eight patients were included (57% male, 52% r-axSpA). All patients understood the Spanish instructions and considered the ASPI to reach its aim (84%) and representativeness (85%) for physical functioning. The overall interrater (n = 62) and test-retest (n = 39) reliability (ICC) of the 3 tests combined were 0.93 (0.88–0.96) and 0.94 (0.87–0.97), respectively. Eighty-two percent of the patients completed all tests and 94% finished in < 15 min (feasibility).
Conclusion This study demonstrated a high validity and feasibility in an entirely different population, with both r-axSpA and nr-axSpA. The interrater and test-retest reliability was excellent. The ASPI instructions are now available for Spanish-speaking patients.
In axial spondyloarthritis (axSpA), inflammation and structural damage cause increasing limitations in spinal mobility1. Although the course is highly variable, tumor necrosis factor-α inhibitors (TNFi) can delay the deterioration and improve functioning2,3,4,5. As limitations in physical functioning influence the quality of life, physical ability is a core component [Assessment of SpondyloArthritis international Society (ASAS)/OMERACT core set] of disease outcome6,7,8,9.
Unfortunately in axSpA, measurement of physical function still mostly relies on the patient-reported Bath Ankylosing Spondylitis Functional Index (BASFI) questionnaire10, which measures the perceived level of physical functioning, arising from a complex interplay between psychological (e.g., needs, priorities, pain) and environmental (e.g., culture, participation) factors, aside from the physical mobility itself11,12. Consequently, in 2009, van Weely and colleagues constructed the first performance-based test, based on the BASFI questionnaire: the Ankylosing Spondylitis Performance Index (ASPI). The ASPI measures the time to perform 3 daily activities (bending to pick up 6 pencils from the floor, putting on socks, and standing up from the floor)13,14. The ASPI has previously shown an adequate to excellent [intraclass correlation coefficient (ICC) > 0.70] intrarater test-retest reliability and good responsiveness (after TNFi initiation), and it successfully measures different aspects of function compared to the BASFI questionnaire12,15.
However, some test properties still have to be determined. First, the interrater reliability has not been evaluated. Second, existing studies have focused only on patients with radiographic axSpA (r-axSpA), while nonradiographic axSpA (nr-axSpA) forms an increasing part of the axSpA population. Further, the ASPI has been validated only in 1 Dutch population. The clinimetric properties and feasibility of the ASPI in other patient populations with higher discrepancies in treatment access are not yet clear.
The primary objectives of the current study were to validate the ASPI in another patient population, including r- and nr-axSpA patients, and to determine the interrater reliability for the first time. The secondary objective was to make the ASPI accessible for Spanish-speaking patients, because Spanish is the fourth most spoken language worldwide, used in over 30 countries. The study evaluated the following clinimetric properties: (1) content validity, (2) construct validity (translation and hypotheses testing), (3) reliability (interrater reliability and intrarater test-retest reliability), and (4) feasibility in clinical practice.
MATERIALS AND METHODS
Study population and design
Patients were consecutively recruited in February and March 2019, from the outpatient rheumatology clinic of the Padre Hurtado Hospital and Alemana Clinic (Desarrollo University, Santiago, Chile), and through the Chilean SpA patient foundation (Espondilitis Chile). The study was performed within a preexisting collaboration between the rheumatology departments of the study site and the Amsterdam University Medical Centre (the Netherlands). Eligible patients were ≥ 18 years, had a diagnosis of axSpA confirmed by a rheumatologist, and were classified as r-axSpA or nr-axSpA according to the ASAS criteria for axSpA16. Exclusion criteria were insufficient capability to understand the instructions or significant comorbidity (e.g., cardiovascular or neurological), potentially interfering with the safety of ASPI performance. The study consisted of a baseline assessment of all patients and a subsequent (retest) visit in all patients available within 1 to 4 weeks after baseline (Figure 1). The protocol was approved by the medical ethics committee of the Alemana Clinic (approval number 2018-103). All patients gave written informed consent before inclusion. The study was performed according to the Declaration of Helsinki.
The ASPI
The ASPI is performed in a common consultation room and consists of 3 standardized performance tasks: (1) bending to pick up 6 pencils from the floor (one by one); (2) putting on socks (mean of 3 times); and (3) getting up from the floor (mean of 3 times). During the test, patients are allowed to use a chair or bench/table (both standard test setup) to sit or lean on. The time to complete a task is measured for each test in seconds (and hundredths). In addition, for each test patient-reported scores for test-related pain and exertion are collected, based on, respectively, a numerical rating scale (NRS) and the Borg scale (anchors 0: no pain/no exertion, and 10: extreme pain/exertion)17. A more detailed description of the ASPI was published previously13 (Supplementary Data 1, available from the authors on request).
Content validity
Content validity was defined as “the degree to which the content of the outcome measure is an adequate reflection of the construct to be measured,” and includes the relevance, comprehensiveness, and comprehensibility of the measure18. Therefore, immediately after the ASPI, patients were asked to evaluate the aim, representativeness and utility of the ASPI (Figure 1). The following questions were asked: “Did the ASPI give insight into your physical capabilities?” (aim/relevance), “Did the ASPI-test represent your physical functioning in daily life?” (representativeness/comprehensiveness), and “Do you think it is of additional value to perform the test at least yearly in axSpA patients?” (utility). Answers were given on an NRS 0–10: anchors 0 poor to 10 excellent, ≥ 6 sufficient, and ≤ 5 insufficient.
Construct validity: translation and evaluation
The original Dutch ASPI patient instructions were translated into Spanish through a forward-backward procedure19. Four nonmedical bilingual speakers were involved: 2 with (Peruvian and Colombian) Spanish, and 2 with Dutch as their mother tongue. The native Spanish speakers (JP, LOBN) translated the instructions independently into Spanish (forward phase). A consensus translation (Supplementary Data 2, available from the authors on request) was made by a Chilean rheumatologist, and reviewed by the translators. Subsequently, the native Dutch speakers (MH, FdG) translated the consensus translation independently into Dutch (backward). Both translations were reviewed and approved by the ASPI team of the Amsterdam University Medical Centre and considered to not diverge significantly from the original instructions. All patients evaluated the clarity and comprehensibility of every test instruction (“Do you now know what is expected from you”: yes or no) and for the instructions as a whole: “How comprehensible were the instructions?” and “How clear were the instructions?” (both rated on an NRS 0–10, anchors 0 poor to 10 excellent).
Hypotheses testing
Correlations between existing axSpA measures and the ASPI were tested, defined by R² (the percentage of variance in a disease variable explained by the ASPI). Based on the Dutch validation study12, we formulated the following hypotheses:
Correlation of the ASPI with the BASFI: R² 0.3–0.5
Correlation of the ASPI with the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), Ankylosing Spondylitis Disease Activity Score based on C-reactive protein (ASDAS-CRP), and Bath Ankylosing Spondylitis Metrology Index (BASMI): R² < 0.3.
Two correlation models were tested: one between the ASPI alone and the different disease variables, and one between the ASPI combined with ASPI-related pain and exertion and the different disease variables. It was expected that the combination of the ASPI and test-related pain and exertion (multivariable analyses) would correlate better with the BASFI, ASDAS, or BASDAI than the ASPI alone, because, like pain and exertion, these variables are patient-perceived. However, this would not apply to the BASMI, because, like the ASPI, the BASMI is not patient-perceived12. In addition, we hypothesized that the more ASPI tests a subject could complete, the lower the disease activity variables would be.
Interrater reliability and intrarater test-retest reliability
Four ASPI observers (A, B, C, D; respectively RvB, SIV, MPF, FVA) were involved and trained according to the ASPI manual (Supplementary Data 1, available from the authors on request). At baseline, 2 different observers executed the ASPI, with at least 15 min in between and blinded to each other’s results. Observer RvB tested the ASPI in all patients, while the other observers alternated (Figure 1).
At the retest visit (1–4 weeks after baseline), the ASPI was performed only once, with the same observer who executed the baseline test for that specific patient, for test-retest analysis.
Feasibility
The feasibility in clinical practice was based on the following criteria: (1) the average time to perform the entire ASPI test (from the first instructions until test 3 was finished), with the goal that 90% of the patients should finish in < 15 min; (2) the number of patients who can perform all tests, with a goal of ≥ 80% of the patients; and (3) an acceptable test-related burden (“Did you experience performing the ASPI test as a burden?”; NRS 0–10, anchor 10: no burden). The goal was set at ≥ 80% of the patients answering NRS > 5.
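The three feasibility goals above can be checked mechanically once per-patient data are available. The following is a minimal sketch, not the authors' analysis (which was run in SPSS); the function name, argument names, and example values are hypothetical and chosen only for illustration.

```python
def feasibility_summary(total_minutes, completed_all, burden_nrs):
    """Check the three predefined feasibility goals for a cohort.

    total_minutes: total ASPI duration per patient, in minutes
    completed_all: True if the patient completed all tests
    burden_nrs:    burden score per patient (NRS 0-10, 10 = no burden)
    """
    n = len(total_minutes)
    share_fast = sum(t < 15 for t in total_minutes) / n
    share_complete = sum(bool(c) for c in completed_all) / n
    share_low_burden = sum(score > 5 for score in burden_nrs) / n
    return {
        "time_goal_met": share_fast >= 0.90,          # goal 1: 90% finish in < 15 min
        "completion_goal_met": share_complete >= 0.80,  # goal 2: >= 80% perform all tests
        "burden_goal_met": share_low_burden >= 0.80,    # goal 3: >= 80% answer NRS > 5
        "shares": (share_fast, share_complete, share_low_burden),
    }


# Hypothetical data for 10 patients (not study data).
example = feasibility_summary(
    total_minutes=[8, 9, 12, 7, 16, 10, 11, 9, 13, 14],
    completed_all=[True] * 8 + [False] * 2,
    burden_nrs=[9, 10, 8, 6, 5, 9, 10, 7, 8, 3],
)
```

Reporting the underlying shares alongside the pass/fail flags keeps the result comparable with the percentages given in the Results.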
Other study variables
Besides the ASDAS-CRP, BASMI, BASFI, and BASDAI, several baseline variables were collected: demographics (age, sex), disease characteristics (disease duration, peripheral and extraarticular manifestations, HLA-B27), current treatment (nonsteroidal antiinflammatory drugs, disease-modifying antirheumatic drugs, TNFi), enthesitis (Maastricht Ankylosing Spondylitis Enthesitis Score), joint pain (yes/no) and laboratory results (CRP, erythrocyte sedimentation rate, ≤ 28 days before or after study visit). Questionnaires were filled in before ASPI performance and the ASPI was always performed before the BASMI. At the retest visit, the BASDAI, BASFI, and BASMI were repeated, in the same order as at baseline.
Statistical analyses
Data are presented as mean (± SD), number (with percentage), or percentage. Non-normally distributed variables were presented as median (IQR) and analyzed with the Mann-Whitney U test. Differences in disease activity scores between baseline (T0) and the retest visit (T1) were tested with the chi-square test or paired t test.
The ASPI and test-related pain and exertion results of ASPI test 2 and 3 were based on the mean of 3 repetitions. For patients unable to complete all repetitions, the mean score was calculated based on the completed tests (e.g., 1 or 2 repetitions). For further analyses, the ASPI, test-pain, and test-exertion were converted into 3 single z-scores (z-ASPI, z-exertion, z-pain), combining the data of the 3 tests in 1 overall score [e.g., z-ASPI of ASPI performance times: (z-ASPI-test1 + z-ASPI-test2 + z-ASPI-test3) / 3 = z-ASPI].
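The z-score combination described above can be sketched as follows. This is an illustrative sketch only: it assumes each test is standardized against the cohort's own mean and sample SD, which the text does not state explicitly, and the function names and example times are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values against their own mean and sample SD (assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_z(test1, test2, test3):
    """Average the three per-test z-scores into one overall score per patient."""
    z1, z2, z3 = z_scores(test1), z_scores(test2), z_scores(test3)
    return [(a + b + c) / 3 for a, b, c in zip(z1, z2, z3)]


# Hypothetical performance times in seconds for 4 patients (not study data).
times_test1 = [14.2, 18.9, 25.1, 40.3]
times_test2 = [9.8, 12.4, 15.0, 31.7]
times_test3 = [4.1, 5.6, 7.9, 16.2]
z_aspi = composite_z(times_test1, times_test2, times_test3)
```

Because every test is put on the same unitless scale before averaging, a slow test with long absolute times does not dominate the overall score.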
Interrater reliability and intrarater test-retest reliability were determined for the 3 tests independently and for the overall ASPI score (z-ASPI). For the ASPI and test-related exertion, the ICCs (respectively, 1-way randomized model and 2-way mixed effect model, single measures) were reported with 95% CI. For the test-related pain, the linear weighted κ was used, with 1000 bootstrap replicates to estimate the 95% CI. Additionally, for the ASPI test-retest reliability, the standard error of measurement and minimal detectable change were calculated. Bland-Altman plots, visualizing the mean difference between 2 test moments (either 2 observers or baseline and retest), were used to assess systematic differences and to calculate the limits of agreement (mean difference ± 1.96 × SD of the differences). Whether the mean difference between observers 1 and 2 differed from zero was tested, for every test, with the 1-sample t test.
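To make these reliability statistics concrete, the sketch below computes a 1-way random-effects, single-measures ICC, the Bland-Altman limits of agreement, and the standard error of measurement (SEM) with minimal detectable change (MDC). It is not the authors' SPSS procedure. The SEM and MDC formulas used here (SEM = SD × √(1 − ICC), MDC = 1.96 × √2 × SEM) are common conventions and an assumption, since the text does not give them; all function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def icc_oneway(ratings):
    """ICC(1,1): one-way random effects, single measures.

    ratings: one list per patient, each holding k measurements.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    )
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def bland_altman(first, second):
    """Mean difference (second - first) and 95% limits of agreement."""
    diffs = [b - a for a, b in zip(first, second)]
    md, sd = mean(diffs), stdev(diffs)
    return md, (md - 1.96 * sd, md + 1.96 * sd)


def sem_and_mdc(ratings, icc):
    """SEM and MDC under the assumed conventional formulas."""
    sd_all = stdev(x for row in ratings for x in row)
    sem = sd_all * sqrt(1 - icc)
    return sem, 1.96 * sqrt(2) * sem


# Hypothetical times in seconds from two observers for 5 patients.
obs1 = [14.2, 18.9, 25.1, 40.3, 22.0]
obs2 = [13.8, 18.1, 24.9, 38.7, 21.5]
pairs = [list(p) for p in zip(obs1, obs2)]
icc = icc_oneway(pairs)
mean_diff, limits = bland_altman(obs1, obs2)
sem, mdc = sem_and_mdc(pairs, icc)
```

A negative mean difference in this example would indicate shorter times with the second observer, which is the kind of systematic shift the Bland-Altman plots are meant to reveal.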
For hypothesis testing, the baseline ASPI data collected by the first observer (T0-observer-1) were used. The correlations between the ASPI or “number of completed ASPI tests” and disease variables (BASFI, BASDAI, ASDAS, BASMI) were determined in univariable (disease variables as dependents) and multivariable linear regression models (addition of z-exertion and z-pain as independents).
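The univariable models can be illustrated with an ordinary least-squares fit of one disease variable on the overall ASPI score, reporting R² as the share of variance explained. This is a minimal sketch with hypothetical names and data, not the authors' SPSS models; the multivariable models (adding z-exertion and z-pain) would need a general least-squares solver and are not shown.

```python
from statistics import mean


def fit_line(x, y):
    """Simple linear regression of y on x; returns slope, intercept, R^2."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical values (not study data): overall z-ASPI as the independent
# variable and BASFI (0-10) as the dependent variable.
z_aspi_scores = [-1.1, -0.4, 0.0, 0.3, 1.2]
basfi_scores = [1.5, 3.0, 3.2, 4.8, 6.9]
slope, intercept, r_squared = fit_line(z_aspi_scores, basfi_scores)
```

With a single predictor, R² equals the squared Pearson correlation, so it can be compared directly with the hypothesized ranges (for example, 0.3–0.5 for the BASFI).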
All statistical analyses were performed with IBM SPSS Statistics for Windows, Version 22.0 (released 2013; IBM Corp.).
RESULTS
Study population
In February and March 2019, 72 patients were recruited. Four were excluded because they did not fulfill the ASAS criteria for axSpA. Of the 68 included patients, 52% were diagnosed with r-axSpA, 19% with nr-axSpA, and in 28% it was not possible to differentiate between r- and nr-axSpA because only magnetic resonance imaging was performed for axSpA diagnosis. Patient characteristics and ASPI test results are depicted in Table 1 and Table 2.
Validity of the content and translation
All patients completed the questionnaire on the validity of the content and translation, using an NRS (10: excellent; NRS ≥ 6 sufficient). According to 84% and 85% of the patients, respectively, the ASPI achieved its aim (insight into physical capabilities; median NRS 8, IQR 7–10) and was representative for their daily physical functioning (median NRS 8, IQR 7–9). In addition, 94% of the patients rated the ASPI to be of additional value in daily care (median NRS 10, IQR 10–10). All patients scored the ASPI instructions as sufficiently clear and comprehensible (both median NRS 10, IQR 10–10).
Hypotheses testing
The degree to which the ASPI correlated with the BASFI, BASDAI, and ASDAS was higher than previously hypothesized (Table 3). As expected, the R² improved further after adding the ASPI-related pain and exertion (subjective variables) to the model (Table 3). As hypothesized, the correlation between the ASPI and BASMI was moderate and did not change after incorporation of the patient-reported pain and exertion.
In addition, the number of ASPI tests a patient was able to complete was significantly (all P < 0.01) and inversely associated with the BASDAI (β −0.6; 95% CI −1.0, −0.2), ASDAS (β −0.4; 95% CI −0.6, −0.2), BASFI (β −1.0; 95% CI −1.4, −0.5), and BASMI (β −0.5; 95% CI −0.8, −0.2): the more tests a patient completed, the lower the scores.
Interrater reliability
The interrater reliability was tested in 62 patients, because 6 patients were unable to repeat the ASPI owing to pain or exertion. The overall interrater reliability of the ASPI was 0.93 (95% CI 0.88–0.96) and, for the individual tests, good to excellent (Table 4). Although at the group level Bland-Altman plots showed significant differences between T0-observer-1 and T0-observer-2 for tests 2 and 3 (slightly lower ASPI time at T0-observer-2; P < 0.01 and P = 0.04, respectively; Figure 2), individual comparison of the different observer pairs did not show systematic differences between the observers. The reliability for ASPI-related pain and exertion was moderate to excellent (Table 4).
Intrarater test-retest reliability
Over half of the patients (n = 39, 57%) were also available for a retest study visit within 1 to 4 weeks (29 patients were unavailable because of travel distance or work). Retest visits were performed after a median of 7 days (IQR 7–14; min–max 5–30 days; 21 patients ≤ 7 days, 32 ≤ 14 days). The mean BASDAI was lower at T1 compared to T0 (P = 0.01; BASDAI −0.7 at group level, and −0.58 for the patients who participated in both visits). There were no changes in medication between T0 and T1, nor significant baseline differences (patient characteristics, disease activity, ASPI scores) between patients who did and did not participate in the retest visit (data not shown).
The test-retest reliability was higher for the ASPI (overall ICC 0.94, 95% CI 0.87–0.97; Table 4) than for patient-experienced pain or exertion during the ASPI (Table 4). Bland-Altman plots (Supplementary Data 3, available from the authors on request) showed significant differences between T1 and T0 for tests 1 and 2 (ASPI time 2 and 3 s shorter at T1 for tests 1 and 2, P = 0.01 and P < 0.01, respectively). Individual evaluation of the different observers did not show structural differences between T1 and T0.
Feasibility
At baseline, the use of a chair or bench/table was needed for tests 1, 2, or 3 in 5%, 40%, and 15% of the patients, respectively. Table 2 shows, for every test, the number of patients performing at least 1 test repetition. In total, 56 patients (82%) were capable of performing the entire ASPI test, including all required repetitions for tests 2 and 3. Some patients were not able to fully perform picking up 6 pencils (n = 2), putting on socks (n = 4), and/or standing up off the ground (n = 11), because of too much pain (n = 8), physical incapacity (n = 3), or safety (n = 1). These patients had a significantly worse disease state (worse BASDAI, ASDAS, BASMI, BASFI, ASPI; all P < 0.001; data not shown).
The total time needed to complete the ASPI was in 94% of the patients < 15 min (mean 9, SD 3 min) and even shorter during subsequent moments (Table 2). The majority of the patients did not experience the test as a significant burden (anchor 10: no burden at all; 85% NRS > 5; median 9, IQR 8–10).
DISCUSSION
To our knowledge, this study is the first to report on the performance of the ASPI test in a non-Dutch patient population, including both r- and nr-axSpA patients. The interrater reliability of the ASPI, as well as the intrarater test-retest reliability, was good to excellent in this population. In addition, the test met all predefined feasibility criteria, regardless of the differences in diagnosis (r- and nr-axSpA). Patients reported the test to be representative for their physical limitations (content validity) and of additional value in daily clinic, while the test-related burden was relatively low.
The ASPI was developed and validated in 2009, by van Weely, et al13. Although their patient cohort included only patients with r-axSpA, had fewer women (28% vs 43%), and as expected, had a higher HLA-B27 prevalence (in Chile 2–3%20), the BASDAI, BASMI, and patient-reported functioning (BASFI) did not differ importantly from this Chilean cohort. The current population had a relatively shorter disease duration (−3.5 yrs) and higher biologic use (40% vs 21%) compared with the van Weely, et al study, which included patients > 10 years ago, when biologic treatment was less common in the Netherlands. This might have led to somewhat shorter performance times for test 2 and 3, in the current study, compared to van Weely, et al. Importantly, the interrater reliability, which had not been determined before, was excellent. In addition, the current study found an even better ASPI intrarater test-retest reliability than reported previously (ICC test 1: 0.73, 2: 0.94, and 3: 0.86, vs current study: 1: 0.89, 2: 0.88, and 3: 0.96). As was demonstrated by van Weely, et al, the ASPI showed a higher reliability compared to the patient-reported ASPI-related pain and exertion13. A new aspect of our study, compared with the earlier ASPI studies, was the assessment of the total number of tests a patient could perform, and the use of a bench/table or chair, both of which showed an important association with disease activity. Potentially these items can be incorporated into the ASPI.
Because the actual functioning arises from a complex interplay of the physical ability itself, and psychological and environmental factors, the patient-perceived functioning (BASFI) is an important outcome9. However, it has its limitations when it comes to monitoring physical functioning21. Previous studies have reported discordance between self-reported and observed functioning, and psychological factors explain a significant part of the variance in BASFI11,22. As expected, in our study, the association between ASPI and patient-perceived functioning also improved when the ASPI was combined with the patient test perception (test-pain and test-exertion)12. Further, test-related pain showed a stronger association with the BASFI than the performance time (ASPI) itself. In this study, the associations between the ASPI and the BASFI and BASDAI were remarkably higher than previously reported12. It is unclear whether this is caused by cultural differences between Chilean and Dutch patients in, for example, reporting pain, movement-related anxiety, or BASDAI/BASFI scores (generally higher in Latin American patients)23.
Although performance-based tests are considered to be more objective than patient-reported outcomes, the ASPI might be influenced by the observer (e.g., response time, application of instructions, interventions). In 2016, Swinnen, et al attempted to reduce this influence with a sensor (accelerometer)-based performance measure with computer algorithms to identify movement time: the instrumented BASFI (iBASFI)24. In their study, the computerized variables were more reliable than the observer-dependent activities, although the reliability did not differ importantly from the iBASFI-corresponding ASPI activities. Besides, computer-dependent tests require sensors and accompanying software. Taking into account that the ASPI is already highly reliable and requires limited training, simple tools (stopwatch), and little space (consultation room), for now we consider the ASPI a more feasible and accessible choice for clinical studies, especially in countries with limited resources.
An important strength of our study is that it is, to our knowledge, the first to evaluate the ASPI properties in an entirely different population from where the test was developed (Amsterdam, the Netherlands), and included a broader range of patients with axSpA (r- and nr-axSpA, from public and private healthcare). Also, it assessed the interrater reliability for the first time, involving a high number of observers (4).
Our study has a few limitations. First, owing to feasibility aspects, the interrater reliability was assessed in 1 day, which might have induced a learning effect or exercise-induced improvement in mobility after the first ASPI performance. This was supported by the Bland-Altman plots, which showed slightly lower performance times with the second observer. Second, one could debate the optimal interval for test-retest reliability, because for axSpA, fluctuations in disease activity were described for larger intervals and a substantial within-patient variance was reported25. In this study, the allowed interval was set at 1–4 weeks, and most participants were retested within 14 days. Unfortunately, 43% of the patients were unavailable for the retest visit, with a risk of selection bias for the retest reliability. However, the baseline characteristics of patients participating in the follow-up visit did not differ from those who were not available. The fact that the BASDAI was significantly lower at the retest visit probably results from the variance in the BASDAI itself, because it is between the limits described in the scarce literature on this subject26. Last, the Spanish ASPI instructions were developed involving translators from 3 different Latin American countries. However, the test was validated only in Chile and subtle differences in the Spanish language exist between countries. This should be taken into account when applying the ASPI in other Spanish-speaking countries.
Our study in Latin American patients demonstrates, for the first time to our knowledge, that the ASPI test is valid, reliable, and feasible in a non-Dutch population, with both r- and nr-axSpA. In addition, in line with earlier research, our study confirmed a high intrarater reliability, and further, demonstrated for the first time a high interrater reliability. Our study emphasizes that the ASPI test is feasible in clinical practice and can be used in the full axSpA spectrum. Because Spanish is the fourth most spoken language worldwide, the availability of the Spanish ASPI instructions importantly expands its accessibility and usability.
Acknowledgment
The authors acknowledge the ASAS for financial support for this project, the Chilean patient foundation Espondilitis Chile for assistance with patient recruitment, and Jesus Portal (JP), Luis Orlando Benavides Narvaez (LOBN), Maja Haanskorf (MH), and Francien de Groot (FdG) for their participation in the ASPI translation procedure.
Footnotes
This work was supported by the Assessment of SpondyloArthritis international Society (ASAS; ASAS grant 2018/2019).
- Accepted for publication January 21, 2020.