Abstract
Objective. There is an unacceptable delay in the diagnosis of axial spondyloarthritis (axSpA) in its early stages among patients at high risk, in particular those with inflammatory bowel disease (IBD). Our objectives were to develop a sensible and reliable questionnaire to identify undetected axSpA among patients with IBD.
Methods. Literature was reviewed for item generation in the Toronto axSpA Questionnaire on IBD (TASQ-IBD). Sensibility of the questionnaire was assessed among healthcare professionals and patients. This assessment covered purpose and framework (clinical function, clinical justification, and clinical applicability), face validity, comprehensibility [oligo-variability (limiting the questionnaire to important items) and transparency], replicability, content validity, and feasibility. For the test-retest reliability study, the questionnaire was mailed to 77 patients with established IBD and axSpA. Kappa agreement coefficients and absolute agreement were calculated for items.
Results. Three domains included IBD, inflammatory back symptoms, and extraaxial features. The entry criterion required a patient to have IBD and back pain or stiffness that ever persisted for ≥ 3 months. Iterative sensibility assessment resulted in 16 items and a diagram of the back. Kappa coefficients ranged from 0.81 to 1.00 for each item. Absolute agreement across all items ranged from 91% to 100%.
Conclusion. TASQ-IBD is a newly developed, sensible, and reliable case-finding questionnaire to be administered to patients with IBD who have ever had chronic back pain or stiffness persisting for ≥ 3 months. It should facilitate identification and timely referral of patients with IBD to rheumatologists and minimize the delay in diagnosis of axSpA. Consequently, it should help in assessing the prevalence of axSpA in IBD.
Ankylosing spondylitis (AS) is considered the prototype of spondyloarthritis (SpA). The newly coined term “axial SpA (axSpA)” comprises predominantly axial symptoms, either without radiographic sacroiliitis (nonradiographic axSpA) or with radiographic sacroiliitis (i.e., AS). The diagnosis of AS is usually made 8 to 11 years after the onset of symptoms, mainly owing to poor recognition of the disease in its early stages1. In fact, definitive radiographic sacroiliitis may not appear until about 8 years after symptom onset, which limits the usefulness of the modified New York criteria in classifying early disease1,2. The Assessment of SpondyloArthritis International Society (ASAS) has developed and validated new classification criteria for axSpA to address this3.
Patients with inflammatory bowel disease (IBD) are at high risk of developing axial symptoms of inflammatory back pain (IBP). The prevalence of IBD-associated AS (i.e., AS diagnosed in patients with IBD) using the modified New York criteria is 4% to 10%4. Symptoms of axSpA can precede IBD symptoms in 31% to 50% of patients, while symptoms of IBD and axSpA can occur simultaneously in 15% to 40% of patients5,6,7,8. The literature does not support that patients with IBD have a different presentation of AS. As in primary AS, IBD-associated AS has the same age of onset, delay in diagnosis, work disability, reduced quality of life, and disease activity1,5,9,10.
Because the predominant feature of axSpA is IBP, attempts have been made to refine IBP features. Three English self-reported questionnaires for identifying IBP exist. The first was developed by Calin, et al11. It comprises 5 items for IBP (excluding neck pain), which were validated in patients with AS and mechanical or nonspecific back pain. Although the sensitivity of fulfilling 4 out of 5 criteria was 95% and the specificity was 85%, the diagnostic performance was found to be lower in subsequent validation studies (sensitivity 23% to 38% and specificity 75%)12. The second questionnaire was a 12-item case ascertainment questionnaire (CAQ) that was developed and validated to identify patients with radiographic AS13. The sensitivity and specificity of the CAQ in the validation sample were about 67% and 95%, respectively. However, it did not target patients with early axSpA because the mean disease duration among patients with AS was 21.8 years. It also did not address patients with IBD-associated AS. A third screening questionnaire for IBP consisted of 6 items14. The questionnaire was administered to patients with established AS and mechanical back pain. In multivariate logistic regression analysis, diurnal variation had the strongest association with IBP, with a specificity of 92% but a sensitivity of 49%. The main focus of these questionnaires was to refine the definition of IBP for earlier identification of AS, but the presence of IBP was found to increase the posttest probability of axSpA only from 5% to 14%15. Further, the interpretation of IBP requires clinical experience. Therefore, IBP is problematic if used as the sole referral criterion. The recommendation is to combine IBP with other features of axSpA to achieve a high posttest probability of axSpA15.
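The step from a 5% pretest probability to a 14% posttest probability can be illustrated with the standard odds form of Bayes' theorem. The following is a minimal sketch, not the calculation used in the cited work: the function names are hypothetical, and the positive likelihood ratio of 3.1 is an assumed value chosen here only because it reproduces the 5% to 14% step quoted above.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0.0 < pretest < 1.0:
        raise ValueError("pretest must be strictly between 0 and 1")
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Assumed inputs: 5% pretest probability (from the text) and LR+ = 3.1
# (an illustrative assumption, not a value reported in this article).
ASSUMED_LR_POSITIVE = 3.1
p_post = posttest_probability(0.05, ASSUMED_LR_POSITIVE)
print(f"Posttest probability: {p_post:.1%}")
```

Under these assumptions a single feature with a modest likelihood ratio leaves most patients who screen positive without the disease, which is why combining IBP with other axSpA features is recommended.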
Attention needs to be directed toward those patients at heightened risk of developing axSpA, as is the case in patients with IBD. One approach to identify those IBD patients with undetected axSpA is to screen them when they present to their gastroenterologists for followup of their IBD. Such screening is defined as case finding. A good case-finding questionnaire requires items that are accurate, highly sensitive, reasonably specific, and that have good predictive value16. Further, in this modern era, there is an expectation that instruments should be evaluated for their measurement properties. The clinimetric method supports the use of clinical judgment (by physicians and/or patients) during development of an instrument measuring a complex clinical phenomenon such as axSpA in patients with IBD. Important facets of the clinimetric approach include sensibility (usefulness of an instrument), reliability, validity, and responsiveness. The aim of our study was to develop a simple, sensible, and reliable self-reported questionnaire that allows earlier case finding of axSpA in patients with IBD.
MATERIALS AND METHODS
Design outline
Figure 1 illustrates the phases of development of the Toronto axSpA questionnaire on IBD (TASQ-IBD).
Item generation
A thorough literature review was done using Medline (from 1941 to April 2011), PubMed, and EMBASE (inception to April 2011) to identify potential items for the development of the questionnaire. Papers were restricted to the English language. Keywords included various combinations of inflammatory back pain, back pain, ankylosing spondylitis, spondyloarthritis, spondyloarthropathy, symptoms, features, manifestations, risk factors, questionnaire, screening, case finding, and diagnosis. Articles were hand-searched to identify additional relevant articles.
Cardinal articles in this field (Hart, et al17; Calin, et al11; Moller, et al18; Gran19; and Rudwaleit, et al12,15) and previously developed classification criteria for AS and SpA (Rome criteria for AS, 1961; New York criteria for AS, 1968; modified New York criteria, 1984; Mau criteria for identifying early AS, 1985; Amor criteria for SpA, 1990; European Spondylarthropathy Study Group criteria for SpA, 1991; and ASAS classification criteria for early axial and peripheral SpA, 2009, 2010)3,20,21,22,23,24 were also reviewed with a focus on clinical history, particularly features of IBP.
Response options and instrument format
The majority of items asked closed-ended questions with binary responses, with 2 questions providing continuous data. Some branching questions (subquestions) were included. Three major headings were determined a priori, representing the domains of the questionnaire: IBD, inflammatory back symptoms, and extraaxial features.
Item reduction using sensibility assessment
Sensibility assessment refers to the usefulness of an instrument. It includes a statement of purpose and framework (clinical function, clinical justification, and clinical applicability), overt format (comprehensibility and replicability), face and content validity, and feasibility (Table 1)25. The sensibility of TASQ-IBD was evaluated using a sensibility instrument that was adapted from Rowe and Oxman26. This instrument has been successfully used in the development of other recent instruments, such as the Pediatric Cardiopulmonary Physiotherapy Discharge Tool27 and the Human Immunodeficiency Virus Disability Questionnaire28. The sensibility of TASQ-IBD was evaluated by a purposive sample of academic healthcare professionals and researchers involved with SpA. This sample comprised 2 rheumatologists with expertise in SpA, an SpA physiotherapist, a clinical trial manager, 2 SpA nurses, an SpA research analyst, and 2 general rheumatologists.
Pilot study
This phase aimed at testing the questionnaire among a convenience sample (n = 19) of patients with SpA attending the Spondylitis Clinic at the Toronto Western Hospital. This was done to evaluate the sensibility of the questionnaire and to obtain qualitative feedback from the patients’ perspective. The clinic is one of the largest academic centers for the care of patients with axSpA in Canada. All patients were ≥ 18 years old and fluent in English, and all gave their informed consent29.
Patient characteristics were collected including age, sex, the level of education (below grade 8, high school incomplete, high school graduate, college, university), the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), and the Bath Ankylosing Spondylitis Functional Index (BASFI) scores of the last clinic visit30,31. The sensibility assessment instrument was administered to patients and ended with a free text comment area. Also, patients were interviewed by an investigator (KAA) to ascertain their comments on the TASQ-IBD.
Repeat sensibility assessment
Revisions were made to the TASQ-IBD based on feedback from healthcare professionals and patients. The sensibility of TASQ-IBD was then reassessed by the healthcare professionals.
Coding the responses on the questionnaire
Dichotomous response options were coded as 1 for Yes and 0 for No. Question 4 and subquestion 6, which contained continuous variables, were converted into dichotomous variables. The former asks about the age of onset of back pain, which was dichotomized into age ≤ 45 years and age > 45 years in accordance with the ASAS classification criteria for axSpA. The latter asks about duration of morning stiffness, which was dichotomized into ≥ 30 min and < 30 min. Subquestion 7 and questions 11 and 12 have 3 response options: Yes, and 2 detailed negative response options. The negative options were collapsed into 1 category (No) for the purpose of analysis.
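The coding rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, the text does not say which category of a dichotomized continuous item was coded as 1 (here the category suggestive of axSpA is assumed), and missing answers are kept as missing.

```python
from typing import Optional

AGE_ONSET_CUTOFF_YEARS = 45        # from the text: <= 45 vs > 45 years
MORNING_STIFFNESS_CUTOFF_MIN = 30  # from the text: >= 30 vs < 30 min


def code_response(response: Optional[str]) -> Optional[int]:
    """Code Yes as 1 and any other answered option as 0.

    Items with 2 detailed negative options collapse to 0 (No) here.
    A missing answer stays missing (None).
    """
    if response is None or not response.strip():
        return None
    return 1 if response.strip().lower() == "yes" else 0


def code_age_of_onset(age_years: Optional[float]) -> Optional[int]:
    """Assumption: onset at <= 45 years is coded 1, onset at > 45 years is coded 0."""
    if age_years is None:
        return None
    return 1 if age_years <= AGE_ONSET_CUTOFF_YEARS else 0


def code_morning_stiffness(minutes: Optional[float]) -> Optional[int]:
    """Assumption: stiffness lasting >= 30 min is coded 1, < 30 min is coded 0."""
    if minutes is None:
        return None
    return 1 if minutes >= MORNING_STIFFNESS_CUTOFF_MIN else 0


# Hypothetical answers from one respondent.
print(code_response("Yes"), code_response("I do not get up"),
      code_age_of_onset(27), code_morning_stiffness(20))
```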
Reliability assessment
Reliability evaluates the reproducibility of an instrument that measures an attribute or a construct. Test-retest (intrarater) reliability assesses the stability of the item scores on different occasions in the population of interest. Test-retest reliability was evaluated with an interval of no more than 2 weeks between administrations to reduce the risk of recall bias and intercurrent flare of axSpA32.
Sampling method
As of July 1, 2011, the database of the Spondylitis Clinic had 636 patients with axSpA. Of those patients, 77 (12.1%) had concurrent IBD that was diagnosed by gastroenterologists based on clinical, endoscopic, and histological evaluations. All patients had chronic back pain or stiffness that persisted for ≥ 3 months.
TASQ-IBD was mailed to all 77 patients. The tailored design method of Dillman was adapted to maximize the response rate33. Each patient was mailed 2 copies of the TASQ-IBD questionnaire accompanied by a personalized covering letter, which explained the purpose of the reliability study and provided clear instructions about the completion of each questionnaire on 2 occasions 1 to 2 weeks apart. Patients were not told that their responses on the 2 occasions would be compared, to minimize the possibility of a Hawthorne effect, which refers to responders modifying their responses because of awareness that they are being studied34. The following information was collected from the database to describe and compare responders and nonresponders: demographics, the highest level of education at the last clinic visit, type of IBD, age of onset of IBP, duration of axSpA disease, BASDAI and BASFI scores of the last clinic visit, and erythrocyte sedimentation rate (ESR) and C-reactive protein (CRP) levels at the last clinic visit. Nunnally and Bernstein have suggested a reliability coefficient (R) of 0.70 for a measuring instrument for research purposes, and 0.90 for clinical purposes35. A minimally desired reliability of 0.80 was chosen. The calculated sample size with hypothesized R = 0.90, standard error 0.05, and 2 observations for each patient was 24 patients36.
Statistical analyses
Descriptive statistics were used to summarize the data. Appropriate parametric and nonparametric tests were used to compare the variables between responders and nonresponders. For test-retest reliability, Cohen’s kappa (κ) statistics were determined and were interpreted using the recommendations of Landis and Koch as follows: κ < 0.00 = poor agreement, κ 0.00–0.20 = slight agreement, κ 0.21–0.40 = fair agreement, κ 0.41–0.60 = moderate agreement, κ 0.61–0.80 = substantial agreement, and κ > 0.80 = almost perfect agreement37. The percentage of agreement for the items was also determined. Statistical analyses were done using SAS (version 9.3) and R (version 2.14.2; The R Foundation for Statistical Computing). Statistical tests were 2-sided, and statistical significance was defined as p < 0.05.
The institutional research ethics board approved our study.
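For a single dichotomous item, Cohen’s κ, the percentage of agreement, and the Landis and Koch label can be computed as in the following sketch. It is an illustration only: the analyses in this study were run in SAS and R, the function names are hypothetical, and the paired responses are invented data, not study data.

```python
import math
from collections import Counter
from typing import Sequence


def percent_agreement(test: Sequence[int], retest: Sequence[int]) -> float:
    """Percentage of patients giving the same answer on both occasions."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(a == b for a, b in zip(test, retest)) / len(test)


def cohens_kappa(test: Sequence[int], retest: Sequence[int]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for paired categorical answers."""
    n = len(test)
    p_o = percent_agreement(test, retest) / 100.0
    first, second = Counter(test), Counter(retest)
    # Chance agreement from the marginal proportions of each category.
    p_e = sum(first[c] * second[c] for c in set(first) | set(second)) / (n * n)
    if p_e == 1.0:
        return math.nan  # undefined when every answer falls in one category
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Interpretation bands of Landis and Koch as listed in the text."""
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented responses of 12 patients to one item (1 = Yes, 0 = No).
item_test = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
item_retest = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(item_test, item_retest)
print(f"kappa = {kappa:.2f} ({landis_koch(kappa)}), "
      f"agreement = {percent_agreement(item_test, item_retest):.0f}%")
```

Returning an undefined value when all answers fall in one category is a design choice of this sketch; statistical packages handle that case in different ways.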
RESULTS
Item generation
Ninety-three potential items were generated. Sensitive and/or specific items were chosen in keeping with the properties of a case-finding instrument. Pictures of peripheral arthritis of hand joints, dactylitis of a finger and a toe, and acute anterior uveitis were included along with a diagram of the back.
Item reduction
An entry criterion was added to reduce referrals for mechanical back pain. To qualify for the questionnaire, a patient must have IBD and back pain or stiffness that ever persisted for ≥ 3 months. Iterative sensibility assessment of TASQ-IBD resulted in selection of 18 items with some modifications for shorter and clearer sentences (Table 2). This version included a diagram of the back and one picture of toe dactylitis.
Pilot study
Across the 3 pilot stages (n = 4 patients in stage I, n = 9 in stage II, and n = 6 in stage III), a total of 19 patients participated, with a median age of 36 years (range 24–61 yrs); 68% were men. The median BASDAI score was 3.0 (range 0–8.8), and the median BASFI score was 5.0 (range 0–7.7). Forty-two percent of patients were high school graduates, 42% were university graduates, and 16% were college graduates. Table 3 summarizes the sensibility assessment among patients with axSpA. Patients who participated in one stage did not participate in the next.
Throughout all 3 pilot stages, items were either modified further or removed if they were still unclear. For example, the item asking about the mode of onset of back pain underwent iterative revision (modification of the stem question and response options) until the final version was reached. Some patients suggested the addition of response options to make the questions more practical. For instance, they suggested adding a third response option (I do not get up) for the item asking about improvement of nocturnal back pain when getting up and moving. The questions about awakening because of back pain during the second half of the night and alternating buttock pain were removed. Despite their importance as characteristics of IBP, patients found them difficult to understand. Some patients suggested adding “hip pain” to the diagram of the back although they drew a line pointing toward the sacroiliac joints. This item was dropped during item selection because a proper definition was lacking. These modifications led to a revised questionnaire with 16 items and 1 diagram. We retained a subquestion on prior diagnosis of AS to confirm the diagnosis in patients during the reliability study.
Repeat sensibility assessment
There was a significant improvement in the sensibility of the final questionnaire compared to the initial sensibility assessment. This time, all healthcare professionals but 1 agreed on the simplicity of the scoring system assuming that each question was weighted equally by a score of 1 (Table 2).
Reliability study
Of the 77 mailed questionnaires, 34 were returned by the cutoff date. Six questionnaires were returned owing to incorrect addresses. The response rate during the summer of 2011 was 44.2%. Questionnaires from 5 out of 34 patients had ≥ 15% missing answers. Because calculating κ statistics using our small sample size requires each item to be answered, we contacted 4 of these responders within a week of receiving their returned envelopes. The unanswered questions were read to them over the phone as written without providing any further explanation to avoid interviewer bias. The overall percentage of missing items was 16.6%. The most common items not completed by the 5 responders were the type of IBD, time to develop back pain or stiffness, responsiveness to nonsteroidal antiinflammatory drugs, family history of AS, and history of peripheral arthritis. Data from 33 out of 34 patients were usable for the descriptive analysis, and 34 pairs of questionnaires were usable for test-retest reliability. We could not identify 1 male patient because of a missing date of birth on both questionnaires and inability to crosscheck the code on his questionnaire with the spondylitis database (clerical error), but we used his questionnaires for test-retest reliability. Because we required a sample size of 24 to assess test-retest reliability, the final number of responders with usable paired questionnaires (n = 34) was sufficient.
Comparison between responders and nonresponders
There was no statistically significant difference between the groups in terms of age, sex, the highest level of education, types of IBD, age of onset of back pain, duration and activity of axSpA, functional activities, ESR, or CRP levels (Appendix 1).
All responders confirmed at test and retest times that they had back pain or stiffness that persisted for ≥ 3 months and were diagnosed with AS.
At this stage, the healthcare professionals agreed on removing a response option “Prior diagnosis of psoriatic arthritis” from item 13 for 2 reasons. First, this item was only endorsed by 1 patient. Second, during the validation phase of the CAQ, this item did not differentiate AS from chronic back pain13. We considered psoriasis adequate for this questionnaire. Further, psoriasis was recently shown to be more associated with nonradiographic axSpA (OR 3.6) compared to classic AS38. The final version of TASQ-IBD (dated May 2012) is shown in Figure 2.
Test-retest reliability
The κ coefficients of each item ranged between 0.81 and 1.00, which indicates almost perfect agreement for test-retest reliability (Table 4). The absolute percentage of agreement across all items ranged from 91% to 100%.
The range of κ coefficients for each domain was as follows: 0.84–1.00 for IBD, 0.85–1.00 for inflammatory back symptoms, and 0.81–1.00 for extraaxial features. The ranges of absolute agreement within the domains were as follows: 91% to 100% for IBD, 94% to 100% for inflammatory back symptoms, and 91% to 100% for extraaxial features.
DISCUSSION
The self-administered TASQ-IBD was designed to serve as a case-finding instrument that can facilitate referrals of IBD patients with suspected axSpA to rheumatologists for further evaluation. Early diagnosis allows for earlier intervention, an important concept because tumor necrosis factor inhibitors have been found to be efficacious in early disease39. Previous questionnaires focused on refining the characteristics of IBP and did not specifically target IBD patients or use an illustration. TASQ-IBD is inexpensive and easy to administer at a single point in time to patients with IBD who have ever had chronic back pain or stiffness that has persisted for ≥ 3 months. Completion of the questionnaire takes ≤ 5 min, which is in accordance with the recommended completion time of between 5 and 15 min40.
In the design of our questionnaire, we adopted multiple steps to ensure an appropriate methodology and to yield a sensible and reliable instrument. The items on TASQ-IBD represent discriminatory features that increase the likelihood of axSpA.
Psychometric scales start with a large pool of candidate items, which was used in the development of TASQ-IBD. Psychometrics relies heavily on statistical analyses in item reduction with less detailed assessment of face and content validity41. A combination of psychometric and clinimetric approaches was used in the development of TASQ-IBD. The sensibility assessment proved to be a valuable and comprehensive approach during item selection and pilot testing that enabled an analysis of different facets of the questionnaire and addressed any weaknesses. The clinimetric approach has been used in the literature to create instruments, such as the New York Heart Association Functional Classification and the Pittsburgh Sleep Quality Index25,42.
Assessing internal consistency is of less relevance in clinimetric measures because the aim of developing a measuring instrument is to identify different aspects (e.g., characteristics of IBP and presence of extraaxial features) of a complex construct (axSpA in IBD). Therefore, the internal consistency is likely to be low, although this is not always the case43. This concept has been adopted in developing quality criteria for measurement properties of health questionnaires44. Most psychometricians, on the other hand, tend to focus on constructing unidimensional scales with homogeneous items, as this is one of the axioms of classic test theory. However, psychometric indices do not always require internal consistency45. This was probably why the most recent Guidelines for Reporting Reliability and Agreement Studies (GRRAS) did not endorse assessment of internal consistency when constructing new measuring instruments34. Based on this reasoning, we did not report the internal consistency of TASQ-IBD.
We provided detailed information on the reliability and agreement using coefficients of reliability (along with statistical uncertainty) and percentage agreement as per the recent recommendations of the GRRAS. The degree of imprecision for some items may be due to the small sample size.
Few studies have compared psychometric and clinimetric methods during the development of 2 similar versions of health-related scales. For example, during item reduction for the development of the Quality of Life After Myocardial Infarction and the QuickDASH, the clinimetric method slightly outperformed the psychometric method when both versions of each questionnaire were tested for concurrent validity and responsiveness46,47. Further, during the development of a health measurement scale, it was found that using the clinimetric method for constructing a heterogeneous scale measuring a complex phenomenon satisfied the criteria of psychometric method for constructing a homogeneous scale. The conclusion was that these measurement techniques complemented each other43.
Constructing a clear questionnaire proved to be a challenge. Patients may not necessarily understand what healthcare workers perceive as a straightforward question, because patients come from diverse linguistic and cultural backgrounds. Some items were dropped because they were unclear to patients or were susceptible to misinterpretation, even though they were relevant to axSpA. Examples include sleep disturbance in the second half of the night, alternating buttock pain, and hip pain or stiffness. A strength of the TASQ-IBD compared to previous questionnaires is that it includes a diagram of the back to allow patients to identify the location(s) of their back pain or stiffness. Illustrations have been incorporated in recent screening questionnaires, such as the Toronto Psoriatic Arthritis Screening (ToPAS)48, which was found to perform slightly better than the Psoriatic Arthritis Screening and Evaluation questionnaire partly because ToPAS used illustrations49.
Assigning appropriate weighting to each item of the questionnaire is another challenge. There are many methods for scoring questionnaire items based on clinical judgment or statistical methods. Scoring of the TASQ-IBD will be explored in the validation phase, ideally when TASQ-IBD is administered to patients with IBD presenting to gastroenterology clinics. If they have chronic back pain persisting for ≥ 3 months, they will be referred to a rheumatology clinic for further assessment. Then they will be divided into those who have a diagnosis of axSpA and those who do not. A cutoff score is to be explored in light of an appropriate sensitivity and specificity.
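The planned exploration of a cutoff score can be sketched as follows, assuming the simplest scheme in which each item coded as 1 contributes 1 point to a total score. This is a hypothetical illustration, not the validation analysis itself: the function names, the scores, and the reference diagnoses are all invented, and no cutoff has been established for TASQ-IBD.

```python
from typing import List, Sequence, Tuple


def total_score(item_codes: Sequence[int]) -> int:
    """Unweighted total: each item coded 1 (Yes) contributes 1 point."""
    return sum(item_codes)


def sensitivity_specificity(scores: Sequence[int],
                            has_axspa: Sequence[bool],
                            cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= cutoff counts as a positive screen."""
    tp = sum(1 for s, d in zip(scores, has_axspa) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, has_axspa) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, has_axspa) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, has_axspa) if not d and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one patient with and one without axSpA")
    return tp / (tp + fn), tn / (tn + fp)


def explore_cutoffs(scores: Sequence[int],
                    has_axspa: Sequence[bool]) -> List[Tuple[int, float, float]]:
    """Tabulate (cutoff, sensitivity, specificity) over the observed score range."""
    return [(c, *sensitivity_specificity(scores, has_axspa, c))
            for c in range(min(scores), max(scores) + 1)]


# Invented total scores and rheumatologist diagnoses for 8 patients.
example_scores = [12, 10, 9, 7, 5, 4, 3, 8]
example_diagnosis = [True, True, True, True, False, False, False, False]
for cutoff, sens, spec in explore_cutoffs(example_scores, example_diagnosis):
    print(f"cutoff >= {cutoff:2d}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

For a case-finding instrument, a cutoff that favors sensitivity would usually be preferred, accepting some loss of specificity; item weights derived from clinical judgment or statistical methods could replace the equal weights assumed here.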
While this study achieved its primary goals, there are some limitations. The pilot study was conducted at the Spondylitis Clinic in a tertiary center using a convenience sample of patients with axSpA. Ideally, a sample of the target patients should be used, i.e., patients with IBD and axSpA. Although our sample size was small, the patients in our study represented different levels of disease severity and education, with a wide spectrum of axSpA disease activity. The questionnaire is limited to patients who have completed a fifth-grade education. It could potentially be too sophisticated for patients who have lower educational levels. One solution would be to use more illustrations. This questionnaire is only applicable to English-speaking and literate patients. Therefore, future research is needed for cross-cultural adaptation and validation in different languages.
The TASQ-IBD is a newly developed, self-reported questionnaire to be administered to patients with IBD who have ever had chronic back pain or stiffness that has persisted for ≥ 3 months. We used the principles of measurement science to produce a sensible and reliable case-finding questionnaire that should facilitate early patient referral to rheumatologists and avoid delay in diagnosis of axSpA. Consequently, TASQ-IBD should be useful in assessing the prevalence of axSpA in IBD.
APPENDIX 1.
Footnotes
- Dr. Alnaqbi’s work was supported by a scholarship funded by Abu Dhabi Education Council. Dr. Johnson is supported by a Canadian Institutes of Health Research Clinician Scientist Award.
- Accepted for publication June 11, 2013.