Abstract
Objective. The benefits of early intensive treatment of inflammatory arthritis (IA) are dependent on timely and accurate case identification. In our study, a scoring algorithm for a self-administered IA detection tool was developed and validated for the rheumatology triage clinical setting.
Methods. A total of 143 consecutive consenting adults, newly referred to 2 outpatient rheumatology practices, completed the tool. A scoring algorithm was derived from the best-fit logistic regression model using age, sex, and responses to the 12 tool items as candidate predictors of the rheumatologists' blinded classification of IA. Bootstrapping was used to internally validate and refine the model.
Results. The 30 IA cases were younger than the 113 non-cases (p < 0.0001) and included clinical diagnoses of early IA (n = 10), rheumatoid arthritis (n = 9), and spondyloarthropathies (n = 11). Non-cases included osteoarthritis (n = 46), pain syndromes (n = 19), systemic lupus erythematosus (n = 5), and miscellaneous, noninflammatory musculoskeletal complaints (n = 43). The best-fit model included younger age, male sex, “trouble making a fist,” “morning stiffness,” “ever told you have RA,” and “psoriasis diagnosis.” The overall predictive performance (standard error, SE) of the derivation model was 0.91 (0.03). Internal validation of the derivation model across 200 bootstrap samples resulted in a mean predictive performance (SE) of 0.904 (0.002). The refined tool had a mean predictive performance (SE) of 0.915 (0.002), a sensitivity of 0.855 (0.005), and specificity of 0.873 (0.003).
Conclusion. A simple, self-administered tool was developed and internally validated for the sensitive and specific detection of IA in a rheumatology waiting list sample. The tool may be used to triage IA from rheumatology referrals.
- RHEUMATOID ARTHRITIS
- SELF-REPORT
- SPONDYLOARTHROPATHIES
- EARLY DIAGNOSIS
- QUESTIONNAIRES
- HEALTH SERVICES ACCESSIBILITY
Early treatment of inflammatory arthritis (IA) with antirheumatic therapy increases the probability of disease remission1, decreases disease activity2,3, improves functional outcomes4, and inhibits radiographic progression5. Delays in rheumatologic care impede early, appropriate intervention6,7,8,9,10 and thereby jeopardize patient outcomes. Rheumatology triage, the prioritization of patients referred to rheumatologic care based on potential disease severity, is one way to address barriers to care11,12,13,14.
In a Canadian multicenter cohort of incident rheumatoid arthritis (RA), wait times from primary care referral to rheumatology care accounted for about 25% of the delay in antirheumatic drug treatment6. More than 50% of patients with IA referred to rheumatology have wait times > 1 month6,12. Wait times for rheumatologic care may be similar for IA and non-IA patients12. Other disparities in access to care may also exist: previous studies have found that women face longer wait times15 and that elderly patients with RA experience disproportionate delays to antirheumatic drug treatment16. The implementation of standardized tools to facilitate early referral17,18,19 and expansion of the roles of health professionals to include triage11,20 may assist in overcoming these barriers.
A self-administered tool, encompassing dimensions of Stage 1 case ascertainment of IA, was developed to accelerate access to appropriate care in prerheumatology settings19. The tool was created using a hybrid process comprising a literature review and Delphi panel consensus19. It comprises 12 items with binary, yes/no response options. Age and sex are also collected on the tool. The items identify patient-reported responses to clinical examination and history-taking dimensions. Dimensions of articular pain, swelling, stiffness, duration of symptoms, and diagnostic and family history of disease are recorded. The objective of our study was to develop and validate a scoring algorithm for the self-administered IA detection tool to optimize its discriminative properties in the rheumatology triage clinical setting.
MATERIALS AND METHODS
Study design, setting, and patients
Over 1 year, 143 consecutive consenting patients newly referred to rheumatologic care participated in our study. Participants were recruited from the waiting rooms of the outpatient clinics of 2 academic rheumatologists (VPB and MJB). As part of usual care, patients were provided a rheumatology appointment on a first-come, first-served basis. Referral letters were not prioritized or otherwise screened. Prior to seeing the rheumatologist, patients completed a modified version of the IA detection tool in the waiting room. Blinded to patient responses to the tool, the study rheumatologist subsequently assessed and diagnosed each patient. Eligible patients were at least 18 years of age, had a musculoskeletal complaint, spoke English, read at a Flesch-Kincaid Grade 8 level, and provided informed consent. There were no exclusion criteria.
Measurements
Patients independently completed the IA detection tool (Table 1). The completed tool was submitted to the receptionist and set aside. The participant then received an initial rheumatologic assessment.
As part of the clinical assessment, the rheumatologist recorded a diagnosis in the clinical chart. The rheumatologist's clinical diagnosis was used as the reference standard for IA classification. Clinical diagnoses of ankylosing spondylitis, psoriatic arthritis, reactive arthritis, RA, and undifferentiated IA including undifferentiated spondyloarthropathy (SpA) were classified as positive cases of IA. All other conditions were classified as negative cases. A study coordinator extracted the specific clinical diagnoses for each patient from the clinical charts.
Statistical analysis
Descriptive statistics were used to characterize the study sample. The OR and Fisher's exact test were used to evaluate univariate associations between binary predictor variables and the outcome. Student's t test was used to test univariate differences in age between outcome classifications.
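The univariate comparison of a binary tool item against the outcome can be sketched as follows. This is a minimal illustration, not the SAS procedure used in the study; the function names and the 2 × 2 table layout (rows: item answered yes/no; columns: IA/non-IA) are assumptions made for the example.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = item answered yes / no, columns = IA / non-IA.
    Returns infinity when the denominator b*c is zero.
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value.

    With the table margins held fixed, sums the hypergeometric probabilities
    of every table that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # The small relative tolerance guards against floating-point ties.
    total = sum(prob(k) for k in range(lo, hi + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

The two-sided definition used here (summing tables no more probable than the observed one) is one common convention; other software may define the two-sided p value differently.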
Development of the scoring algorithm
Logistic regression was used to develop the scoring algorithm for the tool. In the multivariable model, the dependent variable was the rheumatologist's diagnosis of an IA, classified as a binary outcome. Independent variables available for the model included age, sex, and the 12 tool items. Collinearity was formally tested21 to identify independent variables with a variance inflation factor for logistic regression > 2.5. Demographic variables were included in the model to reduce sample variability. The logistic regression assumption of linearity between continuous independent variables and the log odds of the outcome was tested22. Multivariable-adjusted weights for the predictor variables were determined from their coefficients in the model.
As suggested23, a limited number of models were considered for establishing predictor weights. First, an unweighted model was considered as a base case, in which each independent variable was assumed to be equally relevant, with the same direction of effect in discriminating IA from other musculoskeletal conditions. A full model, including all independent variables, was also investigated24. Three stepwise regression models with selection thresholds of 0.05, 0.20, and 0.50 were also considered. For the weighted models, no assumptions were made on the directionality of the associations between the independent and dependent variables. Minimization of the Akaike information and Schwarz (i.e., Bayesian Information) criteria and a nonsignificant Hosmer-Lemeshow goodness-of-fit test were used to select the best-fit model23,24.
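The information criteria used for model selection can be written out as a short sketch. It assumes the standard definitions (AIC = 2k − 2 ln L; Schwarz criterion = k ln n − 2 ln L, where k is the number of estimated parameters and n the number of observations); the function names are hypothetical, and the Hosmer-Lemeshow test is not covered.

```python
from math import log

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

def schwarz_criterion(log_likelihood, n_params, n_obs):
    """Schwarz (Bayesian) criterion: k ln n - 2 ln L (smaller is better)."""
    return n_params * log(n_obs) - 2 * log_likelihood

def best_by(criterion_values):
    """Return the name of the candidate model with the smallest criterion.

    criterion_values maps a model name to its AIC or Schwarz value.
    """
    return min(criterion_values, key=criterion_values.get)
```

Because the Schwarz criterion penalizes each parameter by ln n rather than 2, it favors smaller models than the AIC whenever n exceeds about 7.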
The discriminative performance of the tool was tested using the area under the receiver-operating characteristic curve (ROC AUC). In this plot of sensitivity on the ordinate and (1 – specificity) on the abscissa, an ROC AUC of 0.5 represents chance classification of IA or non-IA and 1.0 represents perfect prediction of the outcome. The ROC AUC was estimated by the c-index statistic25.
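For a binary outcome, the c-index equals the proportion of case/non-case pairs in which the case receives the higher predicted score, with ties counted as one-half. The sketch below illustrates that definition directly; it is an illustrative implementation with assumed names, not the routine used in the study.

```python
def c_index(scores, labels):
    """Concordance (c) index for a binary outcome.

    scores: predicted probabilities or summed tool scores.
    labels: 1 for an IA case, 0 for a non-case.
    Each case/non-case pair counts 1 if the case scores higher,
    0.5 if tied, and 0 otherwise.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for s_case in cases:
        for s_non in noncases:
            if s_case > s_non:
                concordant += 1.0
            elif s_case == s_non:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))
```

A value of 0.5 from this function corresponds to chance classification and 1.0 to perfect separation, matching the interpretation of the ROC AUC given above.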
Bootstrap resampling was used to internally validate the model with the conventional minimum of 200 samples24. Multivariable-adjusted independent variables selected in the derivation model were carried forward in the bootstrap validation. The mean ROC AUC for the model across the bootstrap samples was compared to that of the derivation model. Bootstrap aggregating (bagging) was used to further refine tool item weights26. These weights were integrated into the logistic regression model equation, which could then be used to predict the classification of future referred patients.
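The resampling steps can be sketched generically. In this illustration, `fit_fn` and `stat_fn` are hypothetical callables standing in for the model-fitting and ROC AUC routines, and the standard error is taken as the standard deviation across bootstrap samples divided by the square root of their number, which is an assumption about how uncertainty was summarized.

```python
import random
from statistics import mean, stdev

def bootstrap_samples(data, n_boot=200, seed=0):
    """Yield n_boot resamples of data, each drawn with replacement
    and of the same size as the original sample."""
    rng = random.Random(seed)
    n = len(data)
    for _ in range(n_boot):
        yield [data[rng.randrange(n)] for _ in range(n)]

def bootstrap_summary(data, stat_fn, n_boot=200, seed=0):
    """Mean of stat_fn over the resamples and the SE of that mean
    (assumed here to be SD / sqrt(n_boot))."""
    values = [stat_fn(s) for s in bootstrap_samples(data, n_boot, seed)]
    return mean(values), stdev(values) / len(values) ** 0.5

def bagged_coefficients(data, fit_fn, n_boot=200, seed=0):
    """Bootstrap aggregating: refit the model on each resample and
    average each coefficient across the fits.

    fit_fn takes a resample and returns a list of coefficients.
    """
    fits = [fit_fn(s) for s in bootstrap_samples(data, n_boot, seed)]
    return [mean(f[j] for f in fits) for j in range(len(fits[0]))]
```

Here `data` would be a list of patient records; any routine that maps a resample to a coefficient list or a performance statistic can be supplied as the callable.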
The predictive performance of the refined model was determined across the 200 bootstrap samples. For each bootstrap sample, the threshold that maximized the Youden Index27 was selected as the optimal cutoff. The ROC AUC, sensitivity, specificity, positive and negative likelihood ratios, and the positive and negative predictive values were then averaged over the 200 samples and the uncertainty around the mean estimate propagated28. All statistical analyses were carried out using SAS/STAT version 9.2.
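Selection of the cutoff by the Youden Index (sensitivity + specificity − 1) within a single sample can be illustrated as follows. This is a sketch under the assumptions that a patient is classified as IA when the predicted probability is at or above the cutoff and that candidate cutoffs are the observed predicted probabilities; the function name is hypothetical.

```python
def youden_optimal_cutoff(probs, labels):
    """Return (cutoff, youden_index, sensitivity, specificity) for the
    cutoff that maximizes sensitivity + specificity - 1.

    probs: predicted probabilities of IA.
    labels: 1 for an IA case, 0 for a non-case.
    A patient is classified positive when prob >= cutoff (assumption).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    best = None
    for cut in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= cut)
        tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        # Keep the first (lowest) cutoff on ties.
        if best is None or j > best[1]:
            best = (cut, j, sens, spec)
    return best
```

In the procedure described above, this selection would be repeated within each bootstrap sample and the resulting performance measures averaged.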
The study was approved by local research ethics boards and conducted in accord with the Declaration of Helsinki.
RESULTS
Patient characteristics
Of the 143 study participants, the rheumatologists diagnosed 30 with an IA and 113 with a non-IA (Table 2). IA cases included diagnoses of early IA, RA, and SpA. Non-IA diagnoses included osteoarthritis, arthralgias and other pain syndromes, systemic lupus erythematosus, and a variety of other diagnoses. IA cases were younger than non-IA cases (p < 0.0001). A greater proportion of IA cases responded “yes” to tool items (item 3) hand/wrist swelling (p = 0.007), (item 4) trouble making a fist (p = 0.005), (item 6) morning stiffness > 1 h (p = 0.007), (item 9) ever told you have RA (p < 0.0001), and (item 11) diagnosis of psoriasis (p = 0.005). Differences in sex and other tool items between case classifications were not significant on a univariate basis (Table 3).
Derivation of the scoring algorithm
The stepwise regression using a selection cutoff of p = 0.20 produced the model with best fit (Table 3). Collinearity between the independent variables was not evident. All preselected demographic variables and tool items were considered for the stepwise regression. Age, sex, and tool items (4) trouble making a fist, (5) morning stiffness general, (9) ever told you have RA, and (11) diagnosis of psoriasis were selected in the derivation model. The association between item (5), morning stiffness general, and case classification was not significant.
Model refinement and predictive performance
As determined by the ROC AUC, the predictive performance (standard error, SE) of the derivation model was 0.91 (0.03). Weighting improved the predictive accuracy of the model (p < 0.0001), although the ROC AUC (SE) for the unweighted tool [0.77 (0.05)] was significantly better than chance classification (p < 0.0001). Age, the sole continuous independent variable, was linearly related to the log odds of the outcome, satisfying the logistic regression assumption. The derivation model was internally validated using bootstrapping, resulting in a mean ROC AUC (SE) of 0.904 (0.002).
Bagging refined the precision of the model (Figure 1). Applying bagging, the resulting mean ROC AUC (SE) was 0.915 (0.002). The refined model resulted in a more precise mean Youden Index, 0.728 (0.006), compared to that of the unrefined model, 0.70 (0.08). The sensitivity was also more precisely estimated [0.855 (0.005) compared to 0.80 (0.07)], as was the specificity [0.873 (0.003) compared to 0.90 (0.03)]; Figure 1.
For patients who responded “yes” to item (9), ever told you have RA, the odds of being diagnosed with IA were 24 times greater than for those who responded “no.” Similarly, patients who responded “yes” to item (11), diagnosis of psoriasis, had 6 times greater odds of an IA diagnosis than those who responded “no.” The odds of an IA diagnosis decreased by 6.8% with each additional year of age; over a 10-year increase in age, the odds were 50.3% lower for older individuals than for younger ones. Men had 5 times the odds of an IA diagnosis compared to women. Patients who responded “yes” to item (4), trouble making a fist, had 5 times the odds of an IA diagnosis compared to those who responded “no.” The odds for an IA diagnosis among patients who responded “yes” to item (5), morning stiffness, were 13 times greater than for those who responded “no,” although this predictor was not significant with this sample size.
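The per-year and 10-year age effects above can be related by compounding odds ratios. The sketch below assumes (as an interpretation, not something stated in the text) that age enters the model as a single linear term and that the 50.3% figure corresponds to a 10-year odds ratio of 1 − 0.503 for older relative to younger patients. Under that reading, the implied per-year change is close to the 6.8% reported.

```python
def per_unit_odds_ratio(total_or, units):
    """Odds ratio per unit, given the odds ratio across `units` units.

    Valid when the predictor is linear on the log-odds scale, so that
    odds ratios multiply across units.
    """
    return total_or ** (1.0 / units)

def compounded_odds_ratio(per_unit_or, units):
    """Odds ratio across `units` units, given the per-unit odds ratio."""
    return per_unit_or ** units

# Assumption: the reported 50.3% is read as a 10-year odds ratio of
# 1 - 0.503 for older relative to younger patients.
ten_year_or = 1 - 0.503
per_year_or = per_unit_odds_ratio(ten_year_or, 10)

# Percent change in the odds per additional year of age (negative = lower).
percent_change_per_year = (per_year_or - 1) * 100
```

The point of the sketch is that percentage changes in odds compound multiplicatively: a 10-year effect is the per-year odds ratio raised to the tenth power, not 10 times the per-year percentage.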
The optimal mean cutpoint for the refined model was 0.294 (Table 4). Other cutpoints may be selected depending on specific user performance preferences. To illustrate the relative performance of other thresholds, cutoffs about 0.1 units below and above the Youden Index-optimized value were investigated. Lowering the cutoff to 0.198 had the effect of slightly raising the sensitivity and lowering the other performance properties of the tool. Conversely, raising the threshold by about 0.1 (e.g., 0.400) lowered the sensitivity and negative predictive value and raised the other performance properties of the tool (Table 4).
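The likelihood ratios and predictive values follow from sensitivity, specificity, and prevalence by standard formulas, sketched below with the point estimates reported for the refined model and the sample prevalence of 30/143. The function names are assumptions. Because the values in Table 4 were averaged over bootstrap samples, they need not match figures computed this way from the mean sensitivity and specificity.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) = (sens / (1 - spec), (1 - sens) / spec)."""
    return sens / (1 - spec), (1 - sens) / spec

def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) by Bayes' rule at the given prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Point estimates reported for the refined model; prevalence is the
# proportion of IA cases in the study sample (30 of 143).
sens, spec = 0.855, 0.873
prevalence = 30 / 143

lr_pos, lr_neg = likelihood_ratios(sens, spec)
ppv, npv = predictive_values(sens, spec, prevalence)
```

Unlike the likelihood ratios, the predictive values depend on prevalence, so they would change if the tool were applied in a setting where IA is more or less common than in this waiting list sample.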
The relationship between the tool score (from Table 3) and the cutpoint (from Table 4) follows a sigmoidal curve (Figure 2). A summed score of −0.879 corresponds to the optimized cutpoint of 0.294. The sigmoidal curve may be used to adjust the tool score threshold according to different desired cutpoints. For example, for a predicted probability of 0.198 (and the corresponding predictive performance reported in Table 4), a tool score cutoff of −1.55 is estimated from the plot. Using the Youden Index-optimized cutpoint, adding 0.879 to the logistic regression equation results in the following formula scaled to set 0 as the threshold:
In this formula, age (AGE) is entered in years, male (MALE) and “yes” responses to tool items 4 (Q04), 5 (Q05), 9 (Q09), and 11 (Q11) are given values of 1, whereas female sex and “no” responses are given values of −1. Here, a positive result predicts an IA case according to the Youden Index-optimized cutpoint. A negative result predicts a non-IA case (Figure 2).
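The scoring rule described above can be sketched as follows. The model coefficients are not reproduced in this text, so the sketch takes them as a caller-supplied mapping `coef` (a hypothetical name, as are the function names). It also assumes that the curve relating score to predicted probability is the standard logistic function.

```python
from math import exp, log

def score_to_probability(score):
    """Logistic function: summed (unscaled) tool score -> predicted probability."""
    return 1.0 / (1.0 + exp(-score))

def probability_to_score(p):
    """Logit: predicted probability -> summed (unscaled) tool score."""
    return log(p / (1.0 - p))

def effect_code(flag):
    """Code male sex or a 'yes' response as 1, female sex or 'no' as -1."""
    return 1 if flag else -1

def scaled_ia_score(age, male, q04, q05, q09, q11, coef, shift=0.879):
    """Scaled score, with 0 as the Youden-optimized threshold.

    coef: mapping with keys 'intercept', 'AGE', 'MALE', 'Q04', 'Q05',
          'Q09', 'Q11' holding the fitted coefficients (supplied by the
          caller; none are assumed here).
    shift: amount added to the logistic regression equation (0.879 above).
    """
    return (coef["intercept"] + shift
            + coef["AGE"] * age
            + coef["MALE"] * effect_code(male)
            + coef["Q04"] * effect_code(q04)
            + coef["Q05"] * effect_code(q05)
            + coef["Q09"] * effect_code(q09)
            + coef["Q11"] * effect_code(q11))

def predicts_ia(scaled_score):
    """A positive scaled score predicts an IA case."""
    return scaled_score > 0
```

Under the logistic assumption, `score_to_probability(-0.879)` evaluates to approximately 0.293, consistent with the reported cutpoint of 0.294 after rounding. A threshold read from the plot for another cutpoint need not coincide exactly with the value returned by `probability_to_score`.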
DISCUSSION
Using a cohort of 143 consecutive new referrals to rheumatologic care, the self-administered IA detection tool accurately predicted patients who went on to receive an IA diagnosis from a rheumatologist. Wait-listed patients who were male, of younger age, or self-reported that they were ever told they have RA, ever diagnosed with psoriasis, or had trouble making a fist were more likely to be given an IA diagnosis by a rheumatologist. A sixth independent variable in the model, “Are your joints stiff in the morning?”, was selected but was not significant. A positive result to the scaled formula predicted IA cases in this rheumatology waiting list sample with an overall accuracy (SE) of 0.915 (0.002), sensitivity of 0.855 (0.005), and specificity of 0.873 (0.003). In a separate cohort, the predictive performance of the tool was externally validated, with an acceptable overall accuracy of 0.829 (0.003)29.
The sensitivity and specificity of the IA detection tool are dependent on the predicted probability cutoff selected. As a Stage 1 case ascertainment tool, a high sensitivity is desirable30. In the rheumatology triage setting, there is a demand for a high specificity as well. For this reason, we reported the tool performance using 2 additional cutoffs set at about 0.1 units below and above the determined optimal cutoff. Without a clear precedent to guide cutoff selection, the one that maximized the Youden Index was selected as optimal27. This approach offered the tradeoff of maximizing the sum of sensitivity and specificity. Although other methods of determining the optimal cutoff for ROC curves exist31,32, use of the classic Youden approach appeared reasonable in the context of the study objective without an a priori precedent for a specific sensitivity or specificity.
Despite the favorable performance properties, other logistic regression modeling approaches were available. An alternative modeling approach includes all candidate predictor variables24. This model was considered. The full model included age, sex, and the 12 tool items. Relative to the selected model, the full model resulted in a modest improvement to the performance properties and inferior model-fit statistics. In the context of the small sample size used here, these findings may warrant further investigation into the use of the full 12-item tool in a larger sample. More sophisticated modeling approaches accounting for multilevel associations and other clinical decision-making tools may prove useful in the development of scoring algorithms in the settings of primary care or community screening.
The resulting tool has performance properties comparable to analogous, well-respected tools in rheumatology and other disciplines33,34,35,36. Notably, the van der Helm-van Mil, et al, tool designed to predict the development of RA among patients with undifferentiated IA has an internally validated overall performance (SD) of 0.87 (0.02)33. Although the sensitivity and specificity were not explicitly reported, this overall performance translates to a sensitivity of 0.79 and specificity of 0.77, as estimated from the reported ROC curve. The Visser, et al, tool35 to discriminate persistent erosive arthritis from early arthritis has an overall performance (SE) of 0.84 (0.02), with an estimated sensitivity of 0.75 and specificity of 0.75. These comparisons are made solely to demonstrate effect sizes comparable to those reported here. In the context of the indication for the IA detection tool, its ROC AUC, sensitivity, and specificity compare well to the performance of these notable tools.
The current tool was developed to facilitate rheumatology triage of IA. The van der Helm-van Mil, et al, tool has excellent discriminative properties for the selection of patients who fulfill the 1987 American College of Rheumatology classification criteria for RA from those referred with suspected arthritis and rheumatologist-confirmed monoarthritis with < 2 years of symptom duration33,34. The Visser, et al, tool was developed as a diagnostic criteria set to discriminate between patients with self-limiting, persistent nonerosive arthritis and those with persistent erosive arthritis at the first rheumatology visit using the 2-year outcome as the reference standard37. Each of these tools was developed and validated for its unique purpose.
Other tools have been developed for accelerating access to appropriate rheumatologic care17,18. Comparative data on the performance properties of these tools are not available. The triage system developed by Graydon and Thompson used a grading scheme to rank the priority of referrals17. This was not done in our study. Therefore, the patients detected with the current tool may differ from those detected by Graydon and Thompson's tool. Fitzgerald, et al, reported the correlation between rheumatologist and primary care practitioner rankings of the urgency for referral to their Priority Referral Score tool18. This too was not done in the current study. In our study primary care practitioner input was not reported for 21.0% of all cases and 40.0% of IA cases. The early IA detection tool maintained its performance in patients missing a referring diagnosis by the primary care practitioner (PCP). Generally, the differences in development, reporting, and potential differences in patients detected render it inappropriate to make direct performance-based comparisons across these rheumatology triage tools. Comparable with these other tools, however, was the use of the clinician's diagnosis as the clinically relevant outcome (rheumatologist diagnosis in the case of our study). As a self-administered instrument, the IA detection tool offers a unique mode of administration that may complement these other instruments. Where resources are limited, self-administration may offer cost and personnel resource efficiencies.
Our study has a few limitations. As with all validation studies, these results pertain to the setting in which they were tested. Here, the derived scoring algorithm pertains solely to the rheumatology triage setting, where patients benefit from a PCP investigation and referral. In this setting, the tool may be used to address the roughly 25% of the delay in disease-modifying antirheumatic drug (DMARD) treatment that is attributable to rheumatology wait times6. The development and validation of scoring algorithms to optimize the tool's discriminative properties in primary care and community settings remain outstanding. A preliminary study to investigate the discriminative properties of the unweighted tool in primary care was recently completed38. The number of events in the study sample was less than the conventional 10 per variable24. To mitigate this limitation, thorough optimization and validation techniques were applied in the modeling exercise. The derivation model was internally validated using bootstrapping. Further, bagging was used to produce more stable predictor coefficients23,26. Bagging is a variance reduction technique especially suited for situations characterized by suboptimal events per variable39,40. It has found application in a number of fields, including genetic mapping41,42. In rheumatology, it has been applied to the development of genetic screening tests for early RA43 and a prediction algorithm for systemic sclerosis44. In a followup study, the optimized model was externally validated29. As an instrument intended for clinical application, the clinically relevant outcome of rheumatologist diagnosis at presentation was used. Rheumatology diagnoses have the potential to change over followup. This was not measured. Given the Delphi method used in the tool's development, tool items relate to the consensus panel's perceived definition of “early IA.” As previously demonstrated, rheumatologic opinion of “early IA” is more closely associated with peripheral inflammatory arthritides than with axial SpA45.
Although axial SpA was included in the case definition of IA in this study, the performance of the tool at detecting this more specific subset of peripheral IA warrants further investigation. The use of the IA detection tool in the research setting also warrants exploration. The self-administered Connective Tissue Disease Screening Questionnaire (CSQ), for detecting patients fulfilling classification criteria for specific musculoskeletal conditions, has been used in research settings46,47. The comparative predictive performance of the IA detection tool and the CSQ to detect IA defined by classification algorithms may be of interest.
A scoring algorithm for a self-administered, IA detection tool was validated for the prediction of IA in the rheumatology triage setting. The tool offers excellent performance properties. Self-administered, the tool offers a practical method of triaging patients with IA in the rheumatology waiting list population to accelerate appropriate care.
- Accepted for publication December 18, 2012.