Abstract
Objective. To conduct a systematic review and quality appraisal of quality measures for inflammatory arthritis, including rheumatoid arthritis (RA), spondyloarthritis, psoriatic arthritis (PsA), and juvenile idiopathic arthritis (JIA).
Methods. Embase, MEDLINE, and Cumulative Index to Nursing and Allied Health Literature (CINAHL) were searched from January 1, 2000, to October 23, 2016, using Medical Subject Headings terms for inflammatory arthritis and quality measures. A “grey literature” search of international arthritis organizations and quality measure libraries was also conducted. Two reviewers independently considered the papers for inclusion, with disagreements resolved by consensus. A modified guideline appraisal tool (AGREE II) was used to appraise the measure development process, which determined final inclusion. Measures were abstracted in duplicate and categorized into themes, measure type, and domains of quality.
Results. Thirteen measurement sets were included from 4 countries (United States, Canada, United Kingdom, Netherlands) and 1 European consortium. They included 10 sets on RA and 1 each for PsA, inflammatory arthritis, and JIA. There were 161 unique individual measures (136 process, 20 structure, and 5 outcome). Major themes included assessment, medications, and comorbidities. Measure development methods were varied, including RAND/University of California, Los Angeles appropriateness methodology, prioritization exercises, or other modified-Delphi methods. Inclusion of patients occurred in 77% of development groups. Discussion of barriers to measurement was infrequent.
Conclusion. Inflammatory arthritis quality measures cover a diversity of themes encompassing process, structure, and outcomes of care across the 6 domains of quality. However, between organizations, measure development is not standardized. Local assessment of measurement feasibility before use outside the original development context is recommended.
- RHEUMATOID ARTHRITIS
- JUVENILE IDIOPATHIC ARTHRITIS
- PSORIATIC ARTHRITIS
Quality measures are tools for measuring whether care provided is concordant with evidence-based practices. They often represent minimum standards of care and are derived from quality indicators, which are statements about best practices that are associated with high-quality care. Quality indicators are usually specified in the following format: If (a specific clinical scenario), then (a clinical action)1. An often-cited example of a quality indicator would be if a patient has rheumatoid arthritis (RA), then they should be prescribed a disease-modifying antirheumatic drug (DMARD)1. Quality indicators can be further specified into quality measures, which have a specific numerator, denominator, and exclusions and can be reported as a percentage representing a quality or performance measure (depending on whether the subject of measurement is associated with improved quality or just healthcare performance). Quality measures can be used for benchmarking and quality improvement efforts, and in some countries and jurisdictions, are used in pay-for-performance programs. An example of a quality measure based on the above quality indicator would be “the percentage of patients with RA who have been prescribed a DMARD.”
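As a minimal sketch of how a quality indicator becomes a quality measure with a numerator, denominator, and exclusions, the DMARD example above could be computed as follows. The `Patient` record, its field names, and the exclusion flag are hypothetical and are not taken from any of the reviewed measure specifications.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    has_ra: bool
    on_dmard: bool
    excluded: bool = False  # assumed flag, e.g., a documented contraindication


def dmard_measure(patients):
    """Return the percentage of eligible RA patients prescribed a DMARD."""
    # Denominator: patients with RA who do not meet an exclusion.
    denominator = [p for p in patients if p.has_ra and not p.excluded]
    if not denominator:
        return None  # the measure is undefined when nobody is eligible
    # Numerator: denominator patients who received the clinical action.
    numerator = [p for p in denominator if p.on_dmard]
    return 100.0 * len(numerator) / len(denominator)


cohort = [
    Patient(has_ra=True, on_dmard=True),
    Patient(has_ra=True, on_dmard=False),
    Patient(has_ra=True, on_dmard=True),
    Patient(has_ra=True, on_dmard=False, excluded=True),
    Patient(has_ra=False, on_dmard=False),
]
print(f"{dmard_measure(cohort):.1f}%")  # 2 of 3 eligible patients: 66.7%
```

The excluded patient and the patient without RA are removed from the denominator before the percentage is taken, which is what distinguishes a specified measure from the underlying indicator statement.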
According to a classic Donabedian framework2, quality measures are often classified into process, structure, or outcome. Process measures determine whether clinical processes are concordant with evidence-based best practice and improved patient outcomes; structure measures record whether the health system or clinic infrastructures are present to support best practices; and outcome measures determine the effect on the health status of the patient. While outcome measures may seem the most obvious to measure, patient outcomes are the result of many factors, some of which are beyond the control of physicians. Several domains of quality exist, and the most commonly cited framework is the Institute of Medicine’s 6 Domains of Health Care Quality3: safety, effectiveness, patient-centeredness, timeliness, efficiency, and equity.
The present systematic review of quality measures in rheumatology was undertaken to review the landscape of existing quality measures for inflammatory arthritis and to classify existing measures according to the Donabedian and Institute of Medicine frameworks described above. The quality of the existing measurement sets was appraised and the methods of development were also reviewed.
MATERIALS AND METHODS
The systematic review was developed and reported according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines4. The search strategy was developed in consultation with a medical librarian using quality measures as well as Medical Subject Headings terms for inflammatory arthritis including rheumatoid arthritis (RA), juvenile idiopathic arthritis (JIA), and the spondyloarthropathies (SpA): ankylosing spondylitis (AS), psoriatic arthritis (PsA), reactive arthritis (ReA), and inflammatory bowel disease (IBD)-related arthritis (see Supplementary Figure 1 for the search strategy, available with the online version of this article). The study protocol was registered with the International Prospective Register of Systematic Reviews (PROSPERO, http://www.crd.york.ac.uk/PROSPERO, record number 33433). Embase, MEDLINE, and CINAHL were searched from January 1, 2000, to October 23, 2016. A “grey literature” review of Websites for international arthritis organizations and quality measure libraries was also conducted (see list of reviewed Websites in Supplementary Table 1, available with the online version of this article).
Two reviewers independently considered the papers for inclusion with disagreements resolved by consensus. Quality measures were included if they were in English and identified an element of inflammatory arthritis care, and the method of development of the quality measures was available. Checklists or quality measures with no description of how they were selected and/or developed were excluded, as were quality standards (they lacked specific numerators and denominators and were not measurable). If the development of the quality measure was not outlined on a potentially eligible measure set, attempts were made to contact the author to determine whether the development strategy was available. Studies describing the use of quality measures, and not the development of the measures, were excluded.
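The eligibility rules described above can be summarized as a simple screening check. This is only an illustrative sketch: the `CandidateSet` record and its field names are hypothetical, and it does not capture the independent review by 2 reviewers, the consensus step, or the attempts to contact authors.

```python
from dataclasses import dataclass


@dataclass
class CandidateSet:
    """Hypothetical screening record for one candidate measurement set."""
    in_english: bool
    addresses_ia_care: bool
    development_described: bool
    has_numerator_and_denominator: bool
    describes_use_only: bool


def exclusion_reasons(candidate):
    """Return the reasons for exclusion; an empty list means eligible."""
    reasons = []
    if not candidate.in_english:
        reasons.append("not in English")
    if not candidate.addresses_ia_care:
        reasons.append("does not address inflammatory arthritis care")
    if not candidate.development_described:
        reasons.append("no description of measure development")
    if not candidate.has_numerator_and_denominator:
        reasons.append("quality standard (not measurable)")
    if candidate.describes_use_only:
        reasons.append("describes use, not development, of measures")
    return reasons


standard = CandidateSet(True, True, True, False, False)
print(exclusion_reasons(standard))
```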
A modified version of a guideline appraisal tool, the Appraisal of Guidelines for Research and Evaluation II (AGREE II), which has previously been used to evaluate quality measure development5, was used to evaluate the measure development process for each set. The modified AGREE II tool includes 6 domains of measure development, the items of which are rated on a 7-point Likert scale, where 1 is “strongly disagree” and 7 is “strongly agree” that the domain item was met by the measurement set. The 6 domains are scope and purpose, user involvement, rigor of development, clarity of presentation, applicability, and editorial independence; an overall quality assessment is also made, and this determined final inclusion. A description of how these elements were rated is presented in the published modification5. For example, “applicability” was rated based on whether the measures were supported with tools for use (e.g., data collection forms, electronic databases, etc.) and whether potential barriers to applying the indicators had been discussed. Two of 3 reviewers (CEB, MC, AR) independently scored the measurement sets. Because one of the authors of this review (CEB) had been an author on some of the included publications, she did not participate in the quality review of her own publications. Domain scores were then calculated for each of the 6 domains according to the AGREE II scoring (example available from www.agreetrust.org). In brief, the scores for each element in the domain are summed across reviewers, the minimum possible score for the domain is subtracted from this obtained score, and the result is divided by the difference between the maximum and minimum possible scores for the domain. A maximum domain score of 100% indicates that all reviewers felt that all elements of the domain were met and rated them each a 7.
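The domain score calculation described above can be sketched as follows. The function name and the example ratings are hypothetical; the ratings are not data from this review.

```python
def agree_domain_score(ratings, min_rating=1, max_rating=7):
    """Scaled domain score (%) from per-reviewer lists of item ratings.

    ratings: one list per reviewer, each holding that reviewer's rating
    (1 to 7) for every item in the domain.
    """
    n_reviewers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every reviewer must rate every item")
    # Sum the ratings for every item across all reviewers.
    obtained = sum(sum(r) for r in ratings)
    # Lowest and highest totals the domain could have received.
    min_possible = min_rating * n_items * n_reviewers
    max_possible = max_rating * n_items * n_reviewers
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical domain with 3 items rated by 2 reviewers.
example = [[5, 6, 6], [3, 6, 7]]
print(agree_domain_score(example))  # (33 - 6) / (42 - 6) = 75.0
```

With 2 reviewers each rating every item a 7, the obtained score equals the maximum possible score and the domain score is 100%, matching the interpretation given above.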
Individual measures from the included sets were abstracted in duplicate and categorized by indicator type (process, structure, outcome)2 and by the 6 domains of quality (safety, effectiveness, patient-centeredness, timeliness, efficiency, equity) from the Institute of Medicine3. While quality measures can fall in multiple domains of quality, for this paper we elected to assign each measure to a single domain, deemed by the 2 reviewers to be the dominant domain of quality represented by the measure.
Major and minor measurement themes were determined by 2 of the reviewers (CEB and MC), and each measure was categorized, with any disagreements resolved by consensus. The method of measure development, including the composition of the measure development team, was also abstracted.
Ethics approval for this study was not required as per the guidelines of our institution.
RESULTS
There were 4418 citations identified, of which 456 were included in full-text review. After quality rating, a total of 13 were included (Figure 1). From the 13 sources (Table 1), a total of 161 individual quality measures were abstracted (see Supplementary Table 2, available with the online version of this article). Seven of the sets were from North America6–12 and 6 were from Europe (3 from the United Kingdom13,14,15, 2 from the Netherlands16,17, and 1 was a pan-European set18). Most of the sets were for RA6,7,9,11,12,13,15–18 (n = 10, 77%) with 1 each (8%) for JIA10, PsA14, and inflammatory arthritis8. No sets were found for ReA and none were accepted for AS. The vast majority of the 161 individual measures were specific to RA (n = 143, 89%), while 12 addressed JIA (7%), 4 addressed inflammatory arthritis in general (2%), and 2 addressed PsA (1%).
Quality assessment of measurement sets
Details of the quality assessment for individual measurement sets are shown in Table 2. In general, well-described domains included the scope and purpose of the measurement sets (mean domain rating across sets 95%) and description of participant involvement in the development process (mean domain rating across sets 85%). The domain relating to the rigor of development received a mean rating across sets of 64%, while clarity of presentation received a mean rating across sets of 73%. Conversely, applicability of the measurement sets was generally poor (mean rating across sets of 34%), highlighting a lack of discussion of the organizational barriers to measurement or tools to support measure use. The domain for editorial independence had a mean rating of 50%, often because the measurement set did not declare or address potential competing interests of the measure development panel.
Quality measure themes
There were 18 major themes abstracted from the quality measures, and subthemes were assigned to these major themes (Table 3). The most common major theme, constituting 27% of measures from all sets, was medications (including subthemes of DMARD use, steroid use, medication counseling, and counseling on safety in pregnancy). The second most common major theme was clinical assessment, constituting 26% of all measures (including subthemes of functional status, disease activity, radiographs, and followup). The remainder of the major themes each composed < 10% of all measures, including accessibility, adjunctive therapy (with subthemes of exercise, physiotherapy, and assistive devices), comorbidity assessment/management, diagnosis (of the inflammatory arthritis), documentation, education (patient self-management), experience (healthcare and patient satisfaction), clinical expertise (nurse and physician), multidisciplinary care, outcomes, surgery, triage, vaccinations, and waiting times.
The measurement themes were also categorized based on disease (RA, PsA, and JIA, data not shown). Most of the JIA measures related to the assessment (58%) and medications (25%) themes, with the remainder relating to the comorbidity (ophthalmologic), experience (patient satisfaction), and wait time themes. Both PsA measures addressed the assessment major theme and the followup subtheme.
When the measures were categorized using the Institute of Medicine3 domains of quality (Table 4), 35% related to measures of healthcare effectiveness, 25% to safety, and 16% to patient-centered care. In contrast, only 11% addressed timeliness of patient care, 12% efficiency, and 1% directly measured care equity. When the measures were categorized using a Donabedian framework2, according to process, structures, and outcomes, the majority related to process of care (84%), with only 20 being structure (12%), and 5 (3%) recording patient outcomes (Table 1).
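The Donabedian percentages above follow directly from the measure counts reported in the abstract (136 process, 20 structure, and 5 outcome measures). A short tally, with variable names chosen only for illustration, reproduces them:

```python
# Counts of unique measures by Donabedian type, as reported in the abstract.
counts = {"process": 136, "structure": 20, "outcome": 5}

total = sum(counts.values())  # 161 unique measures
shares = {kind: 100.0 * n / total for kind, n in counts.items()}

for kind, share in shares.items():
    print(f"{kind}: {counts[kind]} of {total} ({share:.1f}%)")
```

Rounded to whole numbers, the shares are 84%, 12%, and 3%, so the reported percentages sum to 99% rather than 100% because of rounding.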
Quality measure development methods
Of the 13 sets, 6 were developed predominantly from an existing guideline or clinical care pathway, 3 were based on systematic reviews of existing guidelines and measures, 3 were based on nonsystematic literature reviews, and 1 was based on discussions/questionnaires with those involved, accompanied by a nonsystematic literature review10 (Table 5). Further methods are described in Supplementary Table 3, available with the online version of this article.
Similarly, there was variability in the consensus procedures used to finalize the measurement sets: a modified Delphi in conjunction with Nominal Group Technique in 1 set10, a modified RAND/University of California, Los Angeles (UCLA) Appropriateness Method in 4 sets11,12,16,17, an online modified Delphi using a platform called ExpertLens (RAND Corp.) in 3 sets6,7,8, a modified Delphi in another17, and other study-specific consensus methods in 4 sets9,13,14,15.
Three measure development panels included physician experts exclusively9,11,16, while the other 10 included a diverse set of medical professionals such as nurses, occupational therapists, and physiotherapists. All 10 also included patients in the development process (Table 5). The measure development panels universally included rheumatologists.
DISCUSSION
To our knowledge, this is the first systematic review of quality measures including all types of inflammatory arthritis that has rigorously assessed the quality of the methods of development and reporting of quality measures using a published modification of the AGREE II5. The review has identified 13 high-quality sets of quality measures in inflammatory arthritis, with 161 individual quality measures. The majority of sets included were developed for RA (n = 10), while only a few sets for inflammatory arthritis, PsA, and JIA were identified. Although 1 set was identified for AS19, it was not included because development was unclear, indicators were not measurable, and they were more consistent with quality standards because they lacked clear numerators and denominators. There were no sets identified pertaining to other SpA subtypes including ReA or IBD-related arthritis.
Methodologic review of the included sets revealed substantial heterogeneity in the methods used for development, including the evidence base, the consensus technique used, and the panel composition. For example, considering the evidence base used for the measure development, only 3 sets of quality measures were based on systematic reviews of existing guidelines and measures7,16,18, with another 3 based on nonsystematic literature reviews6,8,11. The rest of the measures were developed based on discussions and/or questionnaires with those involved or on existing guidelines. Similarly, the methods for consensus varied among development groups. While most groups used a modified-Delphi method or a RAND/UCLA Appropriateness Method20, others used nonspecified consensus techniques to develop their quality measures. More recently, the RAND/UCLA method has been adapted for online panels using a platform called ExpertLens21. This platform was used for 3 panels6,7,8 and has the advantage of including a larger number of participants (40–50) than traditional consensus panels20, allowing for broader input. While there is no current standard methodology for quality measure development, at a minimum it is recommended to develop measures based on a high level of evidence (usually from a guideline). Measure development groups should also use transparent reporting of the consensus procedures used for measure development and ensure broad input, including patients.
For quality measures to be useful, they must reflect a number of important attributes, including face/content validity, reproducibility, acceptability, feasibility, reliability, sensitivity to change, and predictive validity22. Quality measures must also target important improvements and be precisely defined and specified23. Some of these attributes are important to establish during the development stage (e.g., face/content validity, importance, and acceptability), while others must be defined during testing of the measures (e.g., reproducibility, feasibility, reliability, etc.). Interestingly, only a few of these attributes are routinely reported in the studies describing the measure development of the inflammatory arthritis measure sets in this review. For example, to be accepted, quality measures needed variably to be valid11, valid and feasible7,12, valid and important6, valid and relevant8, relevant17, and applicable and feasible18. A few sets did not follow any published attributes for measure development and used the following statements for consensus development: “adequately reflected the quality of disease course monitoring”16, “consistent with [previous] recommendations”9, or simply included measures with higher “priority” rankings10,13,14,24. Measure developers seeking to adopt and/or develop new measures should be encouraged to assess at a minimum the validity and feasibility of the candidate measures (the latter by testing the measure in clinical practice). Tools for measure assessment in key domains have been published by Stelfox and Straus23 and may be useful for measure developers, allowing them to assess candidate measures along the domains of measure importance, clarity, reliability, validity, and feasibility. In the United States, the National Quality Forum (NQF) is a not-for-profit organization that works to endorse quality measures used for federal reporting as well as other programs, including pay-for-performance programs. 
Measures considered for endorsement by NQF undergo a rigorous process and 4 principal criteria are used: importance, scientific acceptability, usability and relevance for intended uses, and feasibility25. It was unfortunately beyond the scope of our review to individually appraise each quality measure according to these criteria. We encourage users to evaluate individual metrics presented herein based on these criteria prior to implementation or use, because it is possible the measurement set fulfilled modified AGREE II criteria but that an individual measure has poor validity or feasibility in a particular healthcare context.
Published measures for inflammatory arthritis care were almost exclusively process and structure measures, with only 3% (n = 5) found to be outcome measures, all of which came out of Europe16,17,18. While Rademakers, et al26 found that structure and process most determined patients’ assessment of healthcare, there is limited evidence that measuring the process and structure of care improves patient outcomes. A study investigated the documentation of disease activity and functional status by 18 rheumatologists and found that documentation was infrequent; however, this appeared to have no association with radiographic outcomes over 24 months27. As patient outcomes are of primary importance to healthcare providers, health systems, and patients, it is likely that in the future, increasing emphasis will be on measuring outcomes28. While it is tempting to suggest potential outcome measures for inflammatory conditions such as RA (e.g., the percentage of patients in remission or low disease activity state), caution should be exercised because the development of outcome measures requires careful consideration of several additional factors during development, as outlined by Suter, et al28, including outcome attribution and appropriate risk adjustment.
Although to our knowledge this is the most comprehensive review of quality measures in inflammatory arthritis, several limitations should be recognized. First, although a systematic process was used to identify selected measurement sets, it is possible that some sets were missed, especially if they were published only on a health organization Website that was not reviewed. Additionally, non-English publications were excluded. Further, it should be noted that 1 of the authors on 4 of the included sets is the senior author of this article, and this could lead to bias in assessment. This potential bias was mitigated by adding an additional independent reviewer to determine inclusion of these sets and to rate the quality of these sets. There is also no standardized classification system for arthritis measurement themes and subthemes, and it is possible for another set of reviewers to have classified the individual measures differently.
Unfortunately, there is also no standardized tool available to assess the quality of measure development; however, a published modification of the AGREE II tool was used5. While this tool addresses many concepts important to quality measure development, some domains may be less applicable to quality measure development than to guideline development. For example, the AGREE II tool includes an entire domain on “applicability” that relates to tools for implementation. By their nature, quality measures often represent “tools” that can be used to monitor quality. In our interpretation of this domain, we gave higher ratings to measure sets that included a clear description of tools for quality measure use (e.g., online tools or electronic medical record, and clear specifications for measurement); such descriptions were uncommon, which often led to lower ratings on this domain. We also recommend that the feasibility of measures should be tested prior to implementation. Prior testing appeared to occur infrequently in the reviewed measurement sets, except for the American College of Rheumatology’s RA quality measure set, which was tested in a new electronic clinical care registry (Rheumatology Informatics System for Effectiveness) prior to publication12, and the Eumusc.net measures, where the initial set was audited with questions involving applicability and feasibility before that feedback was used to develop the final set. This concept of testing feasibility prior to implementation was not included in the AGREE II tool, and its omission may have led to a biased assessment of a measurement set’s quality.
A final limitation of our review was that because only development of measures was assessed, and not their testing or use, it was often unclear whether the measures included were in routine use for quality improvement and/or pay-for-performance programs. The intended use of the measure was also not included as a domain in the modified AGREE II instrument; however, this should be considered when evaluating measures, because measures used in pay-for-performance programs may be more likely than those used solely for quality improvement to represent a “minimal standard” of care, which may influence the rigor of development and/or testing.
This systematic review of quality measures for inflammatory arthritis reveals a major focus on process and structure measures for RA. Review of development of these measures shows heterogeneity, which may undermine their generalizability to other healthcare settings. In future measure development efforts, we recommend using clear frameworks for measure development and testing the measures for feasibility prior to implementation. Quality measures are important in ensuring that high-quality care that meets consistent standards is being provided to populations with inflammatory arthritis. Development of effective, well-designed measures for quality monitoring should be an ongoing goal. Understanding what measures are currently available and their strengths, weaknesses, and underlying assumptions will ensure that the best measures are selected for use and help direct future measure development.
ONLINE SUPPLEMENT
Supplementary material accompanies the online version of this article.
Acknowledgment
Lorraine Toews, MLIS, Librarian, University of Calgary, for her help with the literature search.
Footnotes
Full Release Article. For details see Reprints and Permissions at jrheum.org
Azin Rouhi was funded by a Canadian Rheumatology Summer Student Research Award.
- Accepted for publication August 29, 2017.
REFERENCES
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.