Abstract
Objective. Five core domains have been endorsed by Outcome Measures in Rheumatology (OMERACT) for acute gout: pain, joint swelling, joint tenderness, patient global assessment, and activity limitation. We evaluated instruments for these domains according to the OMERACT filter: truth, feasibility, and discrimination.
Methods. A systematic search strategy for instruments used to measure the acute gout core domains was formulated. For each method, articles were assessed by 2 reviewers to summarize information according to the specific components of the OMERACT filter.
Results. Seventy-seven articles and abstracts met the inclusion criteria. Pain was most frequently reported (76 studies, 20 instruments). The pain instruments used most often were 100 mm visual analog scale (VAS) and 5-point Likert scale. Both methods have high feasibility, face and content validity, and within- and between-group discrimination. Four-point Likert scales assessing index joint swelling and tenderness have been used in numerous acute gout studies; these instruments are feasible, with high face and content validity, and show within- and between-group discrimination. Five-point Patient Global Assessment of Response to Treatment (PGART) scales are feasible and valid, and show within- and between-group discrimination. Measures of activity limitations were infrequently reported, and insufficient data were available to make definite assessments of the instruments for this domain.
Conclusion. Many different instruments have been used to assess the acute gout core domains. Pain VAS and 5-point Likert scales, 4-point Likert scales of index joint swelling and tenderness and 5-point PGART instruments meet the criteria for the OMERACT filter.
Acute gout is characterized by the sudden onset of intense pain and swelling of 1 or more joints, reaching a maximal level of severity within hours and usually resolving over 10–14 days. The aim of therapy for acute gout is rapid resolution of the attack. Typically, acute gout is treated with nonsteroidal antiinflammatory drugs (NSAID), corticosteroids, or colchicine. There has been renewed interest in the treatment of acute gout since the identification of the central role of the NLRP3 inflammasome and interleukin 1β (IL-1β) in initiation of the inflammatory response to monosodium urate crystals1. This has led to recent clinical trials of IL-1β inhibitors for management of acute gout.
Since 2002, the Outcome Measures in Rheumatology (OMERACT) Gout Special Interest Group has worked toward defining outcome measures for studies in gout2,3,4,5,6,7,8,9,10. Five core domains have been endorsed by OMERACT for studies of acute gout: pain, joint tenderness, joint swelling, patient global assessment, and activity limitation5. Although these domains have been endorsed for acute gout trials, the instruments for each of these domains have not been fully developed nor endorsed by the OMERACT process for this context. The aim of this systematic literature review was to evaluate instruments for the acute gout core domains according to the OMERACT filter: truth, feasibility, and discrimination11.
MATERIALS AND METHODS
A systematic search strategy was formulated to provide a written summary of the evidence for instruments in the acute gout core domains endorsed by OMERACT. The research question was which instruments assessing the core domains in acute gout met the OMERACT filter. The following search keywords were used: “acute gout,” “gout flare,” “gouty arthritis,” “gout pain,” “gout randomized control trial,” “gout attack,” “gout tenderness,” “gout swelling,” “gout patient global,” “gout outcome,” and “gout activity.” Searches were performed in the following electronic databases: PubMed, Medline, Cochrane Central Register of Controlled Trials (The Cochrane Library), Excerpta Medica Database (EMBASE), European League Against Rheumatism (EULAR) meeting abstract archive, and American College of Rheumatology (ACR) Annual Scientific Meeting abstract archive.
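As an illustration only, the keywords listed above can be combined into a single Boolean query string. The sketch below assumes the terms are joined with OR as quoted phrases; the function name and the query syntax are hypothetical, and the strategy actually used is the one shown in Figure 1A. Field tags and operators differ between databases.

```python
# Keywords as listed in the text.
KEYWORDS = [
    "acute gout",
    "gout flare",
    "gouty arthritis",
    "gout pain",
    "gout randomized control trial",
    "gout attack",
    "gout tenderness",
    "gout swelling",
    "gout patient global",
    "gout outcome",
    "gout activity",
]


def build_or_query(terms):
    """Join terms as quoted phrases with OR (illustrative syntax only)."""
    return " OR ".join(f'"{term}"' for term in terms)


query = build_or_query(KEYWORDS)
print(query)
```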
Bibliographical references of individual publications were also checked. Data sources were English publications from these databases, and hand searches. No date restrictions were used (earliest database search date was 1946). The search was completed in December 2011. An example of the search strategy is shown in Figure 1A. Articles and abstracts were included if the participants had acute gout, and at least 1 core domain was assessed in the study. The search results were further cross-checked with the results of an independent systematic literature review of randomized controlled trials (RCT) for treatments of acute gout to ensure that all relevant RCT studies were identified12.
The search generated a total of 6942 articles, of which 4680 were excluded because their titles did not relate to acute gout. Case reports, prevalence studies, studies of conditions other than acute gout, and studies that did not address any aspect of the OMERACT filter were then excluded based on abstract or full-text review. A total of 77 abstracts and full-text articles met inclusion criteria and were included in the analysis (Figure 1).
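The screening counts above imply a simple flow, sketched here in Python. Only the 3 counts 6942, 4680, and 77 come from the text; the variable names are illustrative, and the derived figures assume that all 77 included reports came from the 6942 search results rather than from hand searches or cross-checking.

```python
# Counts stated in the text.
records_identified = 6942
excluded_by_title = 4680
included = 77

# Derived values, not reported explicitly in the review.
# Assumption: every included report originated from the database search.
reviewed_abstract_or_full_text = records_identified - excluded_by_title
excluded_abstract_or_full_text = reviewed_abstract_or_full_text - included

print(reviewed_abstract_or_full_text)  # 2262
print(excluded_abstract_or_full_text)  # 2185
```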
For each outcome domain, articles were assessed by 2 independent reviewers (CZ and RG) to summarize detailed information about each instrument according to the components of the OMERACT filter: feasibility, truth, and discrimination11. Aspects of feasibility considered were cost, training required, equipment required, and patient acceptability. Aspects of truth considered were face validity (whether the method looks right), construct validity (whether the method relates to other methods of acute gout assessment in predicted ways, using correlation coefficients of patient level data), content validity (whether the methods cover the relevant issues adequately, including any patient assessments), and internal consistency (whether Cronbach alpha was reported). Aspects of discrimination that were considered were within-group change sensitivity (in prospective studies, reported as effect size where available), and between-group sensitivity (differences documented between different allocated treatment groups in prospective studies with relevant statistics reported).
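The text does not state which effect size formula was applied to within-group change. As a minimal sketch, the example below assumes one common convention: mean change from baseline divided by the standard deviation of the baseline scores. The function name and the scores are hypothetical and are not data from any study in this review.

```python
from statistics import mean, stdev


def within_group_effect_size(baseline, follow_up):
    """Mean improvement divided by the SD of baseline scores.

    Assumed convention; lower follow-up scores indicate improvement,
    as for a pain visual analog scale.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    if len(baseline) < 2:
        raise ValueError("at least 2 paired observations are required")
    changes = [before - after for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical 100 mm pain VAS scores for 4 patients.
baseline_vas = [70, 80, 60, 90]
follow_up_vas = [20, 30, 25, 35]

effect_size = within_group_effect_size(baseline_vas, follow_up_vas)
print(round(effect_size, 2))
```

Under this convention, a small baseline standard deviation inflates the effect size, which is one reason very large values can arise when all patients enter a study with severe pain.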
RESULTS
Summary of search results
The literature search identified 77 articles and abstracts that met the criteria for inclusion in the review. The search summary is outlined in Figure 1B. No studies explicitly addressed internal consistency using the specified definitions. Reproducibility data were not available for any instrument in the assessment of acute gout.
Pain
Pain was the most frequently reported domain (in 76 of the 77 studies assessed, Figure 1). Twenty different instruments were used in these studies to assess the pain of acute gout. The 3 most frequently used instruments are shown in Table 1. All 3 methods were considered feasible, with high face and content validity. The 100 mm (10 cm) pain visual analog scale (VAS) has been used in 16 studies of acute gout. Sensitivity to change for the pain VAS has been demonstrated with an effect size of 9.3 after 72 h following canakinumab 150 mg treatment13. This instrument has also documented between-group discrimination in 2 separate clinical trials14,15.
Similarly, the 5-point Likert pain scale has been used in 16 studies of acute gout, including a study of untreated acute gout16. Sensitivity to change for the 5-point Likert scale has been demonstrated with effect sizes of 2.17–2.47 following 2 days of NSAID treatment17. Between-group discrimination has been demonstrated in 2 separate clinical trials18,19.
The 4-point Likert pain scale has been reported in 9 studies of acute gout. Sensitivity to change over time has been reported in many studies, although data were not available to allow calculation of effect sizes. Between-group discrimination has not been demonstrated.
Joint swelling
Joint swelling has been reported in 44 studies, using 15 different instruments (Figure 1). The 3 instruments most frequently used are shown in Table 2. All 3 instruments were considered feasible, although some observer training is required. Physician assessment of joint swelling in the index joint using a 4-point Likert scale (range 0–3) has been used in 8 studies of acute gout. This method has high face validity because it captures the degree of swelling in the affected joint. This is particularly relevant to acute gout, which frequently presents as a monoarthritis17. Sensitivity to change over time has been reported in many studies, although data were not available to allow calculation of effect sizes. Between-group discrimination has been reported in a clinical trial of canakinumab versus triamcinolone using this instrument18. Several RCT comparing 2 NSAID have not shown differences in change in joint swelling using this instrument17,20.
Physical measurement of the circumference of the affected joint using a tape measure has been reported in 7 acute gout studies. Although this method also allows assessment of the affected joint, there is a large variation in measurement depending on the size of the joint when large joints such as the knee and small joints such as those in the toes are included21. Sensitivity to change over time has been demonstrated with an effect size of 0.46 following 3 days of NSAID treatment22. Between-group discrimination has not been reported using this method.
Physician assessment of the swollen joint count (SJC) has been reported in 3 studies of acute gout. This instrument has the ability to measure the extent of disease in polyarticular gout, but does not capture the degree of swelling in an affected joint. This may reduce the sensitivity of the measure in patients with monoarticular gout, and SJC is not appropriate for studies of monoarticular gout. Within-group and between-group discrimination have been reported using this instrument (Table 2).
Joint tenderness
Joint tenderness has been reported in 39 studies, using 11 different instruments (Figure 1). The 3 instruments most frequently used are shown in Table 3. All 3 instruments were considered to be feasible, although some observer training is required. All instruments assessing joint tenderness may cause some patient distress, as joints affected by acute gout may be extremely tender. Physician assessment of joint tenderness in the index joint using a 4-point Likert scale (range 0–3) has been used in 17 studies of acute gout. This method has high face validity because it captures the degree of tenderness in the affected joint. This is particularly relevant to acute gout, which frequently presents as a monoarthritis17. Sensitivity to change over time has been reported in many studies, with effect size calculated as 2.5 following 3 days of high-dose piroxicam23. Between-group discrimination has been reported in a clinical trial of canakinumab versus triamcinolone using this instrument18. Several RCT comparing 2 NSAID have not shown differences in change in joint tenderness using this instrument17,20.
Physician assessment of joint tenderness in the index joint using a 5-point Likert scale (range 0–4) has been used in 5 studies of acute gout. As outlined above for the 4-point Likert scale, this method has high face validity because it captures the degree of tenderness in the affected joint. Sensitivity to change over time has been reported in a study of untreated acute gout16, with effect size calculated as 0.9 on Day 7. A clinical study of intravenous indoprofen21 showed effect sizes of 2.1 after 2 h of treatment and 7.2 after 48 h. Between-group discrimination has not been demonstrated.
Physician assessment of the tender joint count (TJC) has been reported in 3 studies of acute gout. As with the SJC, this instrument has the ability to measure the extent of disease in polyarticular gout, but does not measure the degree of tenderness in an affected joint. This may reduce the sensitivity of the measure in patients with monoarticular gout, and TJC is not appropriate for studies of monoarticular gout. Within-group and between-group discrimination have been reported using this instrument (Table 3).
Patient global assessment
Patient global assessment has been reported in 25 studies of acute gout, using 19 different methods (Figure 1). Both patient global assessment of response to therapy (PGART) and patient global assessment of disease activity (PGA) have been reported. Of the 19 instruments, 10 were variations of the 5-point PGART instrument, using different descriptors, ranges, and methods of data collection. The 3 instruments used most frequently are shown in Table 4. All 3 methods were considered feasible, with high face and content validity. In contrast to the PGA, the PGART is a measure of change and does not allow measurement of patient assessment at baseline. A 5-point numerical PGART scale has been reported in 3 articles (see Table 4 for details of this scale). Sensitivity to change over time has been reported, although data were not available to allow calculation of effect sizes. Several RCT comparing 2 NSAID have not shown between-group differences in PGART response using this instrument17,20.
A 5-point descriptive PGART scale has been reported in 2 clinical trials (see Table 4 for details of scale). Sensitivity to change over time has been reported, although data were not available to allow calculation of effect sizes. Two separate RCT comparing canakinumab with triamcinolone acetonide have shown between-group discrimination using this PGART instrument13,18.
A 5-point PGA scale has been reported in 3 acute gout studies. Sensitivity to change over time has been reported in these studies, although data were not available to allow calculation of effect sizes. Two randomized controlled trials comparing 2 NSAID have not shown differences in change in PGA using this instrument24,25.
Activity limitation
Activity limitation has been measured infrequently in studies of acute gout, with only 4 studies reporting this domain, using different instruments (Figure 1). Only 2 instruments, the Health Assessment Questionnaire (HAQ) and the Medical Outcome Study Short Form-36 Health Survey (SF-36) physical function (PF) domain, have been reported in more than 1 study. Properties for these 2 instruments are shown in Table 5. Both instruments were considered to be feasible with high content and face validity. Both instruments have been endorsed by OMERACT for studies of chronic gout3,7.
The HAQ has been reported in 2 acute gout studies. Sensitivity to change over time has been reported, with effect size in an observational study of acute gout calculated as 1.43 after > 1 month following treatment26. An RCT comparing canakinumab with triamcinolone acetonide has not shown between-group discrimination.
The SF-36 has been reported in 2 studies of acute gout. However, data specifically related to the PF score have been reported in only 1 acute gout study, a clinical trial of canakinumab versus triamcinolone18. Sensitivity to change over time was observed in this study, although data were not available to allow calculation of effect sizes. Differences between SF-36 PF scores were not reported between groups. However, this study did report that mean SF-36 PF scores in patients with acute gout were much lower than those for the general US population.
DISCUSSION
A key finding of this systematic literature review is that many different instruments have been used to assess the acute gout core domains. The wide variation observed in this review supports the need to standardize measurement of key domains in gout.
All the instruments identified within this review were considered feasible; these are low-cost tools that can be easily and rapidly administered without the need for specialist equipment. Any method that assesses joint tenderness may cause patient discomfort, particularly in the context of acute gout, which can cause exquisite joint tenderness. As in other articular diseases, careful training of observers is required to ensure that assessment of joint swelling and tenderness in patients with acute gout is undertaken in a manner that does not cause undue patient distress.
Most of the instruments commonly used to measure acute gout core domains have high face validity. Gout frequently presents as a monoarthritis17. Thus, assessment of swelling and tenderness in an index joint may have higher face validity than enumeration of the number of affected joints. In particular, TJC and SJC are not appropriate instruments for studies of monoarticular gout. Calculation of correlation coefficients to analyze the relationships between various aspects of acute gout was not possible using published data, although 1 study has reported a highly significant relationship between changes in the 5-point Likert pain score and the 5-point descriptive PGART27. Ideally, the relationship between a patient global assessment and all other instruments should be reported. Based on previous qualitative work5, we would expect patient global assessment to correlate highly with pain and activity limitation, moderately with tender joint assessment, and less with swollen joint assessment. A further validity issue was raised when considering assessment of joint swelling by tape measurement of the index joint, noting the wide variation in sizes of joints frequently affected by gout.
Aspects of discrimination within the OMERACT filter include reproducibility and change sensitivity. No published data were available for reproducibility for any of the acute gout instruments assessed in this review. Although test-retest reproducibility may be difficult to measure and unreliable in the context of acute gout where treatment leads to rapid improvement in the clinical features of inflammation, interobserver reproducibility could be assessed for investigator assessment of swollen and tender joints.
With respect to change sensitivity, acute gout is typically self-limiting over 10–14 days. Thus, even in the absence of treatment, measures of acute gout severity improve over time. This was clearly demonstrated in a study of untreated acute gout, which showed significant reduction in measures of pain, tenderness, and swelling over 7 days16. Further, because of the severe nature of pain caused by acute gout, it is now considered unethical to undertake placebo-controlled trials of acute gout. The majority of clinical trials identified in the literature search were equivalence and safety NSAID studies, typically with indomethacin as the active comparator. Thus, assessment of between-group discrimination for the purposes of the OMERACT filter is somewhat limited. However, several studies did allow analysis of between-group discrimination, particularly a 1987 placebo-controlled study of colchicine15, an RCT comparing high-dose and low-dose celecoxib19, several RCT comparing canakinumab with triamcinolone13,18, and a study comparing a Chinese herbal medication with indomethacin28. Although the minimal important difference has not been reported for instruments assessing acute gout, statistical differences could be detected both within and between groups for the following measures: pain VAS, 5-point pain Likert score, 4-point physician assessments of index joint swelling and tenderness, TJC, SJC, and PGART.
With regard to the OMERACT filter cube taxonomy of discrimination29, all studies report statistical differences, because the minimal relevant or important differences have not been determined for acute gout; all change indices are therefore located in the first column of the cube. All studies look at group settings, so all change indices are located on the front face of the cube. For the studies that report a within-group change, those data are clearly in the second floor of the cube; for between-group differences, some comparisons concerned change scores (top floor of the cube) and others concerned final scores (bottom floor of the cube).
Many different instruments have been used to assess the acute gout core domains. Pain VAS and 5-point Likert scales, 4-point Likert scales of index joint swelling and tenderness, and 5-point PGART instruments meet the criteria for the OMERACT filter. Further research is required to validate measures of activity limitation for studies of acute gout.
Acknowledgment
The authors thank Susan Foggin, Medical and Health Information Services Manager, University of Auckland, for assistance with literature searches.
Footnotes
Supported by a University of Auckland summer studentship (CSZ). Dr. Dalbeth has received consultant fees from Ardea Biosciences, Metabolex, Novartis, and Takeda; her institution has received funding from Fonterra; and she is inventor on a patent related to milk products and gout. Dr. D. Khanna has received consultant fees from Ardea, Takeda, Novartis, and Savient, and has served on a speakers bureau for Savient. Dr. P. Khanna serves on the speakers bureau for Takeda. J.A. Singh has received research grants from Takeda and Savient and consultant fees from Savient, Takeda, Ardea, Regeneron, Allergan, URL pharmaceuticals, and Novartis. J.A. Singh is a member of the executive of OMERACT, an organization that develops outcome measures in rheumatology and receives arms-length funding from 36 companies; a member of the American College of Rheumatology’s Guidelines Subcommittee of the Quality of Care Committee; and a member of the Veterans Affairs Rheumatology Field Advisory Committee.