Abstract
Objective. To identify the instruments used to assess polymyalgia rheumatica (PMR) in published studies.
Methods. A systematic literature review of clinical trials and longitudinal observational studies related to PMR, published from 1970 to 2014, was carried out. All outcome and assessment instruments were extracted and categorized according to core areas and domains, as defined by the OMERACT (Outcome Measures in Rheumatology) Filter 2.0.
Results. Thirty-five articles (3221 patients) were included: 12 randomized controlled trials (RCT); 3 nonrandomized trials; and 20 observational studies. More than 20 domains were identified, measured by 29 different instruments. The most frequently used measures were pain, morning stiffness, patient global assessment and physician global assessment, erythrocyte sedimentation rate, and C-reactive protein. The definition of outcomes varied considerably between studies.
Conclusion. The outcome measures and instruments used in PMR are numerous and diversely defined. The establishment of a core set of validated and standardized outcome measurements is needed.
Polymyalgia rheumatica (PMR) is an inflammatory disease with a lifetime risk estimated at 2.4% for women and 1.7% for men1 and a peak incidence after 60 years of age. The diagnosis of PMR relies on clinical and laboratory manifestations, supported by a rapid, favorable response to glucocorticoid (GC) therapy at medium doses (15–20 mg/day of prednisone or equivalent). When untreated, PMR can cause profound disability. GC remains the gold standard therapy for PMR and is usually efficacious. However, the potential toxicity of longterm GC therapy2 imposes the need to search for safer alternatives.
Future research in PMR requires valid and reliable outcome measures that cover the relevant range of disease manifestations. A variety of outcomes have been used to assess disease activity, including clinical features (pain and morning stiffness), ultrasonography (US) variables, and laboratory measures such as erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), and interleukin-6 (IL-6) levels. Composite scores of disease activity3 and definitions of good response, remission, and relapse have been proposed3,4,5,6. However, these measures have not yet been extensively validated in PMR and do not incorporate the patient's perspective.
The Outcome Measures in Rheumatology (OMERACT) initiative aims to develop core sets of outcome measures capable of providing consistent estimates of the benefits of interventions for each given condition in clinical trials7. According to the OMERACT Filter 2.0, such core sets should include at least 1 domain from each core area. Four core areas (broad aspects of a health condition) are defined: 3 encompass the “impact of health conditions” (life impact, resource use, and death), and a fourth encompasses pathophysiological manifestations8,9. Within each core area, the filter further specifies domains9,10. To be included in a core set, a domain should be measurable by instruments that are truthful, discriminative, and feasible9,11.
The OMERACT PMR working group was formed to define a core set of outcome measures to be used in future clinical research in PMR. With the present systematic literature review we aimed to supply this endeavor with objective information on outcome measures currently used to assess PMR disease activity and response to treatment.
MATERIALS AND METHODS
Search strategy
The literature search was performed in MEDLINE, CINAHL, Science Citation Index from the Web of Science, and the Cochrane Library (Cochrane Central Register of Controlled Trials and Cochrane Database of Systematic Reviews). The search strategy was based on the Medical Subject Heading “Polymyalgia Rheumatica” and covered material published from January 1, 1970 to June 30, 2014.
Inclusion criteria
Studies were included if they: (1) used published classification criteria to select patients; (2) were written in English, French, Portuguese, or Spanish; (3) had either a clinical trial or a longitudinal observational study design; and (4) were available in full text.
Studies that included heterogeneous patient samples and whose published data did not allow subjects with PMR to be distinguished from those with other diseases (e.g., giant cell arteritis or late onset rheumatoid arthritis) were excluded.
Study selection
Titles, abstracts, and full reports of identified articles were systematically and independently screened by 2 authors (CD and RF) with regard to inclusion and exclusion criteria. In the first step, selection was based on titles and abstracts. In the second step, the full reports of articles selected in the first step were evaluated to decide on inclusion in this systematic review. Disagreements regarding the selection of an article were discussed between the 2 reviewers until consensus was reached; persistent disagreements were resolved by a third evaluator (JAPS).
Data extraction
During data extraction, special attention was given to Patients and Methods and Results sections of each article. All data were extracted using a standardized template designed for this review, which had been piloted and improved, and which included study design, sample size, followup period, outcome measures used, and the method of assessment.
Each outcome was characterized according to the OMERACT Filter 2.0 considering core areas (pathophysiological manifestations, life impact, death, resource use) and domains8.
RESULTS
Results of the literature search and selection of articles are presented in Figure 1. The electronic search strategy yielded 868 articles, 43 of which were selected on the basis of title and abstract for detailed review. Ultimately, 35 studies12–46 met the inclusion criteria for this systematic review (Figure 1). Agreement between the 2 reviewers was 96.6% and 100% for the first and second steps of article selection, respectively.
Included studies
Table 1 shows the study design characteristics of included articles. Twelve of the included studies are randomized drug trials, controlled against either placebo or conventional PMR treatment12–23. Three are nonrandomized interventional studies or studies without clear information about randomization24,25,26. Longitudinal observational studies represent more than one-half of the selected articles (20 of 35)27–46. One of these observational studies36 is a longterm followup of an already included RCT20. Study size ranged from 4 subjects24 to 781 subjects32,41, with followup periods varying between 14 days22 and 34 years32. All studies included a majority of women and of patients older than 50 years, in agreement with classical PMR features47,48,49,50.
Studies identified in this literature review include outcomes and instruments pertinent to all core areas defined by OMERACT, except resource use. The core area most represented is pathophysiological manifestations, which included a total of 6 domains, followed by life impact with 5 domains (Table 2).
Pain
Pain was used as an outcome in 17 studies12,13,14,15,16,18,19,22,25,34,36,39,40,43,44,45,46. Pain was most commonly quantified with a visual analog scale (VAS), usually a 0–10 cm scale (11 studies). In 3 studies12,14,16, pain was instead graded on an ordinal scale from “0” to “3.”
Most published reports do not clearly define the pain being assessed, and its reported location varies: “shoulder and pelvic girdle pain,”15 “proximal pain,”34 “proximal muscle pain,”14,16 or “joint or muscle pain.”13 Matteson and colleagues assessed pain separately at different sites (shoulder and limbs) as well as globally44. None of the published reports specified the recall period patients were asked to consider when rating their “pain.”
Morning stiffness
Stiffness, most commonly morning stiffness, was assessed in most of the included studies (26 of 35)13,14,15,16,18,19,20,21,22,24,25,26,27,28,30,33,34,35,36,37,38,40,43,44,45,46. It was evaluated as an independent outcome in 11 studies13,14,15,16,18,19,25,27,34,40,44, and was included as a variable in composite disease activity scores or in definitions of relapse, recurrence, or remission in 15 studies19,20,22,24,26,28,33,35,36,37,38,43,44,45,46.
Morning stiffness was measured as a duration (in minutes) in the majority of studies. In 1 RCT18, morning stiffness duration was reported on a 4-point scale (1: < 30 min; 2: 30–60 min; 3: 60–120 min; 4: > 120 min). In 2 studies, stiffness was graded from 0 to 3, where “0” means no symptoms, but it is unclear whether severity, duration, or both were being assessed14,16; the meaning of the other scale values is not given. Only Weyand and colleagues27 evaluated the severity of morning stiffness, using a 0–10 cm VAS. Only 1 RCT13 and 1 observational study27 gave precise information about the time interval under evaluation (“average of last week”).
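To make the ambiguity of such categorical scales concrete, the minimal Python sketch below maps a morning stiffness duration in minutes onto the 4-point scale described above. As published, the ranges 30–60 min and 60–120 min share the value 60 min, so the scale does not say which category exactly 60 min belongs to; the sketch assumes it is scored 2. The function name `stiffness_category` is hypothetical.

```python
def stiffness_category(minutes: float) -> int:
    """Map morning stiffness duration (min) onto the 4-point scale.

    1: < 30 min; 2: 30-60 min; 3: 60-120 min; 4: > 120 min.
    Assumption: exactly 60 min is scored 2, because the published
    ranges overlap at that value.
    """
    if minutes < 0:
        raise ValueError("duration cannot be negative")
    if minutes < 30:
        return 1
    if minutes <= 60:   # 30 min starts category 2; 60 min assumed to stay in 2
        return 2
    if minutes <= 120:  # 120 min is not "> 120", so it stays in category 3
        return 3
    return 4


for m in (0, 30, 60, 90, 120, 180):
    print(m, stiffness_category(m))
```

Two studies applying this scale with different boundary conventions could classify the same patient differently, which is one reason precise, shared definitions matter.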
Patient and physician global assessment
Patient Global Assessment (PGA) of disease activity was measured in 9 studies13,19,22,25,27,33,35,38,46, using a 0–10 cm VAS in all but 2 studies13,17, which used a 5-point ordinal scale.
Physician Global Assessment (PhGA) was used in 14 studies12,13,19,22,25,27,33,35,36,38,43,44,45,46: 12 used a 0–10 cm VAS and 2 studies13,27 used 5-point ordinal scales.
In 9 studies22,25,35,36,38,43,44,45,46, both PGA and PhGA were included as components of a predefined composite disease activity score.
Two additional instruments were used in a single study33: (1) PGA of General Health and (2) Patient Satisfaction with Disease Status (rating of disease state on the Austrian school grading system: 1 = excellent, 2 = good, 3 = average, 4 = moderate, 5 = unsatisfactory).
Function and quality of life
Function was assessed in 5 observational studies34,36,38,44,46, 1 open label trial25, and 3 RCT21,22,23. In all studies, function was assessed through the generic Health Assessment Questionnaire (HAQ)51.
Health-related quality of life (QoL) was considered in 2 large observational studies34,44 and was assessed with the generic Medical Outcomes Study 36-item Short Form Health Survey (SF-36)52. In a single observational study46, QoL during the past month was assessed using a 0–100 mm VAS, where 0 means normal and 100 the worst QoL.
Other clinical outcomes
Elevation of upper limbs was used as an outcome in 5 studies22,25,33,35,46, always as a component of a composite disease activity score. Upper limb elevation was graded on a 0–3 scale: 3 = no upper limb elevation; 2 = elevation below shoulder level (< 90°); 1 = elevation at shoulder level (90°); and 0 = elevation above shoulder level (> 90°). Muscle function (hand grip strength and jump test)23, the chair stand test23, the 10-meter walk23, and time to onset of fatigue (hours)13 were each used as outcomes in a single study. Patient-reported fatigue intensity, measured on a 0–100 mm VAS, was assessed in a single study44.
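The 0–3 upper limb elevation scale can be read as a simple lookup. The Python sketch below is illustrative only: it assumes elevation is recorded as the maximum active arm elevation in whole degrees, with 0 meaning no elevation at all. In practice the item is a clinical grading by observation, so an exact 90° cut-off is an idealization; the function name is hypothetical.

```python
def upper_limb_elevation_score(max_elevation_deg: int) -> int:
    """Grade upper limb elevation on the 0-3 scale (higher = worse).

    Assumption: input is the maximum active elevation in whole degrees,
    where 0 means the patient cannot elevate the arms at all.
    """
    if max_elevation_deg < 0:
        raise ValueError("elevation angle cannot be negative")
    if max_elevation_deg == 0:
        return 3  # no upper limb elevation
    if max_elevation_deg < 90:
        return 2  # elevation below shoulder level
    if max_elevation_deg == 90:
        return 1  # elevation at shoulder level
    return 0      # elevation above shoulder level


print([upper_limb_elevation_score(a) for a in (0, 45, 90, 150)])  # prints [3, 2, 1, 0]
```

Note that the scale runs in the opposite direction from range of motion: a higher score means worse function, which matches its use as an additive term in composite activity scores.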
ESR and CRP
ESR12,13,14,15,16,17,18,19,21,22,24,25,26,27,28,30,31,33,34,35,36,37,38,39,40,43,44,45,46 and CRP12,13,15,17,19,21,23,24,25,26,28,30,31,33,34,35,36,37,38,40,43,44,45,46 were used in the assessment of disease activity by most but not all RCT and observational studies.
Other laboratory measures
Other laboratory outcome measures used in some observational studies and clinical trials include serum fibrinogen12,15,16,45,46 and IL-6 levels19,22,24,27,30,31,37, mainly as experimental evaluations.
Ultrasonography
US was used in 3 prospective observational studies38,40,44 and in 1 open label trial25. Different studies used different evaluation protocols, and there is no formal proposal for standardizing US evaluation of response to therapy in PMR. Jiménez-Palop and colleagues evaluated intraarticular synovitis at the shoulder and hip, as well as tenosynovitis and bursitis at the shoulder. This study demonstrated good inter- and intraobserver reliability (0.96 and 0.99, respectively), but no statistically significant correlation was found between US findings and clinical or laboratory variables of disease activity40.
Composite measures
Most studies integrated the individual outcome measures into composite indices, used as response or relapse criteria or as activity scores; these are summarized in Table 3. Most defined relapse (in patients still receiving GC) or recurrence (after GC discontinuation) as the appearance of new symptoms, an increase in ESR (usually to > 30 mm/h), or an increase in CRP (to > 0.5 mg/dl or > 1 mg/dl) after remission had been achieved. Proposed response criteria include improvement of symptoms and reduction or normalization of inflammatory variables (ESR and CRP). In 2003, the European Collaborating Polymyalgia Rheumatica Group proposed a core set of response criteria. These EULAR response criteria require an improvement in VAS pain (obligatory) and in at least 3 of the following 4 items: CRP (mg/l) or ESR (mm/h), morning stiffness (min), elevation of upper limbs (0–3), and VAS PhGA4. However, there is considerable discrepancy in the definition of “improvement” and in the duration of improvement required to define “response.”
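Because “improvement” is defined inconsistently, the Python sketch below makes the improvement threshold an explicit parameter. It encodes the rule as described in the preceding paragraph: obligatory improvement in VAS pain plus improvement in at least 3 of the 4 remaining items. The default relative-improvement threshold of 20% is an assumption chosen for illustration, not a value taken from the EULAR proposal, and all names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One assessment; for every item, lower values mean less disease activity."""
    vas_pain: float           # patient pain VAS, 0-10
    acute_phase: float        # CRP (mg/l) or ESR (mm/h); same test at both visits
    morning_stiffness: float  # minutes
    elevation: int            # upper limb elevation score, 0-3
    vas_phga: float           # physician global assessment VAS, 0-10


def improved(before: float, after: float, threshold: float) -> bool:
    """True if the item fell by at least `threshold` relative to baseline."""
    if before <= 0:
        return False  # assumption: an item already at 0 cannot count as improved
    return (before - after) / before >= threshold


def eular_response(baseline: Visit, followup: Visit, threshold: float = 0.2) -> bool:
    """Hypothetical encoding of the response rule with a configurable threshold."""
    if not improved(baseline.vas_pain, followup.vas_pain, threshold):
        return False  # improvement in VAS pain is obligatory
    other_items = ("acute_phase", "morning_stiffness", "elevation", "vas_phga")
    n_improved = sum(
        improved(getattr(baseline, name), getattr(followup, name), threshold)
        for name in other_items
    )
    return n_improved >= 3


before = Visit(vas_pain=7, acute_phase=40, morning_stiffness=120, elevation=2, vas_phga=6)
after = Visit(vas_pain=3, acute_phase=10, morning_stiffness=30, elevation=1, vas_phga=6)
print(eular_response(before, after))                 # True: pain and 3 of 4 items improved
print(eular_response(before, after, threshold=0.7))  # False: pain improved by only 57%
```

The same pair of visits counts as a response or a non-response depending only on the chosen threshold, which illustrates why differing definitions of “improvement” hamper comparison across studies.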
One of the most commonly used composite disease activity scores was the Polymyalgia Rheumatica Activity Score (PMR-AS), developed by Leeb and Bird6 and defined as PMR-AS = CRP (mg/dl) + elevation of upper limbs (0–3) + 0.1 × morning stiffness (min) + VAS patient pain (0–10) + VAS physician global (0–10)1.
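The PMR-AS formula is straightforward arithmetic; the minimal Python sketch below computes it and checks the input ranges stated in the formula. CRP enters the score in mg/dl, so a value reported in mg/l must first be divided by 10. The function name and range checks are illustrative assumptions, and the example values are invented.

```python
def pmr_as(crp_mg_dl: float, elevation: int, morning_stiffness_min: float,
           vas_pain: float, vas_phga: float) -> float:
    """PMR-AS = CRP (mg/dl) + elevation (0-3) + 0.1 x morning stiffness (min)
    + VAS patient pain (0-10) + VAS physician global (0-10)."""
    if elevation not in (0, 1, 2, 3):
        raise ValueError("upper limb elevation must be scored 0-3")
    for name, vas in (("vas_pain", vas_pain), ("vas_phga", vas_phga)):
        if not 0 <= vas <= 10:
            raise ValueError(f"{name} must be on a 0-10 scale")
    if crp_mg_dl < 0 or morning_stiffness_min < 0:
        raise ValueError("CRP and stiffness duration cannot be negative")
    return crp_mg_dl + elevation + 0.1 * morning_stiffness_min + vas_pain + vas_phga


crp_mg_l = 25.0  # CRP as reported in mg/l; converted to mg/dl below
score = pmr_as(crp_mg_dl=crp_mg_l / 10, elevation=2, morning_stiffness_min=90,
               vas_pain=6, vas_phga=5)
print(score)  # 2.5 + 2 + 9.0 + 6 + 5 = 24.5
```

Because stiffness is weighted by 0.1 per minute and is otherwise unbounded, a patient with 180 min of morning stiffness accrues 18 points from that item alone.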
PMR-AS correlated with other outcome measures: strongly with VAS PGA (r = 0.76) and more modestly with ESR (r = 0.32)6,33. These correlations are partly built into the score: CRP, a component of PMR-AS, is closely associated with ESR, and patient pain VAS, another component, is usually strongly correlated with patient global VAS. PMR-AS also showed very good internal consistency in 2 different cohorts (Cronbach's α 0.90 and 0.88)6 and demonstrated reliability3,33,53.
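The internal-consistency figures quoted for PMR-AS are Cronbach's α values. As a reminder of what the statistic measures, the Python sketch below computes α = k/(k − 1) × (1 − Σ s²ᵢ / s²ₜ), where k is the number of items, s²ᵢ the sample variance of item i, and s²ₜ the sample variance of each patient's summed score. The sketch uses raw (unstandardized) scores; the toy data are invented and unrelated to the cited cohorts, and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha from raw scores.

    `items` holds k lists, one per item, each with the scores of the same
    n patients in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("each item needs the same number (>= 2) of patients")
    totals = [sum(item[p] for item in items) for p in range(n)]  # per-patient sums
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("summed scores have zero variance; alpha is undefined")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: 2 items scored for 4 patients (invented for illustration).
print(round(cronbach_alpha([[2, 4, 6, 8], [1, 3, 2, 4]]), 3))  # 0.78
```

Identical items give α = 1, and α falls as items vary independently of one another, so a high α indicates that the components tend to move together across patients.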
GC therapy
The characterization of the GC treatment regimen was extremely variable. Only a few studies reported the cumulative GC dose21,27,36, the minimum dose required13,17,21, the duration of therapy21, or the percentage of patients who had discontinued GC after a specified followup period13,20,21,36.
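Cumulative GC dose, one of the rarely reported treatment descriptors, is the sum over the treatment course of each daily dose multiplied by the number of days it was taken. The Python sketch below assumes doses are already expressed in prednisone-equivalent mg/day; the taper schedule is hypothetical and is not a treatment recommendation, and the function name is illustrative.

```python
def cumulative_gc_dose(schedule: list[tuple[float, int]]) -> float:
    """Total GC exposure (mg) from (daily_dose_mg, number_of_days) segments.

    Assumption: doses are prednisone-equivalent mg/day.
    """
    total = 0.0
    for daily_dose_mg, days in schedule:
        if daily_dose_mg < 0 or days < 0:
            raise ValueError("doses and durations must be non-negative")
        total += daily_dose_mg * days
    return total


# Hypothetical taper: 15 mg/day for 4 weeks, 12.5 mg/day for 4 weeks, 10 mg/day for 8 weeks.
taper = [(15, 28), (12.5, 28), (10, 56)]
print(cumulative_gc_dose(taper))  # 420 + 350 + 560 = 1330.0 mg
```

Reporting the segment schedule alongside the total would also let readers recover the duration of therapy, another descriptor that few studies reported.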
Adverse events
The incidence and characterization of adverse events related to interventions were described in the majority of the clinical trials15–25 and in the longterm followup study of patients treated with methotrexate36. None of the studies performed a systematic and structured evaluation of safety.
Some observational studies were designed to assess specific GC-related adverse events, such as vertebral fractures39,42, bone mineral content16,42, and cardiovascular and cerebrovascular events32,42. One study described mortality and its causes29. The methods used to elicit adverse effects in observational studies were variable, but death registries and patient files were the most common sources of information.
DISCUSSION
This systematic literature review highlights remarkable variability in the assessment of PMR in research settings. Patient-reported outcomes (PRO) were the most commonly studied outcomes and were assessed in almost all studies included in this review. Fatigue, however, was evaluated in only 2 studies. Function was evaluated in only 9 of the 35 studies and QoL in only 3, despite their importance to patients54.
The instruments used to measure PRO in the selected articles were very heterogeneous. In general, what was actually being measured was also poorly defined (e.g., concerning morning stiffness: Is the question referring to the girdles, the hands, or elsewhere? At what time of the day? What time period is being assessed? Are we measuring duration, severity, or both?).
There are no studies addressing the relative importance of each outcome from the patient’s perspective. During OMERACT 11 (North Carolina, USA, May 2012), the PMR-SIG Group presented data from a preliminary “scoping” consultation exercise involving 104 patients with PMR from 3 centers in the UK and 1 in Belgium. In their study, patients were invited to express their concerns regarding disease and treatment. Symptoms and “impairment” were clearly important to patients, with pain, stiffness, fatigue, and sleep disturbance being mentioned very often. Physical activity and treatment aspects like GC-related adverse effects were also considered important54. It is important that patients’ concerns and wishes are incorporated into any core outcome set.
Outcomes assessed by physicians rather than patients were less heterogeneous. Physician-reported outcomes were used less frequently than PRO, with PhGA (0–10 cm VAS) being the most commonly used. Given the discrepancies between patient and physician evaluations found in several diseases55,56,57, it is generally considered that both PRO and physician-reported outcomes should be included to capture the burden of disease.
All selected articles reported ESR or CRP, except those designed to evaluate specific adverse events29,32,41,42. Other laboratory variables, such as IL-6 and fibrinogen, and US have so far been considered “experimental” outcomes.
Disease activity scores and definitions of remission incorporating both physician- and patient-reported outcomes are well established in other rheumatic diseases and may also prove useful in PMR. The concepts of remission, relapse, and recurrence are not consistently defined for PMR. A composite score of disease activity, the PMR-AS, was developed by Leeb and colleagues in 200733 and has been used in about 40% of the selected articles published after 2007.
Our study has both strengths and limitations. We searched the most important databases of medical research articles, included languages other than English, and covered a long publication period. The absence of a quality assessment of the included papers may be seen as a limitation, but we believe this was the most adequate strategy for our primary goal of identifying all outcomes in current use. We did not search conference abstracts or contact authors to broaden our scope. By including only longitudinal observational studies and clinical trials with a PMR population, we may have missed outcomes used in cross-sectional studies or in larger studies of rheumatic diseases. We did not perform a psychometric analysis of each instrument, as this was outside the intended scope of this work.
In conclusion, our study revealed that a great heterogeneity exists in the assessment of PMR. Most instruments have been insufficiently validated according to the OMERACT Filter, and the patients’ perspective may not always have been fully covered. These data suggest that further work is needed to define and validate relevant outcome measures for assessment of PMR in order to promote clinical research in this field and enhance comparability of studies. Core areas and domains need to be defined according to OMERACT procedures. Evaluation instruments capable of satisfying the properties required by the OMERACT Filter 2.0 need to be developed, including validity, reliability, feasibility, and responsiveness.