Abstract
Objective. To perform a COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN)-based systematic literature review of measurement properties of the Polymyalgia Rheumatica Activity Score (PMR-AS).
Methods. PubMed, EMBASE, and CINAHL were broadly searched. English full-text articles, with (quantitative) data on ≥ 5 patients with PMR using the PMR-AS were selected. Seven hypotheses for construct validity and 3 for responsiveness, concerning associations with erythrocyte sedimentation rate, physical function, quality of life, clinical disease states, ultrasound, and treatment response, were formulated. We assessed the structural validity, internal consistency, reliability, and measurement error, or the hypotheses on construct validity or responsiveness of the PMR-AS based on COSMIN criteria.
Results. Of the 26 identified articles that used the PMR-AS, 12 could be used to assess measurement properties. Structural validity, internal consistency, construct validity, and responsiveness were assessed in 1, 2, 8, and 3 articles, respectively. Insufficient evidence was found to confirm structural validity and internal consistency. No data were found on reliability or measurement error. Although 60% and 67% of hypotheses tested for construct validity and responsiveness, respectively, were confirmed, there was insufficient evidence to meet criteria for good measurement properties.
Conclusion. While there is some promising evidence for construct validity and responsiveness of the PMR-AS, it is lacking for other properties and, overall, falls short of criteria for good measurement properties. Therefore, further research is needed to assess its role in clinical research and care.
Polymyalgia rheumatica (PMR) is an inflammatory rheumatic disease characterized by pain and stiffness of the hip and shoulder girdle.1 Glucocorticoids (GCs), the mainstay of treatment, are gradually tapered to control disease activity while minimizing the risk of GC-related adverse events.2 However, during GC tapering, disease relapses (or flares) occur in up to 55% of patients, typically necessitating therapy intensification.3,4
Analogous to the Disease Activity Score in 28 joints in rheumatoid arthritis (RA), Leeb and Bird developed a disease activity score for PMR: the PMR Activity Score (PMR-AS).5 The PMR-AS is a composite score based on C-reactive protein (CRP), pain assessed on a visual analog scale (VASp), morning stiffness (MS; in minutes), elevation of the upper limbs (EUL; active shoulder abduction ranging from 0 to 3), and physician disease activity assessed on a VAS (VASph). The PMR-AS is calculated as follows: CRP (mg/dL) + VASp (0–10) + (MS×0.1) + EUL (0–3) + VASph (0–10). This outcome measure is the most frequently used instrument for disease activity in PMR studies, with up to 40% of publications selected by Duarte et al between 2007 and 2014 using the PMR-AS.6 The PMR-AS is being used in several ongoing trials. Moreover, trials of baricitinib (ClinicalTrials.gov: NCT04027101) and abatacept (NCT03632187) are adjusting trial treatment based on PMR-AS values. Therefore, considering this extended use, face validity may be inferred. However, so far, measurement properties of the PMR-AS have not been systematically scrutinized and summarized.6
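As an illustration, the formula above can be written as a short function. This is a minimal sketch: the function name `pmr_as`, its parameter names, and the example patient values are hypothetical and not taken from the source, and the range checks simply mirror the item ranges stated above.

```python
def pmr_as(crp_mg_dl, vas_pain, morning_stiffness_min, eul, vas_physician):
    """Sum the five PMR-AS items as given in the formula above.

    All names are illustrative assumptions. Ranges follow the text:
    VAS items 0-10, EUL 0-3, CRP in mg/dL, morning stiffness in minutes.
    """
    if crp_mg_dl < 0 or morning_stiffness_min < 0:
        raise ValueError("CRP and morning stiffness must be non-negative")
    if not 0 <= vas_pain <= 10 or not 0 <= vas_physician <= 10:
        raise ValueError("VAS items must lie between 0 and 10")
    if eul not in (0, 1, 2, 3):
        raise ValueError("EUL must be 0, 1, 2, or 3")
    # Morning stiffness (minutes) is weighted by 0.1; the other items are unweighted.
    return crp_mg_dl + vas_pain + 0.1 * morning_stiffness_min + eul + vas_physician


# Hypothetical patient: CRP 1.2 mg/dL, VASp 4, MS 30 min, EUL 1, VASph 3.
score = pmr_as(1.2, 4, 30, 1, 3)
print(round(score, 1))  # 1.2 + 4 + 3.0 + 1 + 3 = 12.2
```

Because CRP and MS are unbounded while the other items are capped, the sketch validates only the bounded items against an upper limit.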
Our objective was to systematically assess the measurement properties of the PMR-AS in available clinical studies in order to ultimately summarize reliability and validity for future clinical trials and clinical care.
METHODS
Guided by the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) taxonomy and methodology for systematic reviews, we chose to assess internal structure (by assessing model type, structural validity, and internal consistency), reliability (by assessing inter- and intrarater reliability and measurement error), and validity (by testing hypotheses for construct validity and responsiveness) of the PMR-AS in published studies with patients with PMR.7 The review protocol and an amendment (which excluded assessment of PMR-AS–based relapse criteria) were registered in PROSPERO (ID 187907). The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines were used for reporting,8 as shown in Supplementary Material 1 (available from the authors upon request).
We first formulated hypotheses to test for construct validity and responsiveness, as shown in detail in Supplementary Material 2 (available from the authors upon request). To increase the relevance of these hypotheses, the team comprised rheumatologists with both PMR and research experience (CO, AAdB, AvdM) and researchers with experience in systematic reviews (AAdB, CHMvdE). To test construct validity, we hypothesized that increased PMR disease activity is generally accompanied by increased erythrocyte sedimentation rate (ESR), decreased physical function, and decreased quality of life (QOL), and therefore, the PMR-AS is (relatively) highly correlated with these. Further, we expected patients with an ultrasound (US) indicative of active PMR, or with clinical relapse or active disease, to have a significantly higher PMR-AS than those without. To test responsiveness, we hypothesized that PMR disease activity responds substantially (and quickly) to GC treatment, and an increase in disease activity over time may be accompanied by an increase in ESR or a decrease in physical function; therefore, change in PMR-AS is moderately correlated with change in ESR and change in physical function.
Search. PubMed/MEDLINE, EMBASE, and CINAHL were searched in May 2020 and again in May 2021, using a search strategy with variants of PMR (population) and PMR-AS (outcome) in title, abstract, or full text (Supplementary Material 3, available from the authors upon request). Further, investigator knowledge and the references of included articles were used to search for additional publications. Records were stored in a Mendeley database and duplicate records were removed.
Study selection. First, record titles and abstracts were screened by 2 independent investigators (TEB, LEAN) using Rayyan.9 Studies were eligible if they were in English, published in a peer-reviewed journal, had a controlled or noninterventional design, and included ≥ 5 patients diagnosed with PMR discernible from giant cell arteritis (GCA). English was chosen as the single language, as the authors are not sufficiently proficient in other languages and the PMR-AS is predominantly used in countries using English as lingua franca. PMR diagnosis could be based on either recognized classification criteria or clinical diagnosis.10,11,12,13,14 To increase sensitivity, articles included by ≥ 1 investigator, or with uncertainty regarding eligibility, were included for the next phase.
Second, full-text articles were assessed by 2 investigators (TEB, LEAN) and reasons for exclusion were hierarchically recorded as (1) no full text available, (2) not available in English, (3) not presenting original data (eg, reviews), (4) < 5 discernible PMR patients without GCA or not distinguishing between PMR and GCA, and (5) not using the PMR-AS as an instrument. Disagreement was resolved through consensus. The systematic reviews by Duarte et al and Dejaco et al were used as an additional check to confirm whether any relevant articles were missed.6,15 We did not apply snowballing, but we did contact authors regarding potential data on measurement error and reliability.
Third, the same 2 investigators assessed whether internal structure, reliability, or validity (by testing hypotheses) could be investigated with an article based on COSMIN criteria.7,16 Articles not providing information or data on these properties of interest were excluded. Additionally, multiple articles regarding the same study (data) were discussed, and articles not providing additional data on a measurement property were excluded.
Data extraction.
Study characteristics — Standardized data extraction tables were established a priori through discussion and were based on COSMIN recommendations.7 Study characteristics included design, participants (number, inclusion and exclusion criteria), treatment, and follow-up duration. For studies with multiple study samples (eg, multiple cohorts for a validation study or different treatment groups in a trial) characteristics regarding each subsample were extracted. Study characteristics were extracted by 2 investigators (TEB, LEAN) and disagreement was resolved through discussion and consensus.
Summary of measurement properties — Methodological quality, measurement property results, and whether these properties met criteria for good measurement properties were described in COSMIN-based Summary of Measurement Properties (SOMP) tables. Methodological quality of each study was evaluated for the property it was used to assess specifically. For example, methodologic quality for a study used to assess construct validity was evaluated using different criteria than a study used to assess internal consistency. Methodologic assessment was based on the COSMIN risk of bias checklist, which uses a 4-point rating scale (ranging from inadequate to very good), with the lowest score corresponding to the overall quality regarding that property.7 Results of measurement properties were extracted (eg, correlations) and evaluated according to the COSMIN criteria for good measurement properties. Our hypotheses for construct validity and responsiveness are shown in more detail in Supplementary Material 2 (available from the authors upon request).7 The overall SOMP tables were evaluated using COSMIN recommendations for good measurement properties (eg, 75% of construct validity hypotheses should be met to accept construct validity).
Statement of ethics and consent. The paper has been seen and approved by all authors. They have given necessary attention to ensure the integrity of the work and are all meeting the ICMJE criteria for authorship.
RESULTS
Studies included. Our search resulted in 1052 records, with 853 remaining after removal of duplicates; there were 26 articles with a controlled or noninterventional design and using the PMR-AS as an outcome (Figure 1).5,17–41 Of note, an article by Björkman et al did not use the entire PMR-AS (VASph was left out) and was therefore excluded.41 Further, an article by Leeb et al on PMR response criteria was included, since data regarding this composite score could be used to assess structural validity and internal consistency of the PMR-AS.17 One article not included through the search strategy, as it did not mention the PMR-AS in the title, abstract, or full text, but mentioned “Polymyalgia Rheumatica Activity Score” instead, was added based on investigator knowledge.42
Figure 1. PRISMA 2009 adapted flow chart. PMR-AS: Polymyalgia Rheumatica Activity Score; PRISMA: Preferred Reporting Items for Systematic reviews and Meta-Analyses.
From the 26 articles using the PMR-AS, we could use 12 articles to assess measurement properties of interest.5,17–22,24,25,29,35,42 Eight articles could not be used to assess measurement properties at all. Six articles did not provide additional data regarding measurement properties and were excluded. From the 2 articles by McCarthy and colleagues, using the same study population, the publication from 2014 was chosen because it also included additional results of patient-reported outcome measures.25,26 From the 2 articles by Kreiner and colleagues, both using the same study data, the publication from 2010 was chosen due to display of treatment group results.22,23 From 5 articles on the TENOR (Tolerance and Efficacy of tocilizumab iN pOlymyalgia Rheumatica) study,27,29,33,38,39 we used the article by Devauchelle-Pensec29 due to the longitudinal display of PMR-AS results.
Study characteristics. The 12 articles (with 12 study populations) used to assess measurement properties were mostly prospective (83%) and noninterventional (66%) in design (Table 1).5,17–22,24,25,29,35,42 Inclusion criteria varied from clinical (rheumatologist) diagnosis to formal classification criteria based on the European Alliance of Associations for Rheumatology/American College of Rheumatology,14 Bird,10 Jones and Hazleman,11 Chuang,12 and Healey13 criteria. Exclusion criteria varied as well, from solely exclusion of GCA to exclusion of a range of disease with symptoms that may mimic PMR symptoms.
Table 1. Study characteristics.
Summary of measurement properties.
Structural validity and internal consistency — Although some evidence was found on structural validity and internal consistency, it was insufficient based on COSMIN criteria. We found no (theoretical) background information on internal structure of the PMR-AS and, therefore, we could not formally conclude whether the PMR-AS was developed as a formative or reflective instrument.43,44 Consequently, information on structural validity and internal consistency was assessed and summarized (Table 2).
Table 2. Summary of measurement properties: internal consistency and structural validity.
One article by Leeb et al assessed structural validity and mentioned that the PMR-AS is a monodimensional instrument.18 Unfortunately, no information was provided on how dimensionality was assessed (eg, which kind of factor analysis was used) and what the exact results were (eg, what the resulting comparative fit index was). Based on COSMIN criteria, we therefore considered methodology inadequate and exact (numerical) results unknown.7,43 Two articles, with a total of 3 study populations, assessed internal consistency, with standardized Cronbach α of 0.91, 0.88, and 0.81 (Table 2).5,18 However, considering that internal consistency requires consideration and acceptable unidimensionality of each separate subscale based on COSMIN criteria, methodological quality may be considered doubtful, and this property may not be accepted yet.
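For readers unfamiliar with the statistic, the sketch below shows one common way a standardized Cronbach α is obtained: from the number of items k and the mean inter-item Pearson correlation r̄, as α = k·r̄ / (1 + (k − 1)·r̄). The included articles do not describe their computation in this detail, so the function names and the toy item scores are hypothetical and serve only to illustrate the textbook formula.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def standardized_alpha_from_mean_r(k, mean_r):
    """Standardized alpha from item count k and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def standardized_alpha(items):
    """Standardized alpha for a list of items, each a list of patient scores."""
    pairs = list(combinations(items, 2))
    mean_r = sum(pearson_r(x, y) for x, y in pairs) / len(pairs)
    return standardized_alpha_from_mean_r(len(items), mean_r)


# Hypothetical scores of 5 patients on 3 items (not data from any included study).
toy_items = [
    [2, 4, 5, 7, 9],
    [1, 3, 6, 6, 8],
    [3, 3, 5, 8, 9],
]
print(round(standardized_alpha(toy_items), 2))
```

Note that a high α by itself does not establish unidimensionality, which is why the COSMIN criteria require structural validity to be shown first.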
Reliability and measurement error — No information was available on inter- and intrarater reliability and measurement error of the PMR-AS as a continuous measurement instrument for disease activity.
Construct validity — Five out of 7 hypotheses concerning construct validity could be tested, from which 3 (60%) were confirmed with relatively limited sample sizes and adequate overall methodological quality (Table 3). These confirmed hypotheses included the association between the PMR-AS and physical function, QOL, and clinical disease state.5,18,20,22,24,25 The 2 hypotheses that were not confirmed explored the association between the PMR-AS and ESR, and US findings indicative of PMR.5,18,21 As a note, we used patient data provided by Catanoso et al to calculate correlations, although these were nonsignificant with a sample size of only 6.19 Thus, the construct validity could not be confirmed as the required 75% threshold for COSMIN construct validity was not met.7
Table 3. Summary of measurement properties: construct validity.
Responsiveness — All 3 hypotheses concerning responsiveness could be tested, and 2 (67%) were confirmed, although sample sizes were inadequate and methodological quality was doubtful at times (Table 4). The 2 hypotheses that were confirmed explored the change in PMR-AS following GC treatment (decrease) and the association between change in PMR-AS and physical function.5,42 In addition, there were 2 open-label trials on tocilizumab,29,35 which could be used to explore overall treatment responsiveness, and which further supported overall responsiveness of the PMR-AS to treatment. The hypothesis that was not confirmed explored the association with ESR, although assessment of this hypothesis (and the hypothesis on association with change in Health Assessment Questionnaire–Disability Index [HAQ-DI]) was possible by calculating correlations with the data of 6 patients provided by Catanoso et al.19 As an additional note, patients from the study by Do-Nguyen et al were retrospectively included (between 1999 and 2003) prior to publication of the PMR-AS.42 However, no information was given about this period and how potentially missing measurements were handled; therefore, methodological quality was rated inadequate. In summary, evidence to support responsiveness is limited, as it is based on a relatively small number of studies with small sample sizes, including some of doubtful methodological quality.
Table 4. Summary of measurement properties: responsiveness.
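The proportions reported above for construct validity (3 of 5 tested hypotheses) and responsiveness (2 of 3) can be checked against the 75% threshold with a few lines of arithmetic. The sketch below is only illustrative: the function name and return format are hypothetical, and applying the same 75% cutoff to responsiveness is an assumption made here for illustration. The check also ignores the methodological quality and sample size considerations that enter the overall COSMIN rating.

```python
def hypothesis_check(confirmed, tested, threshold=0.75):
    """Return the proportion of confirmed hypotheses and whether it meets the threshold.

    The default threshold of 0.75 follows the 75% rule described in the Methods;
    the function itself is an illustrative assumption, not a COSMIN tool.
    """
    if tested <= 0:
        raise ValueError("At least one hypothesis must be tested")
    if not 0 <= confirmed <= tested:
        raise ValueError("Confirmed count must lie between 0 and the tested count")
    proportion = confirmed / tested
    return proportion, proportion >= threshold


for label, confirmed, tested in [("construct validity", 3, 5), ("responsiveness", 2, 3)]:
    proportion, sufficient = hypothesis_check(confirmed, tested)
    print(f"{label}: {proportion:.0%} confirmed, threshold met: {sufficient}")
# construct validity: 60% confirmed, threshold met: False
# responsiveness: 67% confirmed, threshold met: False
```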
DISCUSSION
We performed a COSMIN-based systematic review of quantitative studies to assess and summarize measurement properties of the PMR-AS. Our limited restrictions and use of articles without measurement property assessment as a goal resulted in a range of designs and study populations that could be used to assess measurement properties. There is some promising evidence, although insufficient, for construct validity and responsiveness, but no data were found on measurement error and reliability, and further assessment of internal structure may be necessary.
Although it is not clear from background information which model is applicable, a formative multidimensional model may be appropriate for the PMR-AS as an instrument for measuring disease activity.5,18 If disease activity is measured using multiple different constructs (eg, pain and inflammation are different and nonexchangeable constructs), then a formative multidimensional model is applicable.43 A formative model may be supported further by the statement by Leeb et al, noting that acute-phase reactants and MS were items for disease activity independent of pain, and consequently do not seem to be exchangeable constructs.17 If this model is indeed applicable, then structural validity and internal consistency are not applicable. However, further assessment would be necessary to draw this conclusion.
There may be several reasons why some construct validity and responsiveness hypotheses were not met, whereas others were. Hypotheses regarding physical function, QOL, and clinical disease state were confirmed, but those regarding ESR and US were not, reflecting that items of the PMR-AS may gravitate more toward symptoms, functioning, and health perception, as opposed to biological and physiological variables when looking at the Wilson and Cleary model.45 As a matter of fact, the association between PMR-AS and physical function, as well as between PMR-AS and QOL, seems even higher than for some disease activity measures for other rheumatic disorders.46,47 Concerning US specifically, the criteria of Macchioni et al for identifying persistent inflammation on US may not properly distinguish PMR disease activity states due to limited sensitivity and specificity.21,48 Indeed, previous studies show a lack of correlation between US and separate PMR-AS components, except perhaps MS,49,50 although comparison may be hindered due to use of different US procedures. Further assessment of relation with other—particularly physiological—instruments combined with larger study sizes for some hypotheses (eg, physical function) may be needed to meet criteria for good measurement properties.7
Contrary to the promising validity, however, the current lack of information regarding the reliability and measurement error of the PMR-AS has some negative implications for current use. For example, it may be that a previously proposed relapse criterion (eg, an increase in PMR-AS of 4.2) falls within the smallest detectable change; thus, a relapse may not be detected or may be incorrectly diagnosed.7,51 Further, information on reliability is needed, among other things, to determine sample size for trials based on the PMR-AS. Some reliability studies have been performed regarding specific items of the PMR-AS, but not for all components and not for the PMR-AS as a whole.52 Further, Binard et al performed a reliability study of PMR-AS relapse criteria, but unfortunately did not assess the scale as a whole.53
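To make the concern about the relapse criterion concrete, the sketch below uses the commonly applied individual-level definition of the smallest detectable change, SDC = 1.96 × √2 × SEM, where SEM is the standard error of measurement. No SEM has been reported for the PMR-AS, so the SEM values used here are purely hypothetical and chosen only to show how the conclusion depends on them. The function names are likewise illustrative.

```python
from math import sqrt


def smallest_detectable_change(sem, z=1.96):
    """SDC at the individual level: z * sqrt(2) * SEM (z = 1.96 for 95% confidence)."""
    if sem < 0:
        raise ValueError("SEM must be non-negative")
    return z * sqrt(2) * sem


def change_exceeds_sdc(change, sem):
    """True if an observed change is larger than the smallest detectable change."""
    return abs(change) > smallest_detectable_change(sem)


RELAPSE_CRITERION = 4.2  # increase in PMR-AS cited in the text as a proposed criterion

# Hypothetical SEM values; none of these come from PMR-AS data.
for hypothetical_sem in (1.0, 2.0):
    sdc = smallest_detectable_change(hypothetical_sem)
    detectable = change_exceeds_sdc(RELAPSE_CRITERION, hypothetical_sem)
    print(f"SEM={hypothetical_sem}: SDC={sdc:.2f}, criterion exceeds SDC: {detectable}")
# SEM=1.0: SDC=2.77, criterion exceeds SDC: True
# SEM=2.0: SDC=5.54, criterion exceeds SDC: False
```

Under the second hypothetical SEM, an increase of 4.2 could not be distinguished from measurement error, which is the scenario the paragraph above warns about.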
When generalizing these results on measurement properties of the PMR-AS to clinical practice, an important note should be made. In the development article by Leeb et al, it is noted that the original objective of the PMR response criteria (and presumably the PMR-AS thereafter), was to establish criteria that could be used in future clinical trials.17 Consequently, a range of conditions which may mimic PMR (eg, GCA, RA, osteoarthritis [OA] and local bursitis/tendinitis) were excluded in the developmental studies by Leeb et al.5,17 However, these comorbidities may be quite prevalent in the elderly PMR population, as shown by Do-Nguyen et al in a retrospective cohort of 137 patients with PMR, out of whom 45 had OA; these comorbidities may interfere with PMR disease activity assessment.42 Therefore, performance of the PMR-AS in either clinical practice or trials of a pragmatic nature may differ due to concomitant disorders influencing PMR-AS items.
Some strengths and limitations should be noted regarding this review process. Although a broad search strategy without design or property filter was used, 1 article—which used more general terms for disease activity as opposed to PMR-AS—was not found in the search.42 Further, not all data and studies that used the PMR-AS were usable to assess measurement properties. A main reason studies could not be used was that outcomes key to this review were not reported; for example, McCarthy et al measured change in HAQ and PMR-AS but reported no correlation between these.25 Another reason studies could not be used was that some correlations that were reported were not anticipated when hypotheses were drafted; for example, no hypotheses were formulated regarding fluorodeoxyglucose positron emission tomography scans, since their exact role in PMR is still unclear.48 However, a broad range of hypotheses were formulated and amended during initial extraction of study characteristics to optimize the number of hypotheses that could be tested. Finally, since the nature and cut-offs of hypotheses formulated by the research team are inherently subjective, they might be different, or less stringent, for another team. However, to increase relevance and feasibility of hypotheses, our research team comprised both rheumatologists with PMR research experience and researchers with experience in systematic literature reviews.
All in all, the PMR-AS shows promise as a measurement instrument for PMR disease activity, although evidence on certain measurement properties is still limited or absent. Further, new measurement property validation methodologies have been introduced since the development and initial validation of the PMR-AS. Therefore, stepwise reassessment of reliability, validity, and other properties not included in this review, such as interpretability, may be useful to assess the potential role of PMR-AS as an outcome for trials, as well as its applicability for clinical treat-to-target strategies. However, considering our findings, the PMR-AS seems the most appropriate measure for PMR disease activity in clinical trials as of yet.
Footnotes
This is an investigator-initiated and -funded study.
The authors declare no conflicts of interest relevant to this article.
- Accepted for publication February 16, 2022.
- Copyright © 2022 by the Journal of Rheumatology