Abstract
Objective. Increasing research interest and emerging new therapies for treatment of fibromyalgia (FM) have led to a need to develop a consensus on a core set of outcome measures that should be assessed and reported in all clinical trials, to facilitate interpretation of the data and understanding of the disease. This aligns with the key objective of the Outcome Measures in Rheumatology (OMERACT) initiative to improve outcome measurement through a data driven, interactive consensus process.
Methods. Through patient focus groups and Delphi processes, working groups at previous OMERACT meetings identified potential domains to be included in the core data set. A systematic review has shown that instruments measuring these domains are available and are at least moderately sensitive to change. Most instruments have been validated in multiple languages. This pooled analysis study aims to develop the core data set by analyzing data from 10 randomized controlled trials (RCT) in FM.
Results. Results from this study provide support for the inclusion of the following in the core data set: pain, tenderness, fatigue, sleep, patient global assessment, and multidimensional function/health related quality of life. Construct validity was demonstrated with outcome instruments showing convergent and divergent validity. Content and criterion validity were confirmed by multivariate analysis showing R square values between 0.4 and 0.6. Low R square values were associated with studies in which one or more domains were not assessed.
Conclusion. The core data set was supported by high consensus among attendees at OMERACT 9. Establishing an international standard for RCT in FM should facilitate future metaanalyses and indirect comparisons.
Fibromyalgia (FM) is a common condition afflicting 2% of the population1. It is characterized by chronic widespread pain with increased sensitivity to pressure-elicited pain. The American College of Rheumatology (ACR) classification criteria in 1990 stipulated the presence of chronic widespread pain for at least 3 months and the presence of at least 11 out of 18 tender points2. Direct and indirect medical costs associated with FM are high3, although making a positive diagnosis can reduce healthcare utilization4. Aside from pain, FM is associated with many symptoms including fatigue, depression, anxiety, and poor sleep quality. Many clinical trials have been conducted in FM; however, variations in outcome measurement methodology have made statistical comparison and pooling of results difficult.
The Outcome Measures in Rheumatology (OMERACT) initiative5 has helped to resolve the problem of outcomes measurement variability in rheumatic diseases such as rheumatoid and psoriatic arthritis, by establishing core data sets that should be collected and reported in randomized controlled trials (RCT). OMERACT offers guidance in selecting core data set domains. Applying the OMERACT filters (i.e., truth, discrimination, and feasibility), an iterative process can unfold that continually refines the field’s ability to assess relevant aspects of disease/syndrome measurement (domains) with precision.
Previous work based on patient focus groups and Delphi exercises has established a list of potential core data set domains for trials in FM. Remarkable consensus regarding the relevant domains for FM is supported by a Delphi exercise among clinician/researchers, patient focus groups6, a Delphi exercise conducted in patients with FM7, and through voting at OMERACT 7 and OMERACT 88. Each of these studies provided empirical support for the selection of outcome domains that should be considered for inclusion in the core data set9. From this work, the relevant domains for FM appear to be (1) pain, (2) patient global, (3) fatigue, (4) health-related quality of life, (5) multidimensional function, (6) sleep, (7) depression, (8) physical function, (9) tenderness, (10) dyscognition (cognitive dysfunction), and (11) anxiety. The Delphi processes and patient focus groups helped to support the face validity (i.e., truth) of these potential domains. The feasibility and discriminatory power of specific instruments used to assess these domains were the topic of a separate systematic review of RCT in FM10. This latter review found that available instruments assessing these domains were at least moderately responsive to change, with effect sizes of at least 0.4, and were feasible for use in trials of FM (with the exception of dyscognition). Most outcome measures in RCT of FM, however, have been adopted from other diseases (e.g., Beck Depression Questionnaire11 used in evaluation of depression). In some cases, the valid use of these “adopted” questionnaires in FM requires additional support. The objective of the current study was to examine some of the psychometric properties of existing outcomes measures being used in trials of FM. This information will help to evaluate the valid use of these “adopted” measures in the context of FM and will further help to establish a core set of domains for investigation in FM RCT.
METHODS
Data and analysis
The co-chairmen (PM and EC), on behalf of the steering committee, approached 4 pharmaceutical companies (Forest Laboratories, Jazz Pharmaceuticals, Eli Lilly, and Pfizer) for access to de-identified data from each of their large RCT in FM, for the purpose of evaluating the measurement characteristics of the instruments used by each for domain assessment. Data from 10 RCT of 4 compounds being investigated for the treatment of FM were included: milnacipran, duloxetine, pregabalin, and sodium oxybate. Milnacipran and duloxetine are serotonin and norepinephrine reuptake inhibitors, while pregabalin is an α2-δ calcium channel ligand. Duloxetine, milnacipran, and pregabalin are licensed in the USA for management of FM. Duloxetine is also approved for treatment of depression and the pain of diabetic peripheral neuropathy. Pregabalin is also approved for the treatment of the pain of peripheral neuropathy and as an adjunct for the treatment of seizure disorder. Sodium oxybate is the sodium salt of gamma hydroxybutyrate. It is a central nervous system depressant and a sleep modifier. It is licensed for treatment of cataplexy and excessive daytime sleepiness in narcolepsy. Given that FM is a polysymptomatic condition, we included trials of different medications, reasoning that medications acting on different pathways may have dissimilar impact on individual domains.
Data from RCT of the same medication were pooled for analysis. For reasons of commercial sensitivity, medications are coded as A, B, C, and D. For each outcome measure, change values were calculated as the difference between baseline and the primary endpoint of each trial.
Mapping outcomes measures to domains
All outcomes measures used in the clinical trials were mapped onto one or more of the following domains: pain, patient global, fatigue, health related quality of life (HRQOL), multidimensional function, sleep, depression, physical function, tenderness, dyscognition, and anxiety. For outcome measures that cover multiple domains, such as the Medical Outcomes Study Short Form 36 (SF-36)12, the individual domains and summary component scores were mapped and included in the analyses separately.
Support for construct validity
Construct validity refers to the cumulative evidence supporting whether a given scale or instrument actually assesses the topic it purports to measure. Given that almost all the instruments used in RCT of FM were developed and validated in other medical conditions, it cannot be assumed that these “adopted” instruments actually measure FM signs and symptoms with the same measurement characteristics as those for the conditions for which they were originally designed. For example, a scale claiming to measure fatigue developed and validated in the context of sports medicine may not be measuring the same type of fatigue affecting individuals with FM. Thus, despite the common name “fatigue,” evidence would be needed to support a claim that the same fatigue construct was being assessed by this instrument in both populations.
Support for construct validity for measurement in FM, i.e., whether the instrument is really measuring what it is supposed to measure, has not been established, with the exception of SF-36 (personal communication). An example is the Medical Outcomes Study Sleep Questionnaire13, which is a validated questionnaire developed to assess sleep in patients with sleep disorders. It has been used in a number of RCT in FM, but its validity and performance in FM have not been examined. The appropriateness of its continued use for the sleep domain in FM studies therefore needs to be evaluated.
The construct validity of the instruments was assessed by examining the convergent and divergent relationships of similar and dissimilar instruments. Instruments measuring similar constructs would be expected to have the strongest relationships (either positive or negative depending on the direction of the scale), and unrelated constructs would be expected to demonstrate weaker relationships. For this study, correlation matrices containing all the outcome measures used with a given compound were constructed. Thus 4 matrices were constructed in total. Either Pearson or Spearman correlation coefficients were used, depending on the statistical distribution of the instrument. The mean correlation coefficient of outcome measures mapping to the same domain (intradomain correlation coefficient) was used as an indicator of convergent validity. The mean correlation coefficient of outcome measures of different domains (interdomain correlation coefficient) was used as an indicator of divergent validity.
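As an illustration only, the comparison of intradomain and interdomain correlations described above can be sketched in a few lines of Python. The instrument names, the change scores, and the helper `intra_inter_means` are hypothetical and are not taken from the trials. The sketch uses Pearson coefficients only and averages absolute values so that scales running in opposite directions do not cancel; both choices are assumptions of this example rather than a description of the analysis actually performed.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def intra_inter_means(scores, domain_of):
    """Return (mean intradomain |r| per domain, mean interdomain |r| per domain).

    scores: {instrument name: list of change scores, one per patient}
    domain_of: {instrument name: domain}
    Absolute values are averaged (an assumption of this sketch).
    """
    intra, inter = {}, {}
    for a, b in combinations(scores, 2):
        r = abs(pearson(scores[a], scores[b]))
        if domain_of[a] == domain_of[b]:
            intra.setdefault(domain_of[a], []).append(r)
        else:
            # A cross-domain pair counts toward both domains involved.
            inter.setdefault(domain_of[a], []).append(r)
            inter.setdefault(domain_of[b], []).append(r)

    def mean(values):
        return sum(values) / len(values)

    return ({d: mean(v) for d, v in intra.items()},
            {d: mean(v) for d, v in inter.items()})


# Hypothetical change scores for five patients (illustrative only).
scores = {
    "pain_vas":      [-3.0, -1.0, -4.0, 0.0, -2.0],
    "pain_diary":    [-2.5, -1.5, -3.5, 0.5, -2.0],
    "fatigue_scale": [-1.0, -2.0, 0.0, -1.0, -3.0],
}
domain_of = {"pain_vas": "pain", "pain_diary": "pain", "fatigue_scale": "fatigue"}

intra, inter = intra_inter_means(scores, domain_of)
print("intradomain:", intra)
print("interdomain:", inter)
```

In this toy example the fatigue domain has a single instrument, so no intradomain coefficient can be computed for it. The same limitation applies to any domain measured by only one instrument.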
Support for content validity of domains for the core dataset
Content validity refers to the extent to which a single measure or a group of measures is able to capture the relevant facets of a given condition. For this study, the content coverage of the consensually derived domains was examined by multivariate analysis. Patient global impression of change (PGIC) was used as a surrogate of overall improvement and as the dependent variable in multivariate regression analyses. The overall R square values from multivariate regression analyses were used to identify the adequacy of the domains and associated measurement instruments to evaluate overall improvement in these RCT of FM. For each regression equation, the instrument with the highest univariate correlation with PGIC from each of the domains was included as an independent variable.
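A minimal sketch of how an overall R square can be obtained from a regression of PGIC on one instrument per domain is given below. It assumes an ordinary least squares fit with an intercept, which may differ from the model actually fitted to the trial data. The data, the PGIC coding, and the function names `solve` and `r_squared` are hypothetical. The step of choosing the instrument with the highest univariate correlation in each domain is taken as already done.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, outcome):
    """R square of an ordinary least squares fit of outcome on predictors.

    predictors: list of columns, one list of change scores per domain
    outcome: list of PGIC ratings, one per patient
    """
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(outcome))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Hypothetical data for six patients (illustrative only).
pain_change = [-3.0, -1.0, -4.0, 0.0, -2.0, -1.0]
fatigue_change = [-1.0, -2.0, 0.0, -1.0, -3.0, 0.0]
pgic = [2.0, 3.0, 1.0, 5.0, 2.0, 4.0]  # assumed coding: lower = more improved

r2_pain = r_squared([pain_change], pgic)
r2_both = r_squared([pain_change, fatigue_change], pgic)
print("R square, pain only:", round(r2_pain, 3))
print("R square, pain + fatigue:", round(r2_both, 3))
```

In a least squares fit, adding a predictor cannot reduce R square. A model in which a domain was never assessed therefore has at most the R square it would have had with that domain included.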
RESULTS
The domains and the outcome measures used to index the domains from the 10 RCT are listed in Table 1. The instruments mapped to the HRQOL domain and to the multidimensional function domain, such as the SF-36 and the Fibromyalgia Impact Questionnaire (FIQ)14, were almost identical. In trials of one medication, EuroQol was also used. Given the large overlap, HRQOL and multidimensional function were merged into one domain: multidimensional function. Not all the domains were measured in all RCT. The numbers of domains and instruments used in the trials of the 4 different medications are given in Table 2. While some domains were assessed in all trials (e.g., pain, fatigue), other domains were less consistently assessed (e.g., stiffness, tenderness), and some domains appeared in the evaluation of only one compound (e.g., dyscognition).
Table 1. List of outcome measures used in clinical trials.
Table 2. Number of domains and instruments used in clinical trials of 4 different medications.
Construct convergence and divergence
Mean intradomain correlation coefficients were greater than mean interdomain correlation coefficients for pain, tenderness, fatigue, and depression; therefore, instruments assessing these domains demonstrated convergent and divergent validity (Table 3). For multidimensional function and sleep, the difference between mean intra- and interdomain correlation coefficients was small. For multidimensional function, this was expected given the breadth of this construct. For sleep, lack of separation could be due to the failure of treatments to improve sleep, or each of the instruments may have failed to assess facets of sleep that are of importance to individuals with FM. For example, the MOS sleep scale assesses snoring (r = 0.02) and waking up with shortness of breath (r = 0.18), which may be relevant for some sleep disorders but less relevant in FM. Thus sleep (despite clinical anecdotes and consensus as being of relevance to FM) did not correlate highly with PGIC and was rather insensitive to change. In some studies, a patient global rating of sleep quality based on a Likert scale was also used. It showed a moderate correlation with PGIC (r = 0.4), as did the MOS sleep disturbance scale (r = 0.4). These data suggest that subscales may be preferred to the overall indices on some instruments “adopted” from other medical conditions.
Table 3. Convergent and divergent relationships.
Originally, measures of tenderness were mapped onto the pain domain. The instruments used included tender point count (TPC) and dolorimetry. However, the correlation between tenderness and self-reported pain scales was at best moderate (r ≤ 0.4), while the correlation between TPC and dolorimetry was high (r = 0.59). This suggested that tenderness and spontaneous self-report of pain may not be measuring the same construct in FM and should be treated separately.
In summary, the data supported the construct validity of the instruments used in these RCT to measure patient self-reported pain, fatigue, depression, physical function, and multidimensional function in clinical trials of FM. For sleep, the subscale but not the overall index was supported. For tenderness, support could only be demonstrated in the trials of one medication in which tenderness was assessed by more than one method. For stiffness, dyscognition, and anxiety, convergent and divergent validity could not be determined, as these domains were measured by only one instrument in these trials.
Content validity of the domains for the core dataset
Univariate analysis showed that correlations between instruments of different domains with PGIC were moderate to high (Table 4). For depression, the mean correlation coefficient with PGIC was less than 0.5. However, in all of these clinical trials, patients with severe co-morbid depression were excluded. In addition, patients with moderate depression were also excluded in trials of 3 of these compounds. Consequently, baseline depression scores were low, reducing the effect size of these change scores.
Table 4. Mean correlation coefficient between instruments of each domain with patient global impression of change.
Multivariate analyses showed moderate to high (0.4–0.67) values of R square, which was related to the number of domains assessed. In studies in which some of the potential domains, such as tenderness, were not assessed, the R square value was lower, suggesting that missing key domains will affect the overall coverage of content relevant to the condition of FM.
Regression analyses retained pain, fatigue, physical function, multidimensional function, and depression in all RCT of all 4 compounds. Tenderness was retained in all the trials of the 3 compounds in which it was assessed, further supporting inclusion of tenderness as a separate domain in the core data set. Sleep was retained in 2 of 3 possible clinical trial groups. Stiffness was retained in 2 of 4 groups and dyscognition was not retained in these regression analyses.
DISCUSSION
Data from our study and previous consensus exercises support inclusion of pain, fatigue, physical function, and multidimensional function as domains in a core data set for clinical trials in FM. Although “adopted” from other medical conditions, the instruments measuring these domains largely demonstrate characteristics supporting face, construct, content, and criterion validity in FM. A previous study has also shown that these instruments are at least moderately sensitive to change10.
Depression is a common symptom in FM and is rated as important by both patients and clinicians. This analysis showed that the correlation between depression and PGIC is only moderate. The main reason is likely that the exclusion of patients with moderate and severe depression in most clinical trials results in a low baseline depression score. Therefore, it is unlikely that any instrument would demonstrate a large effect size. Given that this exclusion criterion is common in FM RCT, it seems unnecessary to stipulate the inclusion of depression in the core data set. Nonetheless, the assessment of depression in FM is likely to be helpful in many clinical trials.
“Unrefreshed” sleep is common and thought to be pathogenically important in FM. Moldofsky, et al showed that symptoms similar to FM could be induced by disturbing the quality of sleep in healthy normal volunteers15. Both patients and clinicians agree regarding its importance. However, to date clinical trials have used instruments that were not developed in patients with FM and that may not assess the type of sleep problems specific to those patients. The practice of using total indices as the sole outcome endpoints for sleep may not be ideal. Our data suggested that using the sleep disturbance subscale of the MOS sleep scale would have improved the convergent and divergent characteristics of this measure. Since sleep was retained in regression analyses in all but one group, there is a strong argument for including some element of sleep in the core data set.
Results of the current study suggest tenderness should be included as a separate domain from patient self-reported pain. Pathophysiologically, this would be logical in that it may involve different pathways. Further, it mirrors the need to assess both patient reported pain and tender joint count in rheumatoid and psoriatic arthritis. Although tender point count and dolorimetry have deficiencies, such as significant interobserver variation, and may not be perfect tools, they are feasible, and current analyses showed that they contribute significantly to the content validity when added to the core data set. Hence the conclusion to include tenderness in the core data set.
For anxiety, stiffness, and dyscognition, currently, there are insufficient data from available clinical trials to support their inclusion into the core data set. Researchers interested in these outcomes should include them in assessment, but it is not justifiable to stipulate their assessment in all clinical trials of FM.
Results of our study were presented at the OMERACT 9 FM module and were the basis, along with review of the previous clinician/researcher and patient Delphi exercises, outcome measures, and disease state discussion, for development of consensus on a core domain construct for fibromyalgia16.
Acknowledgments
We thank the following for their support: Nooshine Dayani, Qu Peng, and Robert Palmer from Forest Laboratories Inc.; Chinglin Lai, Yanping Zheng, and Diane Guinta from Jazz Pharmaceuticals Inc.; Daniel Kajdasz and Amy Chappell from Eli Lilly & Co.; and Gergana Zlateva and Emir Birol from Pfizer Inc.
Footnotes
The Academic Department of Rheumatology is supported by an Integrated Clinical and Academic Centre grant from the Arthritis Research Campaign, UK. Dr. Williams’ participation was supported in part by Grant Number U01AR55069 from NIAMS/NIH. Dr. Arnold’s participation was supported in part by Grant Number R01AR053207 from NIAMS/NIH.