Abstract
The objectives of the module were to (1) establish a core domain set for fibromyalgia (FM) assessment in clinical trials and practice, (2) review outcome measure performance characteristics, (3) discuss development of a responder index for assessment of FM in clinical trials, (4) review objective markers, (5) review the domain of cognitive dysfunction, and (6) establish a research agenda for outcomes research. Presentations at the module included: (1) Results of univariate and multivariate analysis of 10 FM clinical trials of 4 drugs, mapping key domains identified in previous patient focus group and Delphi exercises and a clinician/researcher Delphi exercise, and breakout discussions to vote on possible essential domains and reliable measures; (2) Updates regarding outcome measure status; (3) Update on objective markers to measure FM disease state; and (4) Review of the issue of cognitive dysfunction (dyscognition) in FM. Consensus was reached as follows: (1) Greater than 70% of OMERACT participants agreed that pain, tenderness, fatigue, patient global, multidimensional function, and sleep disturbance domains should be measured in all FM clinical trials; dyscognition and depression should be measured in some trials; and stiffness, anxiety, functional imaging, and cerebrospinal fluid biomarkers were identified as domains of research interest. (2) FM domain outcome measures have generally proven to be reliable, discriminative, and feasible. More sophisticated and comprehensive measures are in development, as is a responder index for FM. (3) Increasing numbers of objective markers are being developed for FM assessment. (4) Assessment of cognitive dysfunction by self-assessed and applied outcome measures is being developed. In conclusion, a multidimensional symptom core set is proposed for evaluation of FM in clinical trials. Research on improved measures of single domains and composite measures is ongoing.
Fibromyalgia (FM), also known as fibromyalgia syndrome, is characterized by chronic widespread pain and tenderness on physical examination, as defined by the 1990 American College of Rheumatology (ACR) criteria1. The FM criteria have been beneficial in identifying a more homogeneous group of individuals with chronic widespread pain upon which to conduct research aimed at better understanding FM. Currently, separate clinical diagnostic criteria for FM do not exist. Applying the ACR criteria in clinical practice may overemphasize the importance of tenderness (e.g., oversampling for women), the importance of peripheral as opposed to central factors, and distress (e.g., distress raises tenderness). Clinically, patients with FM often complain of symptoms beyond pain. Additional symptoms include fatigue, sleep disturbance, mood disturbance, cognitive dysfunction, and syndromes such as irritable bowel and bladder syndrome and various forms of headache2. Each patient with FM experiences a number of different symptoms to varying degrees, which may change over time and with treatment, thus creating the need for continual assessment of the multidimensional nature of the condition. FM may occur on its own and has also been noted to be comorbid with rheumatoid arthritis, lupus, other chronic pain conditions, hypothyroidism, and infections such as Lyme disease or hepatitis C. FM is prevalent in at least 2% of the population, occurring more frequently in females than in males1. Current research posits that FM results from disordered central pain and sensory processing. Dysregulation of several neuropeptide and neurohormone networks has been identified, leading to a deficiency in pain inhibitory pathways and/or an increase in facilitatory networks3,4. The triggering and maintenance of FM appear to require both genetic disposition and environmental influences such as emotional or physical stressors or illness5.
Until the 1990s, there had been a paucity of well-controlled clinical trials of pharmacotherapy of FM. This was partly due to a lack of classification criteria and partly related to a poor understanding about pathophysiology, uncertainty about what core symptom domains could be reliably measured, a lack of objective markers of disease activity and severity, suboptimal confidence that measures could discriminate a therapeutic response, and perhaps a certain skepticism among some that the condition was legitimate. Stemming from the work of Moldofsky and Smythe on sleep disorders in FM6, studies with tricyclic antidepressants (TCA) were conducted and showed short-term benefit for pain and sleep in FM7. However, it was apparent that these agents were incomplete in their effectiveness and poorly tolerated.
In parallel with increased understanding of the neuropathophysiology of FM, more specifically targeted and better tolerated pharmaceutical agents of potential benefit for FM symptomatology became available. Examples include serotonin and norepinephrine reuptake inhibitors (SNRI), which can augment the activity of serotonin and norepinephrine; α2-δ subunit modulators, which inhibit excitatory neuropeptides such as glutamate and substance P; and other neuromodulators, as a means of diminishing pain and fatigue, improving sleep, and beneficially affecting other symptom domains of FM. Controlled trials of several of these agents have been conducted utilizing a variety of measures and have demonstrated clinically meaningful improvements in pain, patient global impression of change, and function as compared to placebo. Two agents, pregabalin, an α2-δ modulator, and duloxetine, an SNRI, have been approved for management of FM in the United States; and a third agent, milnacipran, an SNRI, has also recently been approved8–12. However, the approval process has seen a wide variety of outcome measures used, and approval has primarily been based on demonstration of efficacy in domains of pain, patient global impression of change, and total impact of FM as measured by the Fibromyalgia Impact Questionnaire (FIQ)13. There is a need for scientific validation of a core set of domains that more fully constitute FM syndrome for use in clinical trials. Performance characteristics of domain measures also need to be evaluated to assure clinicians, regulators, and the public about the soundness of our ability to evaluate therapies in FM and to provide guidance to developers of new therapies.
To this end, a group of clinician/researchers interested in FM gathered in 2004 to develop a workshop for Outcome Measures in Rheumatology Clinical Trials (OMERACT). The group included both academic and pharmaceutical-based researchers and focused on several areas. To gain a preliminary sense of the key domains needing to be assessed in FM, 23 clinician/researchers participated in a Delphi exercise based on a list of domains developed by the expert steering committee of the working group. Results of the exercise and of voting held at a workshop at OMERACT 7 are shown in Table 114.
To better understand performance characteristics of measures of the key domains, a review of controlled clinical trials was conducted to determine effect sizes of the measures15. Pain and patient global measures appeared reliable and showed good effect sizes, but other domains such as sleep, fatigue, and function did not, raising questions about either the effectiveness of therapy for these domains or the quality of the measures. The group also reviewed more objective measures being explored in FM, e.g., functional magnetic resonance imaging (fMRI), and potential linkages with patient reported outcomes developed by the World Health Organization International Classification of Function (ICF) project and the Patient-Reported Outcomes Measurement Information System (PROMIS) network15. These projects represented broader and more in-depth attempts to characterize the full patient experience of disease, function, and health-related quality of life (HRQOL) impact of FM. The Delphi exercise concurred that the research agenda should continue each of these areas of work and explore in greater depth the patient perspective on outcomes domains relevant to FM.
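The review cited above does not state which effect size statistic was used. As a minimal sketch, assuming a standardized mean difference (Cohen's d with a pooled standard deviation) between active and placebo change scores, the calculation could look like the following. The function name and the change scores are hypothetical and are not taken from any trial.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment_changes, placebo_changes):
    """Standardized mean difference between two groups of change scores."""
    n_t, n_p = len(treatment_changes), len(placebo_changes)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * stdev(treatment_changes) ** 2
        + (n_p - 1) * stdev(placebo_changes) ** 2
    ) / (n_t + n_p - 2)
    return (mean(treatment_changes) - mean(placebo_changes)) / sqrt(pooled_var)


# Hypothetical change-from-baseline pain scores (0-10 scale; negative = improvement).
treatment = [-3.0, -2.0, -4.0, -1.0, -2.5]
placebo = [-1.0, -0.5, -2.0, 0.0, -1.5]

d = cohens_d(treatment, placebo)
print(round(d, 2))
```

Under this convention a more negative value indicates a larger advantage of active treatment over placebo on a scale where lower scores are better. A measure that yields a small effect size may reflect either a weak treatment effect on that domain or an insensitive instrument, which is the ambiguity noted above.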
In preparation for a second FM workshop in 2006 at OMERACT 8, the expanding working group, with the aid of MAPI Values, an independent research organization, conducted a series of patient focus groups to map the array of symptoms experienced and problems caused by FM16. Utilizing information gained in these discussions, a Delphi exercise was conducted among patients17. The key symptom areas identified as impacting the majority of the representative patients, although worded differently, showed considerable conceptual overlap with those identified in the clinician Delphi exercise, thus providing face validity to the 2 different exercises. In addition, an updated review of the performance of outcome measures used in more recent clinical trials, objective measure data18, and linkage work with the ICF and PROMIS FM-extension projects (D Williams) was reported. The research agenda included the need to determine which key domains, as identified by patients and clinician/researchers, represented the full core set of domains experienced in FM, and whether areas of domains overlap; also a preliminary analysis was necessary to develop a responder index for FM. Two FM OMERACT Steering Committee members (L. Arnold and L. Crofford) are co-primary investigators on a project funded by the US National Institutes of Health (NIH) to develop such a responder index, which was outlined at this meeting. The core domain construct work, which was subsequently completed and which is based on the research agenda formulated at OMERACT 8, is outlined below and by Choy, et al elsewhere in these proceedings19. In addition, a more complete understanding of the symptom domain of cognitive dysfunction (dyscognition) and appropriate measures for it were identified as a key subject for the research agenda. This history provides the foundation for the current report on proceedings of the FM Workshop presented at OMERACT 9.
Objectives
We sought to (1) establish a core domain set for the assessment of fibromyalgia (FM) in clinical trials and practice, (2) review the performance characteristics of outcome measures, including patient reported outcomes, currently being used to assess FM domains, (3) discuss development of a responder index for the assessment of FM in clinical trials, (4) review objective markers of FM, (5) review the domain of cognitive dysfunction in FM and its potential assessment in clinical trials and practice, and (6) establish a research agenda for further work to be done regarding FM outcomes research.
Module process
Since the OMERACT 8 meeting in May 2006, working group members met in regular teleconferences and in person at the ACR, EULAR, and Myopain Society meetings. The working group, noted above, consisted of clinicians/researchers, statisticians, pharmaceutical industry representatives, and patients from North America, Europe, and Australia. There were 4 subgroups (leaders): (1) Domain construct (Ernest Choy, Philip Mease, Lesley Arnold, Dan Clauw, Jennifer Glass, Susan Martin, David Williams), (2) Outcome measures/patient reported outcomes (PRO)/Responder index (David Williams, Susan Martin, Lesley Arnold), (3) Objective markers (Dan Clauw, Leslie Crofford, Jessica Morea), and (4) Cognition (Jennifer Glass). Liaison to the OMERACT executive committee was conducted by Lee Simon and Vibeke Strand. The group’s fellow was Jessica Morea. Patient participants were Lynne Matallana, Kathy Longley, Michael Peterman, and Sharon Waldrop.
Methods and results by module subgroup
Domain construct
As noted above, the working group had previously conducted a clinician/researcher Delphi exercise and patient focus groups and Delphi to determine key domains considered important to assess in FM clinical trials (Tables 1 and 2). We analyzed FM trial data to determine how well these domains approximate the totality of the FM experience for a patient (content validity) and to what degree domains were overlapping versus independent. Patient global impression of change (PGIC) was used as a surrogate of overall improvement and was the dependent variable in multivariate regression analyses against which other domains were regressed. Outcome measures used in trials were mapped onto one or more of the following domains identified in the clinician/researcher Delphi: pain, patient global, fatigue, HRQOL, multidimensional function, sleep, depression, physical function, tenderness, dyscognition, anxiety, as well as stiffness, which had additionally been identified in the patient Delphi. Ten studies involving 4 pharmacological agents were analyzed: 2 serotonin and norepinephrine reuptake inhibitors (i.e., duloxetine and milnacipran), one α2-δ modulator (i.e., pregabalin), and sodium oxybate (gamma hydroxybutyrate) — all of which have shown efficacy in FM clinical trials. Details of this study are summarized elsewhere in these proceedings19.
Univariate analysis showed that instruments that measured these various domains showed moderate to high correlation with PGIC; associations were highest with pain, fatigue, multidimensional function, physical function, and stiffness; and only moderate with depression, anxiety, and dyscognition. It should be recalled that in a majority of these trials, patients with major depressive disorder had been excluded, resulting in a lower effect size of change scores since baseline depression scores were low. In addition, only one trial utilized a measure of self-assessed cognition, partly because of uncertainty about how best to approach and assess this domain.
Multivariate analysis showed moderate to high values of R2, with studies having more non-overlapping domains demonstrating higher values, suggesting that if key domains are not assessed, the variance accounted for in PGIC will be diminished. Pain, fatigue, physical function, multidimensional function, and depression were retained as separate domains in trials of all 4 compounds. Tenderness was retained as a domain separate from pain in all 3 trials in which it was assessed, suggesting that it is a sign of allodynia and/or hyperalgesia separate from the subjective impression of pain. Sleep was retained in 2 out of 3 clinical trial groups; stiffness, assessed by a single question in the FIQ, in 2 out of 4; and dyscognition in none, the latter presumably related to either non- or insufficient assessment.
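The relationship between the number of domains assessed and the variance explained in PGIC can be illustrated with a small ordinary least squares sketch. This is not the analysis performed on the trial data; the helper functions, the numeric "overall improvement" score standing in for PGIC, and the synthetic change scores are all assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors, outcome):
    """Fit ordinary least squares with an intercept and return R^2."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - y_bar) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Synthetic improvement scores for six hypothetical patients.
pain_change = [1, 2, 3, 4, 5, 6]
fatigue_change = [3, 1, 4, 2, 5, 3]
# Hypothetical overall improvement, constructed to depend on both domains.
overall = [0.5 * p + 0.5 * f for p, f in zip(pain_change, fatigue_change)]

r2_pain_only = r_squared([(p,) for p in pain_change], overall)
r2_both = r_squared(list(zip(pain_change, fatigue_change)), overall)
print(round(r2_pain_only, 3), round(r2_both, 3))
```

In this constructed example the model that omits fatigue explains less of the variance in the overall score than the model that includes both domains. In-sample R2 can never fall when a predictor is added, so it is the size of the gain, and whether the added domain is retained as an independent predictor, that is informative.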
The domain construct was discussed in breakout sessions, taking into account clinician and patient Delphi exercises and data analysis, and as aided by the presence of patient participants. Voting, by audience response methodology, on the construct was done on 2 occasions: at the time of the module and in the plenary recap at the end of the meeting, when further clarification on key issues was offered.
Inner core set (domains to be assessed in all clinical trials of FM)
There was little debate about whether core issues such as pain, fatigue, and patient global should be measured in all FM trials as relevant domains for the “inner” core set (Table 3). However, there was considerable discussion about other domains. One issue was the separation and overlap of the concepts of multidimensional function, physical function, and quality of life. The 2 principal instruments currently used for measuring these domains are the Medical Outcome Study Short Form Survey-36 (SF-36) and FIQ, which include both function and HRQOL questions. Work is under way to develop more sophisticated instruments that more comprehensively measure these domains through the PROMIS and PROMIS FM-extension project, and/or linkage with the ICF methodology. Since the current measures are primarily considered to be optimal instruments to assess the concept of multidimensional function, it was voted (63%), until more optimal HRQOL instruments are available, to subsume these concepts under the phrase “multidimensional function,” which was voted to be a core domain item, keeping open the possibility of separating out HRQOL as more sensitive and specific instruments are developed. Tenderness separated from pain in the multivariate analysis and was considered by more than 60% in initial voting and 70% in revised voting to be in the core set as an essential domain to measure in all trials. Sleep disturbance has long been considered an important part of the FM experience, and was so endorsed in the clinician and patient Delphi exercises. However, in the data analysis, it did not correlate highly with PGIC and was somewhat insensitive to change. More careful analysis of the instruments used to assess sleep demonstrated that some subscales performed well and others, e.g., “snoring” in the Medical Outcomes Study (MOS) sleep scale, did not. 
The poor correlation with PGIC could therefore have been due to dilution of the quality of the scale by assessments that were irrelevant to FM patients. It was agreed that there should be a focus on development and testing of more relevant measurements of sleep in FM and use of more sensitive subscales of existing measures. After further discussion, it was voted to include sleep disturbance in the inner core (Table 3).
Outer core set (domains to be assessed in some but not all FM trials — second circle)
Some domains were shown to be core domains in FM by the multivariate analysis but were not considered by the majority of OMERACT attendees to be necessary to assess in all clinical trials in a development program. Depression was retained in the multivariate analysis as a core domain in FM; 65% voted that it should be assessed in some trials, but only 35% felt it should be assessed in all trials. Thus, in Figure 1 depression is listed in the second circle. Cognitive dysfunction, or dyscognition, was rated as an important domain by patients but as less important by clinicians/researchers in previous Delphi exercises. However, full understanding of dyscognition as a domain and how best to assess it in FM trials is still uncertain and is an active research issue. Given its importance as a domain, 38% felt it should be in the core set and 45% thought that it should be measured in some trials. Thus, dyscognition was placed in the second circle (Figure 1).
Research agenda (domains that may or may not be included in FM trials — outer circle)
Several domains were highlighted in discussions as being of potential interest for further exploration; these are listed in the third circle. Stiffness has been identified by patients as an important symptom domain. In multivariate analysis it did not separate out in all trials as a domain distinct from pain, although it was only assessed with a single question in the FIQ. Thus, it is part of the research agenda (outer circle). Functional imaging and cerebrospinal fluid biomarkers are examples of potential objective markers that may be important and discriminative, although not currently feasible for all trials. These were, therefore, listed in the research agenda. Because anxiety was considered to be an essential part of the core set by just 18%, it was placed in the outer circle.
In previous FM workshops, adverse events (AE) were listed as an important domain to assess in trials. Since AE are routinely assessed in clinical trials, it was felt unnecessary to list them separately in the core set.
Outcome measures/PRO
The outcomes measurement (OM) committee within the FM Working Group of OMERACT works to identify PRO that best assess the domains of most relevance to individuals with FM. The work of this group is informed by ongoing initiatives either within or outside OMERACT, and at OMERACT 9 this group presented data in the following areas: (1) Building evidence supporting the valid use of existing PRO specifically for FM; (2) Developing responder indices based upon existing PRO; (3) Further refinement of the domain definitions of relevance specifically for FM; (4) Developing new and next generation outcomes measures specific to FM; and (5) Integrating the guidance of regulatory bodies to the work of improved outcomes measurement in FM.
Studies supporting the valid use of existing PRO in FM
Many of the outcomes measures currently used in FM research were developed and validated for use with other medical conditions. Thus many indices used to assess domains of relevance were “adopted” from other conditions. Adopting instruments is neither uncommon nor inappropriate when exploring a relatively new and poorly understood condition such as FM. For example, a research definition for FM did not exist prior to 1990 and until the recent work within OMERACT, there was no consensus regarding the clinical domains of relevance to this condition. A lack of basic foundations in the understanding of this condition, not to mention insufficient interest, time and funding, precluded developing assessment instruments more specific to FM. Borrowing and adopting assessment instruments has facilitated basic exploration of the nature and impact of FM, and represents a methodological advance over previous un-standardized methods of inquiry.
As interest in and understanding of FM mature, the need for greater rigor in assessment methods also grows. It is plausible to suspect that “adopted” instruments are suitable for use in FM; however, support for this suitability needs to be based upon performance within individuals with FM rather than upon assumptions of equivalence. Several studies are currently underway examining the performance characteristics, in studies of FM, of instruments validated in other conditions. An example of one such effort is the ongoing work of Strand and colleagues on use of the SF-36 in FM. Importantly, as with other rheumatic diseases, the SF-36 represents a generic measure of health-related quality of life (HRQOL) that meets the OMERACT filter in rheumatoid arthritis, osteoarthritis, systemic lupus erythematosus, and FM, and may be well suited for use with other disease specific instruments, once developed.
The SF-36 is a brief, well established, self-administered patient questionnaire for assessment of HRQOL that can also be viewed as a measure of multidimensional function, including “participation”20. The SF-36 measures 8 domains of health status: physical functioning, role limitations because of physical problems, bodily pain, general health perceptions, energy/vitality, social functioning, role limitations due to emotional problems, and mental health. A summary score for physical functional status (physical component score, PCS) can be calculated by combining and weighting the various individual scales21. Individual or group domain and summary scores can be compared to national norms for the US and other populations, or contrasted for various medical conditions22.
To date, the SF-36 has been used in over 70 studies involving individuals with FM, including randomized controlled trials (RCT) of tramadol, gabapentin, pregabalin, duloxetine, and milnacipran. The domains covered by the SF-36 map well onto the domains identified as being relevant in the aforementioned Delphi studies. Domain scores have been consistently observed to improve in studies where active treatment arms can be compared to placebo, supporting the SF-36 as being responsive to change in individuals with FM when change is expected to occur. There is much evidence supporting the use of the SF-36 as an index of multidimensional function, as it satisfies the OMERACT filter for FM. Of interest, data from both RCT and longitudinal observational studies demonstrate remarkably similar decrements in baseline domain scores and in PCS and mental component summary scores compared with age/gender and population matched normative data. Trials of gabapentin, pregabalin, duloxetine, and milnacipran have demonstrated treatment associated mean improvements in summary and domain scores that are remarkably similar and well exceed minimum clinically important differences (MCID)9,23–26.
Responder indices using combined domains
Responder indices have become popular for identifying treatment successes in illnesses where improvement needs to occur across multiple domains. Such responder indices have a history of use in FM. However, consistent with the work on relevant domains, there has not been consensus regarding composition of these indices. For example, Simms, et al27,28 reported on the use of an index requiring improvement on 4 out of 6 criteria defined as 50% improvement in pain, sleep, fatigue, patient global and physician global, dolorimeter improvements and improved myalgic score. These criteria were later used in RCTs of amitriptyline29. This initial response index for FM was a first attempt to go beyond assessment of pain and tenderness in clinical trials. However, the Simms criteria were not as sensitive as would be desired, in part because physical function was not included. A second attempt at a responder index for FM was the work of Dunkl, et al30, requiring improvements in 3 of 4 measures, including FIQ, pain intensity, and tender point count.
Clinical trials of new compounds for FM have also used responder indices as primary efficacy endpoints. For example, in RCTs of milnacipran, a responder index required participants to report ≥ 30% improvement in pain intensity, a patient global change of “moderately improved or much improved” and ≥ 6 points improvement in SF-36 PCS score12. Clinical trials of sodium oxybate used a different responder index: ≥ 20% improvement in pain intensity, ≥ 20% improvement in FIQ, and a patient global assessment of “much better or very much better”31.
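The two composite definitions above can be written out as explicit rules. The sketch below is illustrative only: the record layout, field names, and the category strings used to encode the patient global response are assumptions, and the thresholds are copied from the text rather than from the trial protocols. It also assumes pain and FIQ are scored so that lower is better and that baseline values are greater than zero.

```python
from dataclasses import dataclass


@dataclass
class PatientOutcome:
    """Hypothetical per-patient record; field names are illustrative."""
    pain_baseline: float
    pain_endpoint: float
    fiq_baseline: float
    fiq_endpoint: float
    sf36_pcs_change: float  # endpoint minus baseline; positive = improvement
    patient_global: str     # category label as worded in the relevant trial


def percent_improvement(baseline, endpoint):
    """Percent reduction from baseline on a scale where lower is better."""
    return 100.0 * (baseline - endpoint) / baseline


def milnacipran_style_responder(p):
    """All three components must be met, as described for the milnacipran trials."""
    return (
        percent_improvement(p.pain_baseline, p.pain_endpoint) >= 30
        and p.patient_global in {"moderately improved", "much improved"}
        and p.sf36_pcs_change >= 6
    )


def sodium_oxybate_style_responder(p):
    """All three components must be met, as described for the sodium oxybate trials."""
    return (
        percent_improvement(p.pain_baseline, p.pain_endpoint) >= 20
        and percent_improvement(p.fiq_baseline, p.fiq_endpoint) >= 20
        and p.patient_global in {"much better", "very much better"}
    )


# Hypothetical patient: about 36% pain reduction, about 17% FIQ reduction, PCS up 7 points.
example = PatientOutcome(
    pain_baseline=7.0, pain_endpoint=4.5,
    fiq_baseline=60.0, fiq_endpoint=50.0,
    sf36_pcs_change=7.0, patient_global="much improved",
)
print(milnacipran_style_responder(example), sodium_oxybate_style_responder(example))
```

The same hypothetical patient is a responder under one definition and a non-responder under the other, which shows how the choice of components and thresholds determines who counts as a treatment success.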
Thus, to date, most responder indices have been rationally derived, based upon what investigators or regulatory bodies deemed to represent improvement in the context of a clinical trial involving FM. Given consensus regarding relevant domains is only just evolving, most responder indices have not benefited from a data driven development process. Arnold, Crofford, and colleagues are currently working on the NIH/National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) sponsored project to develop a responder index for FM based on both consensus and empirical data for eventual use in FM RCT. This project begins with the consensually derived domains for FM, links existing assessment instruments to each domain, evaluates each measure for 5 types of validity in FM, and evaluates the performance of each instrument as a member of a composite index. The project also establishes consensus among clinicians regarding criteria for improvement in FM, tests the consensually derived criteria with empirical data, and identifies which definitions of improvement result in fewest placebo improvements. This project is ongoing and will inform the efforts of the working group in subsequent sessions of OMERACT.
Refinements in domain definition studies (Item-level refinements)
Identification of consensually derived domains of relevance for FM is an important first step in gaining a better understanding of what needs to be assessed in FM. However, studies that attempt to validate adopted measures for use in FM must rely on several assumptions. First, that instruments purporting to measure a given domain (e.g., fatigue) do in fact measure those facets of fatigue that are relevant to individuals with FM. Second, that the domain names (e.g., fatigue) have shared meaning for individuals with FM, clinicians, and other medical populations from which existing measures may have been adopted. Early investigations into this area of inquiry suggest that neither assumption holds completely.
Identifying concepts contained within existing domain measures
Perhaps the largest body of work in this area comes from the investigators associated with The International Classification of Functioning, Disability, and Health project (ICF) within the World Health Organization (ICF-WHO). The ICF developed a domain categorization coding system that identifies the relevant domains of functional status assessment for medical illnesses in general32. This large system can be broken down into core sets for specific illnesses. Currently the core set closest to FM is the “chronic widespread pain core set (CWP).” CWP affects between 5 and 15% of the population and includes FM as an extreme subset33–36. When used, this coding system helps to identify relevant domains of functional limitations for different diseases/conditions and then provides a code (much like the ICD10) that identifies the area of functioning affected by the condition.
Various standardized instruments used to assess domains in FM have been examined and items within each instrument have been mapped to specific categories (subdomains) within each broader domain (e.g., fatigue can be subcategorized into physical, mental, motivation, etc.). One recent study found that out of 42 RCT in FM, 27 different questionnaires were used to assess FM. From the 27 different questionnaires, 1138 distinct health-related concepts could be identified based upon items. These concepts were linked to 113 ICF categories. Each questionnaire differed greatly from the other with regard to the specific subdomain categories covered and the relative importance paid to the broader ICF domains of body structure, body function, activities/participation, and environmental factors. The least well covered broad domain for all existing questionnaires was environmental factors37.
A second manuscript explored differences in the ICF categories that were represented in PRO commonly used in FM research that purportedly assess the same construct. This manuscript applied ICF linkages to common indices of pain, fatigue, sleep function, and affect. In each case the domains were indexed by assessment tools that varied substantially, depending upon which assessment tool was chosen. Thus, quite disparate conclusions might be found for a given construct, based on which assessment instrument was used and which specific facets of the construct the instrument and its scales emphasize38.
That different instruments emphasize different facets of constructs is not always limiting. As we learn more about how patients with FM define and think about the various domains of relevance, we will be better able to match our assessment instruments to the way individuals with FM use these terms resulting in improved assessment ability with increased sensitivity in our measures of outcomes. That different instruments assess different facets of domains is also reason not to limit by decree which assessment tools must be used for FM, as the choice of instrument might be best driven by which facets of the domain a given intervention hopes to address.
Identifying FM-specific definitions to the domains of relevance
Efforts to learn about how patients with FM think about the domains of relevance are currently in progress. Methodologies typically start with a consensually derived generic definition of the domain (e.g., ICF definitions or definitions from the NIH Roadmap PROMIS project) that are then agreed to or modified by focus groups of individuals with FM.
One such study recently presented at an NIH PROMIS conference found generally good agreement among patients with FM with generic definitions of pain, fatigue, negative mood, and physical functioning. For each domain, however, insufficient depth of impact was expressed as a concern of the definition. For example, individuals with FM reported that most existing definitions of fatigue focused on simply being tired and failed to capture the profound unrecoverable and disabling exhaustion that accompanies FM39.
Development of new outcomes measures specific to FM
Perhaps the largest scale project aimed at developing new highly sensitive FM-specific measures for the domains of relevance to FM is the NIH/NIAMS sponsored project “FM-Specific extension of the PROMIS network.” PROMIS is an NIH Roadmap initiative that is building a next generation Patient-Reported Outcomes Measurement Information System. In the development of PROMIS, each domain is defined generally, and then patient reported outcome measures are developed and linked to those specific domain definitions. PROMIS, still under development, is to be a publicly available, user-friendly computerized adaptive testing (CAT) system for efficient generic measurement of patient-reported outcomes (PRO) across a wide range of chronic diseases and dimensions39. Although costly and time-consuming to develop and maintain, a national public resource of this nature will be of benefit, as the system will be able to assess multiple domains using fewer items (i.e., less patient burden) with greater precision (i.e., increased power for clinical trials with fewer subjects).
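The way a CAT system can reach a precise score with fewer items may be easier to see in a small sketch. The example below assumes a simplified two-parameter logistic item response model with yes/no items and selects, at each step, the unused item that is most informative at the current severity estimate. This is a simplification for illustration and not a description of how PROMIS itself is calibrated or scored; the item names and parameters are hypothetical.

```python
from math import exp


def prob_endorse(theta, discrimination, difficulty):
    """Probability of endorsing an item under a two-parameter logistic model."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))


def item_information(theta, discrimination, difficulty):
    """Fisher information of one item at severity level theta."""
    p = prob_endorse(theta, discrimination, difficulty)
    return discrimination ** 2 * p * (1.0 - p)


def next_item(theta, item_bank, administered):
    """Pick the not-yet-administered item with the most information at theta."""
    candidates = {
        name: params
        for name, params in item_bank.items()
        if name not in administered
    }
    return max(candidates, key=lambda name: item_information(theta, *candidates[name]))


# Hypothetical fatigue item bank: name -> (discrimination, difficulty).
bank = {
    "fatigue_mild": (1.2, -1.0),
    "fatigue_moderate": (1.5, 0.0),
    "fatigue_severe": (1.3, 1.5),
}

first = next_item(0.0, bank, administered=set())
second = next_item(1.5, bank, administered={first})
print(first, second)
```

Starting from an average severity estimate, the sketch chooses the moderate item first. If the estimate then rises, it moves to the severe item and skips the mild item, which would add little information for that respondent.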
PROMIS was established for the general assessment of chronic illnesses, and as might be expected, many of the domains identified in PROMIS are of relevance to FM, such as pain, fatigue, negative mood, and physical function. Several domains identified in the OMERACT Delphi exercises, however, were not included in the first iteration of PROMIS, such as sleep disturbance, dyscognition, stiffness, and tenderness. Williams and colleagues are currently participating in a cooperative agreement with the US NIH/NIAMS to develop an FM-specific extension of PROMIS. The goals of this initiative include: (1) Determining whether PROMIS definitions of pain, fatigue, physical function, and negative mood hold up or require modification for patients with FM; (2) Developing new definitions for sleep disturbance, dyscognition, stiffness, and tenderness for FM; (3) Developing new item banks for new domains and/or supplementing existing banks with FM-specific items; (4) Performing large-scale field testing following the methods of the larger PROMIS initiative, thus facilitating the development of FM-specific calibrations for existing and new item banks; and (5) Developing static short forms and CAT assessments specific to domains of relevance for FM. These new item banks and calibrations will be merged within the context of the larger PROMIS roadmap initiative.
Regulatory considerations concerning the use of PRO for FM
Many of the domain assessment tools currently in use for FM were developed in academia to explore and gain a better understanding of FM. With broader interest and new treatments for FM, researchers and PRO developers must become aware of not only methods of test development but also guidelines of regulatory bodies such as the US Food and Drug Administration (FDA) and the European Medicines Agency (EMEA) if the assessment device is to be used in a clinical trial for eventual product approval. One such regulatory body, the FDA, released valuable draft regulatory guidance for (1) the use of existing measurement tools, (2) the development of new measurement tools, and (3) the transition of tools from one medium to another (e.g., paper to electronic formats)40. Of particular importance in the draft guidance is the documentation of patient input during the PRO instrument development process, both in the identification of the domains of importance in any particular disease area and in item-level development and evaluation. Understanding the current PRO instrument requirements from the perspective of regulatory bodies can guide decisions related to choice of currently available instruments versus development of new instruments.
OBJECTIVE MARKERS
Participants at OMERACT 9 were presented an update on the current understanding of the underlying pathophysiology of FM and the biomarkers that relate to these pathophysiological processes. Researchers and clinicians now view FM as a common pain syndrome characterized by primarily central, non-nociceptive pain; as well, a variety of aberrant pain and sensory processing pathways have been identified that can lead to pain or sensory amplification.
All potential biomarkers that have been identified to date in FM are related in some way to this central amplification. Current biomarkers under study include, but are not limited to, experimental or evoked pain testing (EPT); MRI (sometimes during EPT); levels of neurotransmitters in cerebrospinal fluid, including substance P, glutamate, serotonin, and norepinephrine; muscle biopsy; polysomnography; cytokines; and sensory testing.
The objective biomarker breakout session focused on 3 main issues: (1) the “objectivity” of biomarkers, (2) whether a marker belongs in the core domain of outcomes that must be measured in a clinical trial or requires further study before becoming a core domain, and (3) application of the OMERACT filter of truth, discrimination and feasibility to specific biomarkers.
Neurotransmitters and muscle biopsy were the only markers designated as totally objective, but no single marker was designated as a core domain. When applying the OMERACT filter, some markers were considered more useful in research than in clinical practice. For example, polysomnography was considered truthful and discriminating but might only be feasible in a clinical trial where the investigational intervention aims to improve sleep, and may not be feasible in clinical practice. In the case of biomarker neural imaging, some participants rated it 7 out of 10 in terms of truth but rated it low on feasibility due to cost, and evidence was considered insufficient to assign a score on the discrimination scale.
The goal of OMERACT is to place a set of disease markers through a filter of truth, feasibility, and discrimination to achieve a succinct and practical set of outcomes to measure change in health status. With numerous available markers measured in so many different ways, it is impossible to compare the efficacy of potential treatments. In preparation for OMERACT, and during the workshop, it became evident that there were too many biomarkers with too little evidence to support existence of a core set that would pass through the OMERACT filter. As such, the biomarker research agenda focused on a single class of biomarker that has the most support for feasibility, truth, and discrimination: EPT.
Encompassing multiple techniques, including tender point intensity, pressure pain thresholds, and heat/cold thresholds, EPT is emerging as a promising evidence-based biomarker. The goal is to quantify the experience of pain objectively and to demonstrate that FM is related to aberrations in central, rather than peripheral, pain processing. The presence of hyperalgesia (increased pain in response to normally painful stimuli) and allodynia (pain in response to non-painful stimuli) implicates central pain mechanisms, and both are measured by EPT.
Research shows that some methods of EPT are correlated with reports of clinical pain in patients with FM. For example, Geisser, et al found that dolorimetry and pressure thresholds were associated with clinical pain, but heat stimuli were not41. Particularly, the use of the multiple random staircase (MRS) method for delivering pressure stimuli has been shown to be associated with patients’ reports of clinical pain. MRS uses an interactive software system to determine low, medium, and high pain intensity thresholds for each subject based on their response to random stimuli. Harris, et al compared MRS to other evoked pain measures and found that it was the only “objective” technique that tracked with improvement during the course of treatment42. Such findings may indicate that experimental pain testing, and MRS specifically, correspond to a patient’s clinical condition, rendering this type of testing a potential biomarker of disease status, progression, and improvement. In addition, MRS is not subject to bias in terms of variation between clinicians, or to fluctuations within individual clinicians, as with tender point counts, and is not associated with patient distress43,44. With both dolorimetry and tender point count, the patient is aware of when the stimulation is forthcoming45, and such techniques have been shown to be influenced by patient distress46. EPT in FM yields a measure of objective pain that correlates with clinical pain, is less subject to bias, underscores the central pain mechanisms in FM, and is less invasive than other biomarkers (e.g., collecting cerebrospinal fluid; CSF).
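The interleaved-staircase idea behind MRS can be sketched in a few lines of Python. This is a simplified illustration, not the software used in the cited studies: the rating scale, target ratings, starting pressure, step size, and the linear simulated subject are all assumptions made for the example. One staircase is kept per target intensity, and the staircases are visited in shuffled order so that the subject cannot anticipate the next stimulus.

```python
import random

def run_mrs(rate_pain, targets, start=1.0, step=0.25,
            trials_per_staircase=20, seed=0):
    """Interleave one staircase per target rating in random order.

    rate_pain: callable mapping a pressure to the subject's pain rating.
    targets:   dict of label -> target rating (hypothetical scale).
    Returns a dict of label -> estimated pressure threshold.
    """
    rng = random.Random(seed)
    schedule = list(targets) * trials_per_staircase
    rng.shuffle(schedule)  # subject cannot predict which intensity comes next

    pressure = {label: start for label in targets}
    delivered = {label: [] for label in targets}
    for label in schedule:
        p = pressure[label]
        delivered[label].append(p)
        rating = rate_pain(p)
        # Step up if the rating fell short of the target, otherwise step down.
        if rating < targets[label]:
            pressure[label] = p + step
        else:
            pressure[label] = max(0.0, p - step)

    # Estimate each threshold as the mean of the last four pressures delivered.
    return {label: sum(v[-4:]) / 4 for label, v in delivered.items()}

# Simulated subject (assumption): rating rises linearly with pressure.
thresholds = run_mrs(lambda p: 4.0 * p, {"low": 2, "medium": 8, "high": 14})
```

Each staircase settles into an oscillation around the pressure that evokes its target rating, which yields separate low, medium, and high thresholds for the same subject.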
COGNITION
Although pain and fatigue are hallmark symptoms of FM, many patients find that problems with cognitive function (dyscognition) are just as troublesome5,15,47,48. A small but growing body of literature supports the presence of dyscognition in FM49. In this section the current state of knowledge about dyscognition is reviewed.
Measurement of dyscognition can be divided into 2 categories: self-report of cognitive difficulties, and performance-based measures of cognition; most reports are performance-based49. About one dozen studies have been published that use either standardized neuropsychological tests or non-standardized but common measures from cognitive science. Although these studies have used a variety of measures, a pattern has begun to emerge in which deficits are seen in 4 separate cognitive systems. Most notably, problems with verbal working memory have been consistently reported. Working memory refers to a memory system that combines short-term storage (on the order of seconds) with other mental operations such as retrieving knowledge from semantic memory and deleting or adding items. Working memory is an important construct in cognition because it functions as a basic skill. Results from 4 different measures of working memory, the Paced Auditory Serial Addition Test (PASAT)50–52, the Reading Span Test53, the Test of Everyday Attention54, and Consonant Trigrams52, have all shown impairment in this crucial cognitive system.
Related to working memory are attention and executive control. Attention is the ability to maintain focus on a specific item, task, or location. Executive control involves the many processes used to maintain focus, such as ignoring irrelevant items, suppressing responses not consistent with a goal, and planning. The results from the PASAT and the Test of Everyday Attention point to a problem with executive control of attention in FM. Ongoing work indicates greater memory impairment in patients with FM in the presence of distraction52,55,56. An important point is that most standardized neuropsychological tests are conducted without distraction.
Deficits are also seen in memory systems with longer duration. Episodic memory refers to the ability to remember a specific episode. Many of our memory tasks fall into this category, such as remembering a list of items to buy at the grocery store. Patients with FM perform more poorly than controls on word list tasks53 as well as standardized tests of memory51,52,57,58.
The final area where deficits have been reported is in semantic memory, particularly the ability to access semantic memory. Semantic memory refers to our knowledge of facts. It is separate from episodic memory (e.g., you may remember that there are 12 inches in a foot, but not remember when you learned this fact). Patients anecdotally report word finding problems, and there are reports of decreased performance on both verbal fluency tasks53,57 and on vocabulary tests53.
There are now a number of computerized neuropsychological batteries (e.g., CANTAB, COGSTATE). Computerized batteries would ease testing, data collection, and interpretation across clinical trials and other studies. To date, there is only one report that used a computerized battery, the Automated Neuropsychological Assessment Metrics59. Unfortunately, this battery did not yield any differences between patients with FM and controls, perhaps due to the lack of distraction and working memory tests. Future work will be needed to assess the utility of other computerized neuropsychological batteries in FM research.
Self-report of cognitive function is an important addition to performance-based measures because it can be influenced by many factors, including effort required for performance, stress regarding performance, and depression. There is a surprising paucity of studies using self-report instruments of dyscognition in FM, although several studies include 1 or 2 items about memory or concentration. An exception is a study of memory beliefs in FM with the Metamemory in Adulthood Questionnaire, used frequently for studying memory in older adults55. Patients with FM reported lower memory capacity, more memory deterioration, lower self-efficacy over memory, higher anxiety about memory performance, and more strategy use to support memory than age- and education-matched controls. Among patients with FM, performance on a memory task was correlated with perceived memory capacity. Further work using other well-validated self-report measures of cognitive function would be very helpful in clinical trials, since self-report measures are easy to administer and fulfill the need for patient-reported outcomes.
To summarize, existing data support dyscognition as a salient symptom, and objective cognitive impairments can be demonstrated in patients with FM. This will be important in future clinical trials, but the field is not yet at the point where we can recommend outcome measures that should be included in all trials. In addition, during breakout discussions, 3 important areas that have not been well studied were identified: (1) There was considerable concern about how other conditions (e.g., depression, anxiety, fatigue, and medications) could influence dyscognition; (2) Some aspects of dyscognition described by patients have not been well studied, in particular the idea of mental exhaustion and feelings of dissociation; and (3) There was a good deal of discussion about the frequent lack of correspondence between objective cognitive testing and self-report of dyscognition. The group noted that self-report may also include other noncognitive aspects. For example, someone with cognitive losses compared to pre-illness state may still perform well when their pre-illness state was above average.
CONCLUSION
FM is a condition characterized by chronic widespread pain, excessive tenderness, and a number of associated symptoms such as fatigue, sleep disturbance, mood disorder, and cognitive dysfunction with associated impairment of function and HRQOL. The symptom complex is caused by dysregulation of central sensory processing systems. Evidence points to genetic, environmental, and concomitant disease state factors in its etiology. As therapies are developed that not only address pain, but also other symptom domains, clinicians, regulatory agencies, patients, and others need to know the relative contribution of these various domains to the disease experience of the patient and how best to measure them in a reliable and feasible manner in clinical trials.
The primary objective of the OMERACT 9 FM module to achieve relative consensus on a domain construct for FM clinical trials was accomplished through: (1) Review of work presented in previous OMERACT workshops (clinician/researcher Delphi, patient focus group and Delphi exercises); (2) Presentation of a study in which the key clinical domains identified in these exercises were mapped against the patient global impression of change noted in 10 FM pharmacologic studies to determine the degree to which key domains both constituted the global patient experience of FM and were not completely overlapping; (3) Presentation of the current status of outcome measures, objective biomarkers, and understanding about disease state; (4) Discussion of the above in breakout groups; and (5) A voting process. Figure 1 demonstrates the outcome of this process. Domains considered essential to measure in all FM clinical trials include pain, tenderness, fatigue, patient global, multidimensional function, and sleep disturbance. Domains considered important to measure at some point in a clinical development program, but not essential to measure in all clinical trials, are depression and cognitive dysfunction, also known as dyscognition. Domains that are of research interest and considered elective to measure at this time include stiffness, anxiety, and objective markers such as functional imaging, e.g., fMRI, and cerebrospinal fluid biomarkers. It is well recognized that this domain construct is a “work in progress.” For example, it is recognized that there are important elements of HRQOL that are not necessarily subsumed under the concept of “multidimensional function,” yet the best instruments that we currently have available to measure these domains, the SF-36 and FIQ, are primarily measures of function.
Further, whereas both clinical experience and emerging research suggest that cognitive dysfunction is an important clinical domain in FM, optimal assessment methods are still in development; thus, whether cognitive dysfunction will ultimately be considered an essential domain to measure in all trials is uncertain. As new and more sophisticated instruments become available to more completely measure the totality of patient experience vis-à-vis these domains, and as we gain a fuller understanding of the disease process and better ways to measure impact of therapeutic intervention, this framework is expected to evolve.
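For readers who wish to check a trial protocol against this construct, the three tiers can be written down as a simple lookup. The Python sketch below encodes the tiers exactly as listed in the preceding text; the helper function and the example protocol are hypothetical and serve only to illustrate the check.

```python
# Domain tiers as voted at OMERACT 9 (taken from the text above).
CORE_DOMAINS = {
    "all_trials": {"pain", "tenderness", "fatigue", "patient global",
                   "multidimensional function", "sleep disturbance"},
    "some_trials": {"depression", "dyscognition"},
    "research_interest": {"stiffness", "anxiety", "functional imaging",
                          "cerebrospinal fluid biomarkers"},
}

def missing_essential_domains(measured):
    """Return the essential domains a protocol does not measure, sorted."""
    return sorted(CORE_DOMAINS["all_trials"] - set(measured))

# Hypothetical protocol, for illustration only.
protocol = ["pain", "fatigue", "patient global", "sleep disturbance", "depression"]
missing = missing_essential_domains(protocol)
```

Because the construct is expected to evolve, any such encoding would need to be revised as domains move between tiers.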
As in previous OMERACT meetings, an update was provided on the outcome measures used in FM trials and the current status of objective markers of FM disease state. The quality of their performance was discussed, and areas needing improvement were reviewed, particularly assessment of sleep, mood disturbance, tenderness, stiffness, multidimensional function and HRQOL.
Since the majority of outcome measures are PRO, it is important that they fulfill the standards of evidence being developed by regulatory agencies. The working group reported on several projects underway, which will be more fully reviewed in future OMERACT meetings as part of the group’s research agenda: linkages with existing disease assessment networks such as PROMIS and the ICF, and the development of an FM responder index. These will be developed in the context of the OMERACT filter of truth (forms of validation), discrimination, and feasibility. Objective markers of FM disease state continue to be developed, such as cerebrospinal fluid biomarkers and functional imaging. The relationship of these markers to disease state and their ability to reflect change in response to treatment remain on the research agenda.
Special focus was placed on the domain of cognitive dysfunction in FM during the module. This domain is ranked highly by patients in terms of disease impact, and understanding about this problem is emerging. There have been fledgling attempts to measure change of this domain via self-assessment questionnaires. There are a number of more objective and potentially feasible applied measures, e.g., computer-based cognition assessment methods, which are beginning to be studied in FM clinical trials and will be reviewed at future OMERACT meetings.
Acknowledgments
We acknowledge the support of Nooshine Dayani, Qu Peng, and Robert Palmer from Forest Laboratories Inc, Chinglin Lai, Yanping Zheng and Diane Guinta from Jazz Pharmaceuticals Inc., Daniel Kajdasz and Amy Chappell from Eli Lilly & Co., and Gergana Zlateva and Emir Birol from Pfizer Inc. OMERACT FM working group: Robert Allen, Dennis Ang, Lesley Arnold, Annelise Boonen, Daniel Buskila, Larry Bradley, Alarcos Cieza, Ernest Choy, Dan Clauw, Leslie Crofford, Brian Cuffel, Michael Gauthier, Michael Gendreau, Jennifer Glass, Don Goldenberg, Richard Gracely, Diane Guinta, Kim Jones, Chinglin Lai, Geoff Littlejohn, Yves Mainguy, Susan Martin, Lynne Matallana, Philip Mease, Jamal Mikdashi, Jessica Morea, Robert Palmer, Daniel Radecki, I. Jon Russell, Stuart Silverman, Lee Simon, Michael Spaeth, Tanja Stamm, Raj Tummala, Olivier Vitton, Brian Walitt, David Williams, Madelaine Wohlriech, and Gergana Zlateva.