Abstract
Objective. To test the OMERACT 8 draft validation criteria for soluble biomarkers by assessing the strength of literature evidence in support of 5 candidate biomarkers.
Methods. A systematic literature search was conducted on the 5 soluble biomarkers RANKL, osteoprotegerin (OPG), matrix metalloproteinase 3 (MMP-3), and urine C-telopeptide of types I and II collagen (U-CTX-I and U-CTX-II), focusing on the 14 OMERACT 8 criteria. Two electronic voting exercises were conducted to address: (1) strength of evidence for each biomarker as reflecting structural damage according to each individual criterion and the importance of each individual criterion; (2) overall strength of evidence in support of each of the 5 candidate biomarkers as reflecting structural damage endpoints in rheumatoid arthritis (RA) and identification of omissions to the criteria set.
Results. The search identified 111 articles. The strength of evidence in support of these biomarkers reflecting structural damage was low for all biomarkers and was rated highest for U-CTX-II [score of 6.5 (numerical rating scale 0–10)]. The lowest scores for retention of specific criteria in the draft set went to criteria that refer to the importance of animal studies, correlations with other biomarkers reflecting damage, and an understanding of the metabolism of the biomarker.
Conclusion. Evidence in support of any of the 5 tested biomarkers (MMP-3, CTX-I, CTX-II, OPG, RANKL) was inadequate to allow their substitution for radiographic endpoints in RA. Three of the criteria in the draft criteria set might not be required, but few omissions were identified.
Radiographic damage scoring systems are the gold standard for assessing structural damage outcomes in rheumatoid arthritis (RA), psoriatic arthritis (PsA), and spondyloarthritis (SpA). However, with the introduction of highly effective biological therapies, it is now desirable to identify patients at risk of joint damage prior to the appearance of radiographic change. Recent work has suggested that several soluble biomarkers (biomarkers measured in body fluids), primarily those reflecting tissue remodeling in joints, are independent predictors of joint damage in RA. As the level of a biomarker, or particularly the short-term change in the level, may predict radiographic progression, these markers may constitute indicators of early response to disease modifying agents in clinical trials, and may also be useful to the clinician managing individual patients.
At OMERACT 8 a special interest group (SIG) was assembled comprising individuals with a special interest in biomarkers and structural damage outcomes, to develop validation criteria for a soluble biomarker to substitute for radiographic outcome measures in clinical trials. A list of 14 validation criteria was generated (see Appendix 1) and structured according to the key requirements of the OMERACT filter for validation of an outcome measure: truth, discrimination, and feasibility1,2. The performance of the criteria was initially examined using the example of C-reactive protein (CRP)3. This exercise showed that some of the criteria, particularly those itemized under the category of truth, were regarded as comparatively less useful in the validation process. However, CRP is regarded as an indirect marker of joint inflammation rather than a marker of joint tissue remodeling, and its association with radiographic damage appears to be rather weak. The OMERACT 9 soluble biomarker working group, therefore, decided that the next step was to test the criteria using other biomarkers considered to be high priority candidates, particularly those that might reflect joint remodeling. The 5 biomarkers identified by the group were C-telopeptide of type I collagen (CTX-I), C-telopeptide of type II collagen (CTX-II), matrix metalloproteinase 3 (MMP-3), the receptor activator of nuclear factor-κB ligand (RANKL), and osteoprotegerin (OPG). These were chosen for this exercise because they met 2 criteria: (a) evidence that the biomarker is directly related to joint tissue remodeling; and (b) availability of published data evaluating the association between the biomarker and radiographic damage in RA.
The aims of this study were to test the performance of the OMERACT 8 validation criteria by assessing the strength of evidence (SOE) from the literature in support of 5 candidate biomarkers as reflecting structural damage in RA, to appraise the importance of inclusion of each criterion, and to identify omissions to the criteria set as a prelude to the drafting of revised criteria.
METHODS
Literature search
A systematic literature search focusing on the 5 chosen biomarkers: serum RANKL, serum OPG, serum and urine CTX-I, urine CTX-II, and serum MMP-3, specifically directed towards each individual criterion, was conducted by the fellow in this working group (SWS). The MeSH terms used for the biomarkers were: RANK ligand/osteoclast differentiation factor/Receptor Activator of Nuclear Factor-kappa B, osteoprotegerin, matrix metalloproteinase 3. For CTX-I and CTX-II there were no specific MeSH terms. The search, therefore, included: CTX-I, collagen type I, C-terminal telopeptide, CTX-II, and collagen type II. Searches were performed in MEDLINE and EMBASE in April 2007 with no date restriction. The search was limited to English language and peer-reviewed journals. Eligible studies were regarded as all studies addressing the individual items of the criteria set (see Figure 1 for a description of article retrieval). References of articles read in full text were also examined. A survey was mailed to companies manufacturing assays for the 5 biomarkers to obtain information on unpublished literature. Study quality was not assessed by the fellow. However, the main study characteristics and findings (i.e., author, year, journal, study design, duration of study, study population, outcome measure, number of participants, and strength of the relevant effect measure) were presented to the group both orally (EULAR 2007) and as a written summary for evaluation of SOE.
Rating the strength of evidence supporting the biomarkers as reflecting structural damage in RA and the strength of recommendation in support of including each criterion
Two electronic voting exercises were conducted by Web survey. The primary aims of the first exercise were: (a) To examine the SOE for each biomarker as reflecting structural damage according to each individual criterion; and (b) To appraise the importance of individual criteria. The results of the literature search were therefore organized so that the evidence for all 5 biomarkers was compiled and presented according to individual criteria before the voting questions were presented. After each criterion had been reviewed, the members of the group (n = 19) were asked to rate SOE on a 0–10 numerical rating scale (NRS; 0 = no supporting evidence at all, 10 = unequivocal evidence) in response to the following question: “Please rate to what degree you consider the available data from the literature as supporting (biomarker) as reflecting structural damage in RA according to this specific criterion.”
Participants were then asked to vote on the following question on a numeric rating scale of 0 (definitely exclude) to 10 (definitely include) to determine the strength of the recommendation in support of including each individual criterion in the draft criteria set: “Please rate to what degree you consider this a required criterion in the validation of a biomarker reflecting structural damage endpoints in RA.”
The primary aim of the second voting exercise was to examine the overall SOE in support of each of the 5 candidate biomarkers as valid biomarkers reflecting structural damage endpoints in RA and to identify omissions in the draft set as a prelude to the drafting of revised criteria. The group members were presented with the same literature review but the results were organized so that all the evidence was presented for each biomarker before the following voting question was presented: “Please rate on a scale of 0 (no supporting evidence) to 10 (unequivocal evidence) to what degree you consider (biomarker) as a valid biomarker reflecting structural damage in RA after a consideration of the entire literature addressing all 14 draft criteria.” Group members were also asked to respond to the following question by providing written feedback: “What further information not addressed by the draft criteria is required to support (biomarker) as a valid biomarker reflecting structural damage in RA?”
Results of the voting exercises are provided as means (standard deviation).
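As an illustration of this reporting format, the following minimal sketch summarizes a set of 0–10 NRS votes as a mean (standard deviation). The function name `summarize_votes` and the example votes are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def summarize_votes(votes):
    """Format 0-10 NRS votes as 'mean (SD)' with one decimal place.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the source text does not specify this.
    """
    if len(votes) < 2:
        raise ValueError("at least two votes are needed for a standard deviation")
    if any(v < 0 or v > 10 for v in votes):
        raise ValueError("NRS votes must lie between 0 and 10")
    return f"{mean(votes):.1f} ({stdev(votes):.1f})"


# Hypothetical votes for illustration only; these are not data from the study.
example_votes = [4, 5, 6, 7, 8]
print(summarize_votes(example_votes))
```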
RESULTS
Literature search and first voting exercise
A summary of the findings has been organized under the 3 domains of the OMERACT filter. Table 1 shows the results of the voting exercise addressing the SOE for each biomarker as reflecting structural damage according to each specific criterion. Table 2 shows the results of the voting exercise addressing the strength of recommendation in support of including each individual criterion in the draft criteria set. MeSH terms used in the literature search for each criterion and more detailed findings of the search are reported in Appendix 2.
Truth (Criteria 1–5).
The systematic literature search focusing on the 5 criteria itemized under the category of truth revealed limited documentation and SOE was accordingly rated as low. Studies describing an association between the biomarker level and structural damage in established animal models of arthritis (Criterion 1) were only found for CTX-II4,5. All biomarkers were reported as being immunohistochemically localized to joint tissues (cartilage, bone, synovial tissue) (Criterion 2), although most are neither sensitive nor specific for target of joint tissue origin (Criterion 3), with the exception of CTX-II, which is a specific marker for type II collagen in hyaline cartilage6,7. The relation of the biomarker to synthesis, degradation, and turnover of joint tissue components (Criterion 4) has been well characterized for OPG, RANKL, and MMP-3, while the relation of CTX-I and CTX-II to joint degradation is not as well documented. With the exception of one cross-sectional study, which found that MMP-3 levels were strongly correlated with synovitis on magnetic resonance imaging (MRI) of the knee in RA8, there were no other studies that reported correlations between biomarker levels and scores for other surrogates that have been shown to have predictive validity for structural damage (Criterion 5). The group vote for the strength of recommendation in support of including each individual criterion under the category of truth in the draft criteria ranged from 5.8 for Criteria 1 and 5 to 7.7 for Criterion 4 (see Table 2).
Discrimination (Criteria 6–12)
Assay reproducibility data are largely based on the manufacturer’s studies (package inserts); kits with intra-assay coefficient of variation (CV) less than 10% and inter-assay CV less than 15% (Criterion 6) are commercially available for all the biomarkers. Several studies have clarified the influence of potential sources of variability on biomarker levels including: age, sex, menopause, circadian rhythms, body mass index, physical activity, nonsteroidal antiinflammatory drugs (NSAID), renal and hepatic disease, and contribution of different affected joints, although studies have primarily been cross-sectional (Criterion 7). Variation with age seems to occur especially for OPG, while sex appears to have a particular influence on MMP-3 levels9–12. Menopausal status influences levels of all the biomarkers, while diurnal variation is most pronounced for S-CTX-I13–15. Hepatic disease influences levels of OPG, RANKL, and MMP-311,16, while renal failure influences S-CTX-I and OPG levels17,18. Nevertheless, the literature is still somewhat contradictory and often neglects potential confounders. Metabolism of the biomarkers has not been studied in either normal individuals or in patients with RA (Criterion 8). Some cross-sectional studies have compared biomarker levels in RA patients with healthy, but not age- and gender-matched, controls (Criterion 9). One study revealed higher levels of RANKL and OPG in RA patients19, U-CTX-I was slightly elevated in RA patients compared to healthy controls in 2 small studies6,20, and S-CTX-I was elevated in destructive, but not in non-destructive RA in another study21. Three studies showed increased U-CTX-II in RA patients compared to controls6,22,23, and 7 reported higher levels of MMP-3 in RA9,10,23–27.
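The reproducibility thresholds of Criterion 6 can be illustrated with a short sketch that computes a coefficient of variation (CV = 100 × SD / mean) and checks it against the 10% intra-assay and 15% inter-assay limits quoted above. The function names and replicate values are hypothetical, and treating a simple list of repeated measurements as the basis for each CV is a simplifying assumption; in practice, inter-assay CV is usually derived from results across separate runs.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


def meets_criterion_6(intra_cv, inter_cv, intra_limit=10.0, inter_limit=15.0):
    """Check CVs (in percent) against the limits quoted for Criterion 6."""
    return intra_cv < intra_limit and inter_cv < inter_limit


# Hypothetical replicate measurements of one control sample (arbitrary units).
within_run = [10.0, 10.5, 9.5]      # repeated in a single run
between_runs = [10.0, 12.0, 8.0]    # one result from each of three runs

intra = coefficient_of_variation(within_run)
inter = coefficient_of_variation(between_runs)
print(f"intra-assay CV = {intra:.1f}%, inter-assay CV = {inter:.1f}%")
print("meets Criterion 6 limits:", meets_criterion_6(intra, inter))
```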
Several prospective cohort studies have examined the independent association between the baseline level of a biomarker and the structural damage endpoint (Criterion 10). Studies from the COBRA cohort concluded that the OPG/RANKL ratio, U-CTX-I, and U-CTX-II are independent predictors of radiographic progression23,28,29. In a recent study both baseline levels of U-CTX-II and the longitudinal values (area under the curve, AUC) independently predicted radiographic progression30. Evidence supporting MMP-3 as a predictor of radiographic progression is conflicting, as some studies examined only baseline levels, several did not address potential confounders through multivariate analysis, and sample sizes were typically small10,30–37. MMP-3 decreased after initiation of MTX in one study, but there was no association between this change and subsequent change in the structural damage endpoint30.
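For readers unfamiliar with summarizing longitudinal biomarker values as an AUC, the sketch below uses the trapezoidal rule over sampling times. This is an assumption made for illustration: the cited study's exact AUC method is not described here, and the function name, time points, and levels are hypothetical.

```python
def trapezoid_auc(times, levels):
    """Area under a biomarker-versus-time curve by the trapezoidal rule.

    Assumption: linear interpolation between consecutive samples.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired time/level observations")
    auc = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        auc += dt * (levels[i] + levels[i - 1]) / 2.0
    return auc


# Hypothetical follow-up: months since baseline and biomarker level (arbitrary units).
months = [0, 3, 6, 12]
levels = [4.0, 3.0, 2.0, 2.0]

auc = trapezoid_auc(months, levels)
# Dividing by the follow-up duration gives a time-averaged level.
print(auc, auc / (months[-1] - months[0]))
```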
Only one randomized controlled trial examined the association between biomarker levels and radiographic progression (Criterion 11) and this showed that the change in U-CTX-II, but not U-CTX-I, was an independent predictor of subsequent radiographic progression38. Associations have not been specifically studied in pre-radiographic cohorts (Criterion 12), but subgroup analysis of pre-radiographic patients in the COBRA study showed that both U-CTX-I and U-CTX-II levels were strongly associated with radiographic progression23. The group vote for the strength of recommendation in support of including each individual criterion under the category of discrimination was generally high and ranged from 7 for Criterion 9 to 9.1 for Criterion 6 (Table 2). The only exception was the low score of 4.4 for the criterion that addressed the metabolism, clearance, and half-life of the biomarker (Criterion 8).
Feasibility (Criteria 13 and 14)
Evidence addressing the 2 criteria listed under the category of feasibility is based largely on unpublished data obtained from the manufacturers of the assays. There is no international standardization of the assays for any of the markers. According to the manufacturers, the assays are quite well characterized and methodologically simple (Criterion 13). There is limited documentation on stability of the biomarkers at room temperature and in frozen specimens (Criterion 14). Degradation after long-term storage seems to be a particular problem with RANKL39. CTX-I and CTX-II remain stable after repeated freeze-thaw cycles6,40,41, but documentation of the effect of long-term storage on these markers was not found. There is very limited documentation for this criterion as regards MMP-3 and OPG. The group vote for the strength of recommendation in support of including Criteria 13 and 14 was 7.2 and 7.1, respectively.
Second voting exercise
After consideration of the entire literature search addressing all 14 criteria (Table 3), the group rated the SOE in support of the biomarkers as reflecting structural damage outcomes in RA highest for U-CTX-II [6.5 (NRS 0–10)]. Key omissions identified in this second voting exercise were the desirability of demonstrating associations between changes in the biomarker and radiographic progression for all drug classes and in individual patients.
DISCUSSION
Our literature search and the subsequent voting exercises show that the evidence in support of any of the 5 tested biomarkers as reflecting structural damage endpoints in RA is insufficient to justify their substitution for radiographic endpoints, with the highest score being only 6.5 for U-CTX-II. Moreover, some criteria, particularly those categorized under truth (Criteria 1 and 5), were regarded as being of lesser importance for inclusion in the draft set. As noted in a companion report of the soluble biomarker workshop at OMERACT 9, in retrospect it was recognized that in the setting of validation of predictive biomarkers, the OMERACT filter criteria of truth and discrimination largely overlap42. The importance of demonstrating associations between biomarkers and structural damage in several drug classes as well as in individual patients was also newly raised.
As with the example of CRP3, the criteria itemized under the category of truth were poorly supported by the literature for all the tested biomarkers. The relation of the biomarkers to joint remodeling is well described for all the markers, but there are very few animal studies or studies comparing these biomarkers to other surrogates of structural damage. Voting exercise 2, however, showed that the group questioned both the relevance of animal studies in this context and the importance of studies proving that the marker is associated with a surrogate endpoint, even if it has been previously shown that the surrogate is associated with the damage endpoint. Even if animal models are easy to replicate and allow for evaluation of the biomarker’s performance throughout the course of the disease, including the influence of pharmacologic manipulations, in a spectrum of models with correlations to “gold standards” such as histopathology and imaging, animal models do not necessarily reflect the pathophysiological process evident in humans. Moreover, negative animal data do not exclude an association in humans and, therefore, might not be considered an essential requirement. All the biomarkers have been immunohistochemically localized to joint tissue, but the presence of the marker in the joint does not prove its relevance to the destructive process. Only CTX-II is specific for joint tissue (hyaline cartilage), as the other markers are also involved in other physiological and pathological processes in the body. It is conversely questionable if a marker should be excluded if it is not totally specific for joint tissue, as in the example of the CRP.
Under the subheading of discrimination, a major concern is the variability in the biomarker level due to sex, age, menopausal status, and time of day, not to mention the possible variation due to meals, which has been poorly studied. The studies focusing on this criterion were many, but insufficient for all the markers and sometimes flawed by inadequate study design (e.g., few patients, no adjustments for confounders, cross-sectional design). There were only a few studies comparing biomarker levels in RA patients and controls, and in most studies the controls were not matched for age and gender. The metabolism and half-life were not described for any of the biomarkers (Criterion 8). The group, however, did not consider this criterion as particularly desirable in the validation process. Criteria requiring demonstration of an independent association between biomarker and radiographic damage in clinical studies were rated highest by the group for retention in the draft set. Although the strongest independent association with damage was observed for U-CTX-II, associations were only moderate. Frequently noted limitations in study design were: low sample size, analysis of biomarker limited to baseline samples, failure to address known confounders in multivariate analyses, and use of different radiographic endpoints.
Some of the lowest scores for SOE were found under the category of feasibility. In clinical trials, analyses are typically done simultaneously on samples frozen for different lengths of time. This might lead to alterations in measured serum protein concentrations, as indicated for RANKL41. Information on stability of the biomarker in frozen specimens and the effect of repeated freeze-thaw cycles is not readily available from assay manufacturers and is rarely indicated in the inserts that come with the assay kits. Neither international standardization nor reference values are available for commercially available assays. This makes comparisons of levels across studies and analyses at different laboratories difficult.
This study has some limitations. The literature search was performed by a single reviewer, although the search strategy was the same as that used in the previous testing of the criteria with CRP3, with some additions. The heterogeneous and limited study selection identified by the literature search allowed only descriptive data synthesis. Study quality was not formally assessed by the reviewer, although group members were provided with a study description that included the principal features of the study design. A survey was sent to the kit manufacturers to obtain unpublished data, but we cannot exclude publication bias.
This literature search and the subsequent voting exercises have guided the further discussions and development of the validation criteria set within the working group at OMERACT 9. In conclusion, more documentation is needed before any of the 5 tested biomarkers can be regarded as reflecting structural damage outcomes in clinical trials. In particular, there is a paucity of data under the categories of truth and feasibility. This testing exercise of the OMERACT 8 preliminary validation criteria for soluble biomarkers using RANKL, OPG, CTX-I, CTX-II, and MMP-3 revealed that some of the criteria might not be essential in the validation process, and some omissions to the set were highlighted.
Appendix