Abstract
Objective. Previously published literature assessing the reporting of outcome measures used in joint replacement randomized controlled trials (RCT) has revealed disappointing results. It remains unknown whether international initiatives have led to any improvement in the quality of reporting and/or a reduction in the heterogeneity of outcome measures used. Our objective was to systematically assess and compare primary outcome measures and the risk of bias in joint replacement RCT published in 2008 and 2013.
Methods. We searched MEDLINE, EMBASE, and CENTRAL for RCT investigating adult patients undergoing joint replacement surgery. Two authors independently identified eligible trials, extracted data, and assessed risk of bias using the Cochrane tool.
Results. Seventy RCT (30 in 2008, 40 in 2013) met the eligibility criteria. There was no significant difference in the number of trials judged to be at low overall risk of bias (n = 6, 20%) in 2008 compared with 2013 [6 (15%); chi-square = 0.302, p = 0.75]. Significantly more trials published in 2008 did not specify a primary outcome measure (n = 25, 83%) compared with 18 trials (45%) in 2013 (chi-square = 10.6316, p = 0.001). When specified, there was significant heterogeneity in the measures used to assess primary outcomes.
Conclusion. While less than a quarter of trials published in both 2008 and 2013 were judged to be at low overall risk of bias, significantly more trials published in 2013 specified a primary outcome. Although this might represent a temporal trend toward improvement, the overall frequency of primary outcome reporting and the wide heterogeneity in primary outcomes reported remain suboptimal.
With an expanding and aging population, an escalating prevalence of obesity, and a rising need for both primary and revision joint surgery, the incidence and associated economic burden of joint replacement surgery have been projected to increase exponentially1. For many patients, joint replacement surgery is an effective management option to reduce pain, restore function, and improve quality of life. However, individuals who undergo joint replacement are also at risk for a variety of adverse events associated with both the anesthetic and the surgery. With expanding indications for joint replacement and the continuing evolution of surgical techniques and implants, many important research questions need to be answered. To address these issues there is an ongoing need for high-quality trials within this field of orthopedics.
Randomized controlled trials (RCT) are widely acknowledged to be the best type of trial design to evaluate the effectiveness and safety of healthcare interventions2,3,5. However, the RCT’s ability to answer important clinical questions will always be limited by its design and the outcome measures used. To draw meaningful conclusions from individual RCT, relevant, robust, and validated outcome measures are required. In addition, these outcomes should be prespecified and clearly reported as either primary or secondary. This enables readers to assess whether the RCT is adequately powered and avoids the perception of selective reporting bias.
In the field of joint replacement surgery, previously published literature assessing the reporting frequency, relevance, and homogeneity of outcome measures used has revealed disappointing results6. Specifically, primary outcomes were often not specified, and when they were, there was significant heterogeneity in the types of outcome measures used to assess the same endpoint. To address this on a large scale, several multinational collaborations and initiatives have been established. For example, following the poor findings in their systematic review, Riddle, et al proposed that consensus from an international group of experts involved in the care of these patients was needed7. In 2008, a working group within the Outcome Measures in Rheumatology (OMERACT) and the Osteoarthritis Research Society International was established with the aim of improving the reporting of relevant, evidence-based health outcome domains within joint replacement trials8. In addition, in 2008 the Enhancing the Quality and Transparency of Health Research Network was launched, and in 2010 the Consolidated Standards of Reporting Trials (CONSORT) Statement (first published in 1996) was updated to provide researchers with a checklist of 25 items to ensure accurate, complete, and transparent reporting of trial findings9.
It remains unknown whether these international initiatives have led to any improvement in the quality of joint replacement trial reporting and/or a reduction in the heterogeneity of outcome measures used. To investigate this question and inform the OMERACT 2014 Working Group meeting (which aimed to define an internationally agreed-upon core set of domains and outcome measures that should be reported in every joint replacement clinical trial)10, we performed a systematic review of outcomes that had been reported in joint replacement trials published in 2008 and 2013. This paper reports the risk of bias of included trials and assesses and compares their primary outcomes. A separate paper will report the extent to which all reported outcomes met the OMERACT criteria of truth, discrimination, and feasibility, and map the reported outcomes to the OMERACT Filter 2.011.
MATERIALS AND METHODS
Search strategy and criteria
This systematic review was performed in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) format12 and the protocol was prospectively registered with the International Prospective Register of Systematic Reviews (PROSPERO; registration number CRD42014009216). This study did not require ethics committee approval because it is a systematic review of published articles in the medical literature and did not involve any humans or animals.
All randomized or quasi-randomized (where allocation is not strictly random) controlled trials investigating adult patients undergoing joint replacement surgery (defined as substitution of any joint surface with a prosthesis) were identified. Trials were included if the comparator was another type of joint implant, surgical placebo or sham, usual care, physical therapy, or other active treatment and at least 1 outcome had been reported. Studies were excluded if they evaluated spinal joint replacement surgery, had a primary intervention of interest that was not the insertion of a joint prosthesis (e.g., trials investigating preoperative education, perioperative analgesia, or postoperative care), or were not published as a full report in English.
An electronic literature search for articles published in 2008 and 2013 was performed in MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials (CENTRAL) using a comprehensive search strategy (Table 1). We also performed a hand search of reference lists of relevant articles to identify additional relevant trials.
Two authors (BLR and PDW) independently screened the titles and abstracts of all studies identified by the search strategy and then independently reviewed the full text of all potentially eligible studies to find studies that fulfilled the inclusion criteria. Any disagreement in study selection was resolved by consensus or by discussion with a third reviewer (RB).
The same 2 authors independently extracted data from the included studies using predetermined forms. Differences in data extraction were resolved by referring back to the original articles and establishing a consensus. A third reviewer (RB) was consulted to help resolve differences as necessary. The information extracted included study site, funding, enrollment date, size, design, population, interventions, and outcome measures (and whether they were prespecified). Each outcome was recorded as either primary or secondary. An outcome was recorded as primary if it was reported as a “primary outcome” in the manuscript or registered protocol or was used to calculate the sample size. More than 1 primary outcome could be recorded provided that these criteria were met.
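The rule for recording an outcome as primary can be expressed as a short predicate. The following Python sketch is illustrative only: the `Outcome` class and its field names are hypothetical and are not taken from the extraction forms used in this review.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One extracted outcome; all field names are hypothetical."""
    name: str
    labelled_primary_in_manuscript: bool = False
    labelled_primary_in_protocol: bool = False
    used_for_sample_size: bool = False


def classify(outcome: Outcome) -> str:
    """Return 'primary' if any of the three criteria holds, else 'secondary'."""
    is_primary = (
        outcome.labelled_primary_in_manuscript
        or outcome.labelled_primary_in_protocol
        or outcome.used_for_sample_size
    )
    return "primary" if is_primary else "secondary"


def primary_outcomes(outcomes: list[Outcome]) -> list[str]:
    """Names of all primary outcomes; a trial may have none, one, or several."""
    return [o.name for o in outcomes if classify(o) == "primary"]
```

Under this sketch, a trial whose extracted outcomes all return "secondary" would be counted as not specifying a primary outcome.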
Two authors (BLR and PDW) independently assessed risk of bias for all included studies using methods recommended by the Cochrane Collaboration13, which assess the following key criteria: random sequence generation, allocation concealment, blinding of participants, care provider and outcome assessor for each outcome measure, incomplete outcome data, selective outcome reporting, and other sources of bias. Other sources of bias that were considered included whether cointerventions and adherence to treatment (e.g., for analgesics and physical therapy programs) were assessed and reported to be equal between groups, and sources of funding. Each criterion was rated as low risk of bias, high risk of bias, or unclear risk (indicating either lack of information or uncertainty over the potential for bias). Information to inform the risk of bias rating was sourced from searching trial registries and the published papers.
An assessment of overall risk of bias was made in which low overall risk of bias indicated plausible bias unlikely to seriously alter the results (low risk of bias for all key domains), unclear overall risk of bias indicated plausible bias that raises some doubt about the results (unclear risk of bias for 1 or more key domains), and high overall risk of bias indicated plausible bias that seriously weakens confidence in the results (high risk of bias for 1 or more key domains)13. A third reviewer (RB) was consulted to resolve differences as necessary.
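The overall judgment described above can be sketched as a simple decision rule over the per-domain ratings. This is a minimal illustration, not the tool itself: the function name and the domain labels in the example are hypothetical, and the sketch assumes that a high rating in any key domain takes precedence over unclear ratings in other domains.

```python
from typing import Mapping

LOW, UNCLEAR, HIGH = "low", "unclear", "high"


def overall_risk_of_bias(domains: Mapping[str, str]) -> str:
    """Summarize per-domain ratings into one overall rating.

    Assumption: 'high' in any key domain outranks 'unclear' elsewhere.
    """
    ratings = set(domains.values())
    if not ratings:
        raise ValueError("at least one domain rating is required")
    if not ratings <= {LOW, UNCLEAR, HIGH}:
        raise ValueError(f"unexpected rating(s): {ratings - {LOW, UNCLEAR, HIGH}}")
    if HIGH in ratings:
        return HIGH      # high risk of bias for 1 or more key domains
    if UNCLEAR in ratings:
        return UNCLEAR   # unclear risk of bias for 1 or more key domains
    return LOW           # low risk of bias for all key domains


# Hypothetical trial: every domain is low except allocation concealment.
example = {
    "random sequence generation": LOW,
    "allocation concealment": UNCLEAR,
    "incomplete outcome data": LOW,
}
print(overall_risk_of_bias(example))  # unclear
```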
Data analysis
A descriptive analysis of the primary outcomes was performed that compared the frequency and description of primary outcome measures by site of joint replacement and year. Comparisons of 2008 and 2013 results for overall risk of bias and for the number of trials reporting primary outcomes were made using Pearson chi-square tests. We also used the Pearson chi-square test to determine whether an association might exist between overall risk of bias and reporting of primary outcomes.
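As an illustration of this kind of comparison, the sketch below computes a Pearson chi-square statistic for a 2 × 2 table using only the Python standard library. It assumes no continuity correction and 1 degree of freedom; the function name is hypothetical and the statistical software actually used is not stated in the text. The example counts are the numbers of trials that did and did not specify a primary outcome in each year, as given in the abstract.

```python
import math


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: 2008, 2013. Columns: primary outcome specified, not specified.
chi2, p = pearson_chi_square_2x2(5, 25, 22, 18)
print(f"chi-square = {chi2:.4f}, p = {p:.3f}")  # chi-square = 10.6316, p = 0.001
```

With small expected cell counts, an exact test may be preferable to the chi-square approximation.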
RESULTS
The search strategy identified 1635 studies. Of these, 70 RCT (30 with 2789 participants published in 2008 and 40 with 4253 participants published in 2013) met the eligibility criteria and were included in the review. A PRISMA flow diagram of the searches through to the final inclusion is shown in Figure 1. The summary characteristics of the included trials according to their year of publication are shown in Appendix 1.
Study characteristics of trials published in 2008
Eighteen (60%) of the 30 trials published in 2008 were conducted in Europe14–23,24,25,26,39,40,41,42,43, 6 (20%) in the United States27,28,29,30,31,32, 2 (7%) each in Asia33,34 and Canada35,36, and 1 (3%) each in Australia37 and New Zealand38. Twenty trials (67%) were independently funded14,15,16,17,19,20,22,23,24,26,27,28,30,31,32,34,35,37,38,39, 9 (30%) had industry funding17,18,21,29,31,36,40,41,42, and for 1 trial, the source of funding was unclear25. The year of first recruitment ranged from 1994 to 2007 (median 2001). The most common joint evaluated was the knee (n = 19, 63%)17,18,19,20,21,22,23,24,26,29,30,33,34,35,36,37,39,42,43, followed by the hip (n = 10, 33%)14,15,16,27,28,31,32,38,40,41, and 1 trial (3%) evaluated the shoulder25. There were no trials evaluating joint replacement surgery of the elbow, wrist, hand, ankle, or foot. Fifteen (50%) trials evaluated 2 or more different prostheses or components14,16,17,18,19,25,26,30,31,38–43, 13 (43%) evaluated the same prosthesis but used differing surgical techniques20,21,22,23,24,27,28,29,33,34,35,36,37, and 2 (7%) evaluated joint replacement versus other joint surgeries15,32.
Study characteristics of trials published in 2013
Twenty (50%) of the 40 trials published in 2013 were conducted in Europe44–53,54–63, 9 (23%) in Asia64,65,66,67,68,69,70,71,71a, 5 (13%) in the United States72,73,74,75,76, 4 (10%) in Canada77,78,79,80, and 1 (3%) each in Australia81 and the Middle East82. Twelve trials (30%) were industry funded45,52,54,55,56,59,62,63,70,73,77,80, 15 (38%) had independent funding44,46,48,50,51,53,57,60,64,65,69,72,78,81,82, and in 13 trials (33%) the source of funding was not specified47,49,58,61,66–68,70,71,74–76,79. The year of first recruitment ranged from 1996 to 2011 (median 2007). The knee (n = 20, 50%)45,48,49,54,55,57,61,64,65,66,68,69,70,71,71a,73,74,75,76,81 and hip (n = 17, 43%)44,46,47,51,52,53,56,58–60,62,63,67,72,77,79,80 were again the most commonly studied joints, with 2 trials (5%) studying the shoulder78,82 and 1 (3%) studying the wrist50. There were no trials evaluating joint replacement surgery of the elbow, hand, ankle, or foot. Eighteen trials (45%) evaluated 2 or more different prostheses or components44,45,47,48,49,50,52,54,56,57,61,62,63,64,67,73,79,80, 20 (50%) evaluated the same prosthesis but used differing surgical techniques53,55,58,59,60,65,66,68,69,70,71,71a,72,74,75,76,77,78,81,82, and 2 (5%) evaluated joint replacement versus other joint surgeries46,51.
Risk of bias
Of the 30 trials published in 2008, 6 (20%) were judged to be at low overall risk of bias, and the remainder were all judged to be at high or unclear overall risk of bias (Figure 2). Four (21%) of the 19 knee trials36,39,42,43 and 2 (20%) of the 10 hip trials40,41 were judged to be at low risk of bias. In the 1 shoulder joint trial25, the risk of bias was deemed unclear.
Of the 40 trials published in 2013, only 6 (15%) were judged to be at low overall risk of bias (Figure 3). Three (15%) of the 20 knee trials were judged to be at low overall risk of bias45,55,65, 11 (55%) were deemed unclear48,57,64,66,68,69,70,71a,73,76,81, and 6 (30%) were judged to be at high risk of bias49,54,61,71,74,75. Three (18%) of the 17 hip trials were judged to be at low overall risk of bias56,58,77, 6 (35%) were deemed unclear46,47,59,60,62,80, and 8 (47%) were judged to be at high risk of bias44,51,52,53,63,67,72,79. One shoulder trial was deemed unclear82 and 1 was judged to be at high overall risk of bias78, while the single wrist trial was also judged to be at high overall risk of bias50.
Figure 4 compares the number of trials with low risk of bias for each risk of bias domain and overall low risk of bias according to publication year. There was no difference between years in the number of trials judged to be at overall low risk of bias (6/30 in 2008 compared with 6/40 in 2013, chi-square = 0.302, p = 0.75). The method used to generate the random sequence was adequately reported in 63% of the 2008 trials and 53% of the 2013 trials; however, details of allocation concealment were reported in only 10 trials (33%) in 2008 and 12 (30%) in 2013. More trials reported blinding of patients (n = 13, 33%) and outcome assessors (n = 14, 35%) in 2013 in comparison with 2008 [n = 7 (23%) and n = 4 (13%), respectively]. Twenty-six trials (87%) in 2008 reported detailed baseline characteristics; however, this applied to only 22 (55%) of the 2013 trials. Few trials prespecified or reported the use of relevant cointerventions [n = 13 (43%) in 2008, n = 9 (22%) in 2013], or described how incomplete data were addressed [n = 4 (13%) in 2008, n = 14 (35%) in 2013]. Several trials also had evidence of selective outcome reporting [n = 11 (37%) in 2008, n = 9 (22%) in 2013]. These shortcomings may have biased the results reported by the trials.
Primary outcomes
A summary of the primary outcomes reported in the 2008 and 2013 trials is shown in Table 2. Compared with trials published in 2008, more trials published in 2013 reported a primary outcome [n = 22/40 (55%) compared with n = 5/30 (17%), chi-square = 10.6316, p = 0.001]. Four out of 19 knee trials (21%) specified a primary outcome in 2008 compared with 11/20 knee trials (55%) in 2013. In these 15 knee trials, 17 different primary outcomes were specified despite evaluating similar questions. Within each time period, only 2 trials reported the same primary outcome (knee range of motion was specified in 2 trials in 2008 and difference in mechanical axis deviation measured in degrees in the coronal plane on radiographs in 2 trials in 2013). No primary outcome was used in both time periods. The majority of primary outcomes reported evaluated technical aspects of the procedures (n = 12/17, 71%) rather than patient-centered outcomes.
Only 1 out of 10 (10%) hip trials reported a primary outcome in 2008 compared with 8/17 (47%) hip trials in 2013. Similar to the knee trials, the primary outcomes varied widely and were focused on technical outcomes of the procedure. Primary outcomes used in more than 1 hip trial included component migration [radiostereometric analysis (RSA), n = 3], deviation ≥ 5° of planned stem shaft angle (radiographs; n = 2), computerized gait assessment (mean gait velocity, stride length; n = 2), and revision rates (n = 2). Assessment of component migration was the only primary outcome measure used in both time periods.
Two of the 3 shoulder trials reported a primary outcome. Both were published in 2013, and each used a different measure to assess “improvement” (postoperative pain on a visual analog scale 0–10 mm and healing rate of the subscapularis tendon visualized on magnetic resonance imaging). The single wrist trial reported a primary outcome and used RSA to measure component migration in mm.
DISCUSSION
We observed a significant difference in the frequency of reporting of primary outcomes in joint replacement trials in 2013 compared with 2008. Only 17% (21% knee and 10% hip) of RCT published in 2008 reported a primary outcome measure in comparison with 55% (55% knee and 47% hip) in 2013. Without knowledge of the frequency of reporting before 2008 and between 2008 and 2013, it is not possible to know with certainty whether the improved reporting in 2013 reflects a real improvement over time. Nevertheless, almost half of all joint replacement trials continue to fail to specify a primary outcome despite widely accepted CONSORT recommendations9,83. Similar inadequate reporting of primary outcomes has been shown in other surgical fields including ophthalmic surgery84, solid organ transplantation85, plastic surgery86, urology87, trauma surgery88, and neurosurgery89.
In addition, we found that among trials that did specify 1 or more primary outcomes, these varied widely despite the trials addressing similar research questions. In both years (2008 and 2013), no primary outcome measure was used in more than 2 trials despite similar research questions. This heterogeneity in primary outcome reporting is consistent with results from a previous systematic review6. Heterogeneity in outcome measurement hampers our ability to combine, contrast, and accurately interpret the results from multiple RCT answering the same (and sometimes similar) research questions. To improve the quality of information available for patients undergoing joint replacement surgery, RCT evaluating the same clinical questions need to use a homogeneous set of outcome measures. Further efforts are required to achieve this10.
Further, the majority of primary outcomes reported focused on technical aspects of the surgery. Hence, despite the significant investment of time, money, and resources in evaluating these important research questions, we found that the majority of trials in our systematic review were not designed or powered to evaluate other important core domains of health for both the patient and society.
Few trials published in either 2008 or 2013 were judged to be at low overall risk of bias (20% in 2008 and 15% in 2013). Not surprisingly, the trials with less potential for bias were more likely to report a primary outcome measure. In addition, more than a quarter of the trials we included (20/70, 29%) were judged to be at unclear risk of bias because they reported insufficient information. While it is often not possible in surgical RCT to blind the investigators to the group assignments or standardize surgical techniques, it should be possible to minimize other potential sources of bias. Common areas of potential bias occurring in more than 50% of the studies included failing to describe allocation concealment, participant blinding, how incomplete data were addressed, and selective reporting of outcomes. Lack of adequate reporting of details of randomization, allocation concealment, blinding, cointervention use, and outcomes is not limited to joint replacement trials or orthopedic surgery84,87,88,89,90,91,92,93.
Our study had several limitations. First, while we used a comprehensive systematic search strategy to identify all relevant studies, we excluded non-English-language publications. Given the high proportion of papers published in English-language journals (80%–90%), this is unlikely to affect generalizability94. Second, the majority of joint replacement trials in our review involved hip and knee surgery. There were few trials evaluating the shoulder and wrist, and no trials evaluating elbow, hand, ankle, or foot joint replacement. Therefore, our results may not be generalizable to joint replacement trials of other joints. Third, in selecting 2 publication years, there is a possibility that this literature may not have been truly representative of periods just before, between, and after these dates. The 5-year gap between study years may not have been long enough to identify meaningful change; however, we chose the start year as the year of publication of the study that showed poor quality of arthroplasty trials6. More studies may be needed in the future with a longer interval to look for improvements using the same quality criteria. Our results are, however, consistent with previous reviews. Finally, we judged risk of bias and specification of primary outcomes on the basis of the published paper. It may be that we overestimated potential for bias and underestimated frequency of primary outcome specification because of poor reporting practices rather than suboptimal trial methodology. However, we tried to limit this effect by also searching the trial registries for protocols.
Despite an observed increase in the frequency of reporting of primary outcome measures in joint replacement trials in 2013 compared with 2008, almost 50% of trials published in 2013 did not report their primary outcomes. Among trials that did report primary outcomes, these were heterogeneous, frequently measured technical aspects of surgery rather than patient-important endpoints, and few trials used the same primary outcome even for similar research questions. In addition, the majority of trials published in both years were at high or unclear overall risk of bias, which reflects a lack of implementation of quality improvement initiatives such as the CONSORT guidelines (or similar). Further efforts are needed to improve the quality of joint replacement trials and ensure primary outcomes are reported. A standardized, universally accepted core set of outcomes to be used in all joint replacement trials based upon their clinical relevance would enhance this field.
Acknowledgment
Co-author Dr. Andrew P. Sprowson died tragically on March 13, 2015. The authors remember him as an orthopedic surgeon with immense enthusiasm for research and for robust clinical evidence in the field of joint replacement surgery. He was an integral part of the Working Group for Joint Arthroplasty within the Outcome Measures in Rheumatology, and a great friend.
- Accepted for publication April 12, 2017.