Introduction

Randomised controlled trials (RCTs) represent the criterion standard in evaluating healthcare interventions. However, RCTs can yield biased results if they lack methodological rigour [1], especially where surgical techniques are involved [2]. Readers need complete, clear and transparent information in order to assess a trial accurately. Unfortunately, many trials fail to provide critical information in published reports [3–5]. Inadequately reported RCTs are associated with bias in estimating the effectiveness of interventions and with poor methodology [6–12].

The Consolidated Standards of Reporting Trials (CONSORT) statement was originally developed in 1996 [13] to aid reporting of RCTs, and an extension for non-pharmacological interventions (CONSORT NPT) was published in 2008 [14]. The CONSORT NPT consists of a 23-item checklist and flow diagram.

Our team have previously assessed the methodological quality of recent RCTs in plastic surgery, concluding that it requires improvement [15]. However, although flaws in methodology limit the validity and generalizability of study results, without accurate reporting, shortcomings in study design and implementation can be compounded and become difficult to assess accurately. Studies of methodological quality are usually conducted separately from studies of reporting quality [16]. Given this previous work, the purpose of this new study is to systematically review the reporting quality of recent RCTs in plastic surgery using CONSORT NPT criteria. This new work will allow the plastic surgical community to take stock of where reporting quality has been in recent years.

Material and methods

Search methods

The search technique was the same as that used for this team’s previous work assessing the methodological quality of plastic surgery RCTs [15]. An information specialist (trained in database searching by NHS Evidence, who conducts approximately 100 searches per year and is the staff trainer) based at the lead author’s Plastic Surgery Unit searched MEDLINE® and the Cochrane Database of Systematic Reviews from 1 January 2009 to 30 June 2011 for the Medical Subject Headings (MeSH, the NLM controlled vocabulary thesaurus used for indexing articles for PubMed) ‘Surgery, Plastic’ or ‘Reconstructive Surgical Procedures’, with ‘or’ used as a Boolean operator and with the ‘explode’ function activated. We chose this time period since the CONSORT NPT statement was only published in 2008 and hence it would be unfair to hold RCTs prior to this period to a standard that did not exist at the time of writing. Limits were set to English-language publications, human studies and randomised controlled trials.

Results were then manually searched by four of us (ED, MS, CFC and EE) for relevant RCTs involving surgical techniques. Papers involving purely pharmacological therapies in all arms, cost analyses, study protocols, interim or non-randomised studies, short communications and RCTs involving virtual or simulated procedures were excluded.

Scoring

The papers were then scored by one of three primary scorers (ED, MS, EE) against the 23-item CONSORT NPT checklist with each item being given an equal weighting. Items 4 and 11 were subdivided into four and three sub-parts, respectively; a full point was only gained if all sub-parts were fulfilled; otherwise, the appropriate fraction was awarded. The resulting mark out of 23 was termed the ‘CONSORT score’. Following this initial round of scoring (ED scored 2009, MS scored 2010 and 2011), all papers were then re-scored by a single secondary scorer (CFC). Discrepancies were then resolved by consensus (between ED, MS and CFC), and if that could not be reached, they were referred to the lead author (RA) for a final judgement. Evaluators were not blinded to the country of origin of the authors.
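The scoring rule described above can be illustrated with a minimal sketch. It assumes that the "appropriate fraction" for a subdivided item is the number of sub-parts fulfilled divided by the number of sub-parts; the function name, the input layout and the example paper are hypothetical and were not part of the study.

```python
from fractions import Fraction

# Sub-part counts for the subdivided checklist items (from the text:
# item 4 has four sub-parts, item 11 has three). All other items have one.
SUBPARTS = {4: 4, 11: 3}
N_ITEMS = 23


def consort_score(fulfilled):
    """Return the CONSORT score (0 to 23) for one paper.

    `fulfilled` maps item number -> number of sub-parts fulfilled
    (0 or 1 for ordinary items). Items missing from the mapping score 0.
    """
    total = Fraction(0)
    for item in range(1, N_ITEMS + 1):
        parts = SUBPARTS.get(item, 1)
        met = fulfilled.get(item, 0)
        if not 0 <= met <= parts:
            raise ValueError(f"item {item}: {met} of {parts} sub-parts")
        # Equal weighting: each item contributes at most one point.
        total += Fraction(met, parts)
    return float(total)


# Hypothetical paper: items 1-10 fulfilled, but only 2 of the 4 sub-parts
# of item 4, plus 1 of the 3 sub-parts of item 11.
example = {i: 1 for i in range(1, 11)}
example[4] = 2
example[11] = 1
score = consort_score(example)
```

Exact fractions are used internally so that thirds and quarters accumulate without rounding error before the final conversion to a decimal score.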

Compliance with individual items of the statement was analysed (by summing the number of articles fulfilling that item and dividing by the total number of included articles), as was the relationship between CONSORT score and year of publication, geographical origin of the RCT, the number of authors and the ISI 2010 impact factor of the journals in which the RCTs were published. These additional correlates were chosen to give information about whether improvements are occurring over time and whether volume and perceived markers of RCT quality (like impact factor of the journal in which it is published) correlate with actual reporting quality as defined by CONSORT.
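The per-item compliance calculation can be sketched as follows. This is an illustration only: it assumes an item counts as fulfilled only when fully met (the text does not state how subdivided items were handled here), and the function names and the four example papers are hypothetical.

```python
def item_compliance(papers, n_items=23):
    """Return {item number: % of papers fulfilling that item}.

    `papers` is a list of sets; each set holds the checklist item numbers
    that one paper fulfilled (assumed here to mean fully fulfilled).
    """
    if not papers:
        raise ValueError("at least one paper is required")
    total = len(papers)
    return {
        item: 100.0 * sum(item in paper for paper in papers) / total
        for item in range(1, n_items + 1)
    }


def ranked_by_fulfilment(compliance):
    """List (item, %) pairs in order of increasing fulfilment."""
    return sorted(compliance.items(), key=lambda pair: (pair[1], pair[0]))


# Hypothetical data: four papers and the items each one fulfilled.
papers = [{1, 2, 3}, {1, 2}, {1, 3}, {1}]
compliance = item_compliance(papers)
ranking = ranked_by_fulfilment(compliance)
```

Sorting by percentage and then by item number gives a stable ordering when several items share the same compliance.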

Secondary analyses

In addition to the CONSORT score, papers were assessed for whether they fulfilled seven additional criteria referred to in the CONSORT NPT and whether they mentioned conflicts of interest, sources of funding, a trial registry number and ethical approval.

Statistical analysis

In line with previous work [15], inter-rater agreement was assessed using the Kappa score. Data were analysed using non-parametric descriptors such as median, and correlations were calculated using Spearman rho, using SPSS version 20.
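For readers unfamiliar with these statistics, the sketch below shows one way they can be computed. It is not the SPSS procedure used in the study. It assumes an unweighted Cohen's kappa on categorical judgements from two raters (the text does not specify the variant) and a Spearman rho computed as the Pearson correlation of average ranks. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical judgements."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters need equal, non-empty lists of judgements")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when a series is constant")
    return cov / (sd_x * sd_y)
```

Both functions raise an error rather than return a value in the degenerate cases where the statistic is undefined.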

Results

The search history was as follows:

  1. MEDLINE; exp SURGERY, PLASTIC/; 25,698 results
  2. MEDLINE; exp RECONSTRUCTIVE SURGICAL PROCEDURES/; 52,999 results
  3. MEDLINE; 1 OR 2; 76,741 results
  4. MEDLINE; 3 [Limit to: Publication Year 2009–2011 and English Language]; 11,395 results
  5. MEDLINE; 4 [Limit to: (Publication Types Randomized Controlled Trial) and Publication Year 2009–2011 and English Language]; 254 results

From the initial set of 254 papers retrieved from MEDLINE, 63 were selected following a manual search and abstract assessment by the authors (ED, MS, CFC). Subsequent to complete download of all 63 papers, six were excluded for being a study protocol, purely pharmacological or theoretical, retrospective or an interim study. This resulted in 57 RCTs which met the inclusion criteria (seven were multicentre), published across 28 journals. All RCTs compared treatment interventions and none related to diagnosis. No further relevant trials were found in the Cochrane Database of Systematic Reviews (Fig. 1).

Fig. 1 PRISMA flow diagram, illustrating how papers were selected (adapted from Moher et al. [17])

The median CONSORT score was 11.5 out of 23 items (range 4.5–21.0) with a Kappa score of 0.80. There was a slight trend for improvement over the 3-year period, on average 1.5 CONSORT points per year (Spearman rho 0.999) (see Table 1).

Table 1 Summary statistics for CONSORT scores per year

Compliance with individual items of the CONSORT

Compliance was highly variable across the different CONSORT items (Fig. 2). It was poorest for items related to intervention/comparator details (7 %), randomisation implementation (11 %) and blinding (26 %), as shown in Table 2.

Fig. 2 Compliance of the 57 RCTs with the individual items of the CONSORT statement

Table 2 Compliance of RCTs with individual items of the CONSORT statement (ranked in order of increasing fulfilment)

Compliance with additional criteria

Fulfilment of the additional criteria was poor: only 61 % declared conflicts of interest, 75 % reported permission from an ethics review committee, 47 % declared sources of funding and just 16 % stated a trial registry number (Table 3).

Table 3 Compliance of RCTs with additional criteria

CONSORT score and number of authors

There was no correlation between number of authors and CONSORT score (Spearman rho = 0.12, see Fig. 3).

Fig. 3 CONSORT score against number of authors

CONSORT score and impact factor

There was no correlation between journal impact factor and CONSORT score (Spearman rho = 0.26, see Fig. 4).

Fig. 4 CONSORT score against ISI 2010 impact factor

Geographical distribution of RCTs and CONSORT compliance

There was no correlation between the volume of RCTs conducted in a particular country and CONSORT score (Fig. 5).

Fig. 5 A bar chart of CONSORT score and number of RCTs against country

Discussion

Over the last decade, there have been increasing calls for utilising an evidence-based medicine approach within surgery [18, 19]. However, the evidence base in plastic surgery is still dominated by case series, and calls for higher levels of evidence (including RCTs) are gathering pace [20, 21]. Poor reporting ‘short circuits’ proper critical appraisal and prevents inclusion in systematic reviews and meta-analyses, and the resulting clinical judgements could be misleading and potentially dangerous.

Previous reviews of surgical RCTs have shown that, in 20 %, the conclusions were not justified by the data [22]. Solomon et al. found that the lowest quality RCTs were those that involved a surgical technique, were published in a surgical journal and where a surgeon was the principal author [23].

Research in the field of RCT reporting quality has pointed to the consistent absence of the same key quality data: sample size calculations, randomisation sequence generation method and implementation, post-randomisation verification of balance in known confounders, allocation concealment, blinding, intention-to-treat analysis and participant flow charts. Our study is the first to assess the compliance of recent RCTs in plastic surgery with the CONSORT NPT criteria, and the results support this pattern of poor compliance with a low median CONSORT score of 11.5. This is similar to the CONSORT scores (out of 22) found in earlier work by one of us (RAA), with 11.1 for urological RCTs, 10.3 for cardiac surgery, 10.9 for general surgery, 11.9 for hepatobiliary, 10.8 for orthopaedic and 12.0 for vascular surgery [19]. Furthermore, the additional CONSORT NPT criteria were severely lacking (Table 3). Whilst they are not core criteria, the authors recognise them as important for RCTs of interventions. Poor reporting has also been linked with poor methodology as shown by Taghinia et al. [24] and others [8–12].

Studies with more authors have been correlated with higher citation levels [25]. One would anticipate that more authors on a study would lead to better reporting quality; however, our research did not support this, consistent with earlier work [19]. There was also no correlation between CONSORT score and the impact factor of the journal in which the RCT was published, again consistent with earlier work by one of us (RA) [19]. The impact factor is heralded by many plastic surgeons and editors [26] as a reflection of increasing journal and article quality. Yet successive studies are now showing that it has no bearing on the quality of reporting of one of the highest levels of evidence, the RCT. Sinha et al. [27] identified the top three ranked surgical journals by impact factor and found that, of 42 RCTs analysed, only 40 % had a Jadad score ≥3 and there was no significant difference between CONSORT-endorsing and non-endorsing journals. Our data showed no link between volume of RCTs from a particular geographic region and CONSORT score. This suggests that the problems of poor reporting are indeed global and not confined to a few select countries.

Within plastic surgery specifically, several studies have found that RCTs need improvement in reporting quality. McCarthy et al. [28] analysed level 1 studies in five plastic surgery journals from 1978 to 2009. They found that only 39 % reported randomisation technique. In their analysis of 96 RCTs in plastic surgery published between 2004 and 2008, Veiga et al. [29] found that 29 % appropriately described allocation concealment. Momeni et al. [30] analysed 172 RCTs from three plastic surgical journals during 1990–2005 and found that only 12 % reported on their allocation concealment and 37 % described participant dropouts. Karri looked at 133 RCTs published across three plastic surgery journals from 1980 to 2004 [31]. Sample size calculation was only reported in 12.8 % of trials, randomisation methodology in 29.3 %, allocation concealment in 18.8 %, blinding of investigator/assessment in 51.9 % and study limitations in only 33.8 % of trials. Veiga Filho et al. in 2005 examined 34 RCTs in plastic surgery and concluded that they were overall of low quality with 59 % scoring two points or less on a Jadad score [32]. In addition, they concluded that the process of randomisation in the studies was appropriately designed and conducted but that authors did not report it. Follow-up work by the same group in 2011 concluded that RCT quality as measured by Jadad criteria had ‘significantly increased’ [29].

In 2005, the International Committee of Medical Journal Editors implemented a policy [33] that required registration of all clinical trials prior to enrolment of the first patient (part of the aim being a reduction in publication bias from the non-publication of negative studies). In 2007, this became part of US regulatory requirements with section 801 of the Food and Drug Administration Amendments Act [34]. In 2008, trial registration became an ethical requirement with the updated Declaration of Helsinki mandating it [35]. Despite this, only 16 % of RCTs in our sample stated a trial registry number. The low reporting of conflicts of interest, sources of funding and ethical approval is concerning; such information is pivotal to scientific and scholarly transparency and integrity.

On the positive side, there was an upward trend in the median CONSORT score, rising three points over the 3-year period 2009–2011. However, since all the CONSORT items are mandatory for reporting, this increase is unlikely to be due to a serious appreciation of, and compliance with, CONSORT specifically. It is more likely to reflect general improvements in reporting, or it may simply be random variation.

The limitations of our study include restriction to the English language, searching only MEDLINE® and analysing papers by publication date rather than submission date, since publication lag times for journals vary. The English language has increasingly become the lingua franca of science with an estimated 80–90 % of papers in scientific journals written in English [36]. This has been coupled with initiatives to offer translation services of key papers [37]. We feel that this strategy would still capture all the relevant papers. Whilst we did not blind our evaluators to the country of origin of the authors, we do not feel there was any inherent bias against a particular country and scores were recorded in a dispassionate manner.

Further work is needed to assess barriers to compliance with CONSORT amongst key stakeholders in the process: authors, journal reviewers and editors, funders, institutions and readers. Meticulous planning, involvement of a trial methodologist and a biostatistician, and compliance with CONSORT NPT in the eventual paper would be key for those conducting and reporting RCTs. Journal editors and peer reviewers as guardians of the scholarly literature have an important gatekeeper role here [38, 39].

We support previous calls for the better education of plastic surgeons at all levels in clinical research methods, evidence-based medicine and improved funding/support of plastic surgical RCTs. McCulloch et al. [40] have called for the deployment of alternative prospective designs, such as interrupted time series studies, to be used when randomised trials are not feasible—we support their call and indeed suggest that registry studies may be useful in some instances.

One solution is to hard-wire compliance with CONSORT by making the checklist a mandatory item for submission if a manuscript is submitted as an RCT. It can also be published as a supplementary item online allowing for greater transparency and scrutiny by readers. Such a policy was adopted by the International Journal of Surgery in January 2013 [41].

Conclusion

The reporting quality of RCTs in plastic surgery requires improvement. Perceived surrogate markers of quality such as number of authors and impact factor of the journal had no relationship with CONSORT compliance. We suggest ways this could be improved, including better education, awareness amongst all stakeholders and hard-wiring compliance through electronic journal submission systems.