SPIN THE RESULTS
Another issue concerns the standards for acceptable efficacy. RCTs frequently count events in the treated and control groups, and from these counts the absolute benefit or harm can be calculated. If 3% in the treated group and 4% in the control group experienced an endpoint event (such as a gastric bleed, stroke, heart attack, etc., depending on what was being studied), then the absolute risk reduction was 1% and the number needed to treat (NNT) over the period of the trial to prevent one event was 100; 99 were treated with no benefit, in some cases at considerable cost and perhaps harm. But the relative risk reduction, a fall of one percentage point from a baseline of 4% to 3%, was 25%, a much more impressive result!
Studies also compute odds ratios or hazard ratios, which likewise yield relative risk reduction results. An odds ratio of 0.75 represents a risk reduction of 25%. But these odds or hazard ratios can be subjected to extensive statistical manipulation to correct for confounding, and even more sophisticated analyses can be done to correct for other aspects of the trial. For these measures of trial results, focus shifts to the so-called confidence interval (CI) as a measure of statistical significance. The universally used 95% CI gives the range within which the true ratio is expected to lie, and for the result in question to be statistically significant the CI must not contain 1.00, the value expected if there is no effect. However, one sees studies that call attention to what is viewed as an important result when the odds ratio is 1.01, a 1% relative risk increase, presented as significant because the confidence interval is 1.0005 to 1.015. This is charitably called a small or modest effect size, but in fact it is probably meaningless and merely reflects the low standards of both the journal involved and the referees used. Conservative clinicians like to see very large so-called effect sizes, e.g. odds ratios of 0.2 or smaller, or 2.0 or even greater, before they show much interest. This reflects the concern that for ratios near 1.00 the probability of unrecognized confounding is high, even if the result is statistically significant. Furthermore, statistical significance does not automatically mean clinical significance, a point that seems to have been forgotten by some journal editors, referees and the media.
Trial results look best when presented in relative terms, and this is the almost universal practice. A 40% risk reduction is much more impressive than needing to treat 100 patients to produce one beneficial result. The patient is impressed by 40%, has no way of knowing what is really going on, and cannot calculate the absolute change from the relative change without more data. Relative benefits are emphasized in most guidelines. The NNT may be downplayed, never mentioned to the patient, or even unknown to the physician. But these are population studies, and the NNT represents how many in the population studied need to be treated to prevent one adverse event or produce one beneficial outcome. The question then of course arises: what is an acceptable number? There is no consensus. It is a judgement call and in fact arbitrary, especially when the harms are poorly identified, if at all. How much importance should be assigned to probabilities derived from large trial populations when the issue is whether or not to treat an individual patient?
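To make the arithmetic concrete, the short sketch below (in Python) computes the absolute risk reduction, relative risk reduction, NNT and odds ratio for the hypothetical 3% vs. 4% event rates used above; the function and the numbers are illustrative only and are not taken from any particular trial.

```python
# Illustrative calculation of absolute vs. relative effect measures,
# using the hypothetical event rates from the text: 3% treated, 4% control.

def effect_measures(p_treated: float, p_control: float) -> dict:
    """Return common trial effect measures for two event rates."""
    arr = p_control - p_treated              # absolute risk reduction
    rrr = arr / p_control                    # relative risk reduction
    nnt = 1.0 / arr                          # number needed to treat
    odds_t = p_treated / (1.0 - p_treated)   # odds of an event, treated group
    odds_c = p_control / (1.0 - p_control)   # odds of an event, control group
    return {
        "ARR": arr,                 # 0.01 -> one percentage point
        "RRR": rrr,                 # 0.25 -> the "25% risk reduction" headline
        "NNT": nnt,                 # 100 patients treated to prevent one event
        "odds ratio": odds_t / odds_c,   # about 0.74 for these rates
    }

if __name__ == "__main__":
    for name, value in effect_measures(0.03, 0.04).items():
        print(f"{name}: {value:.4g}")
```

The same two event rates thus support both the unexciting "NNT of 100" and the headline-friendly "25% relative risk reduction"; which figure is reported is a matter of presentation, not mathematics.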
Population-based studies in fact involve a wide range of patient characteristics and, in some cases, a rigged population.1 Critics claim that harmful events are downplayed, that studies are too short to reveal them or are rigged to produce low numbers of such events, and that post-approval reporting once marketing is underway is negligible; thus the risk/benefit analysis is complex or frequently impossible. Many published studies make it difficult if not impossible even to derive the NNT or the number needed to harm (NNH) from the tabulated data. RCTs involve groups of participants that may not be representative of the populations needing treatment for a disorder. After all, in some cases participants are recruited by paying physicians rather large sums per subject to get them into a study. In other cases, the subjects are recruited via tabloid-class newspaper advertising and paid to participate (the guinea pigs). Studies are also farmed out to commercial firms that take over everything, including recruiting, and frequently operate in small countries with minimal supervision and strong incentives to please the sponsoring company. For many studies, one can also question the required table in the published report comparing all the characteristics of the placebo and treated groups. Some study designs have a pre-trial run-in period in which subjects deemed unsatisfactory in terms of the desired outcome are disqualified, an obvious source of potential bias impacting the application of the results to the intended end-user population of the drug or procedure in question. Modern statistical software allows those doing the statistical analysis to effortlessly try a large number of approaches, variations and assumptions and view the end results that will appear in the final publication. Most readers of the results will be unable to judge whether the most appropriate statistical approach was used or whether there was bias in the statistical manipulation.
THE META-ANALYSIS - A PLATINUM STANDARD?
If the RCT is the gold standard strived for in EBM, then the meta-analysis (the adjusted or weighted pooling or grouping of study results) of RCTs has been called the Platinum Standard.5 Meta-analyses are held in high regard and have a profound impact on guidelines and on views of the merits of a therapy. But the meta-analysis is not a simple exercise. The results have utility only if the studies selected involve quite similar populations, i.e. are homogeneous. Prior to amalgamating the data, the reviewers must select and then assign weights to the studies according to a set of guidelines. These weights can, for some analyses, determine whether the results favour treatment or placebo, or treatment A vs. treatment B (see the sketch below). There is an unsettling and disturbing objectivity problem in this process. This was recently demonstrated when a group of raters from the same department, given 165 trials and asked to rate them according to the Cochrane Collaboration guidelines for bias assessment, showed a high level of variability.6 Not only was the inter-rater agreement poor, but the assignments significantly impacted the results of the subsequent meta-analysis. In addition, reviewers cannot arbitrarily exclude studies, and if studies with a negative outcome have been suppressed or concealed so that it is impossible to consider them, the whole analysis is invalidated. This was the case with meta-analyses on the efficacy of antidepressants.
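As a rough, hypothetical illustration of how study weights and study selection drive a pooled result, the sketch below performs simple fixed-effect inverse-variance pooling of log odds ratios for three invented trials. This is not the Cochrane procedure or any published analysis; the trial values are made up purely to show the mechanics.

```python
import math

# Hypothetical trials: (log odds ratio, standard error).
# Inverse-variance weighting gives larger, more precise trials more weight.
trials = [
    (math.log(0.70), 0.30),   # a small trial strongly favouring treatment
    (math.log(0.95), 0.10),   # a large trial showing almost no effect
    (math.log(1.05), 0.12),   # a large null/unfavourable trial
]

def pooled_odds_ratio(studies):
    """Fixed-effect inverse-variance pooled odds ratio."""
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled_log_or = sum(w * lor for (lor, _), w in zip(studies, weights)) / sum(weights)
    return math.exp(pooled_log_or)

print(f"All three trials:       OR = {pooled_odds_ratio(trials):.2f}")
print(f"Unfavourable trial out: OR = {pooled_odds_ratio(trials[:2]):.2f}")
```

Dropping the single unfavourable trial shifts the pooled odds ratio from about 0.97 to about 0.92, which is precisely why suppressed, concealed or arbitrarily excluded studies can make a therapy look more effective than it is.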
When the antidepressant meta-analyses were repeated after the FDA revealed and provided negative study results that qualified for inclusion but had been suppressed, the beneficial effect disappeared completely except for very severe depression, a result that caused quite a stir in the halls of psychiatry.7,8 The point is that meta-analyses are not simply a statistical tool for improving or achieving acceptable significance by combining studies; they represent an exercise that offers its own opportunities for bias and lack of objectivity, which undermines the credibility of their position as the platinum standard of EBM. Anyone can do the calculations by simply purchasing one of a number of commercial computer programs, but then the challenge begins. The above considerations become more critical when a large number of studies of variable size, most of which fail to be statistically significant, yield through meta-analysis a small effect size (e.g. an odds ratio of, say, 0.9 for a protective benefit). Is it clinically significant?
BMJ EXPOSES HUGE PROBLEMS IN CLINICAL STUDIES
Last year the British Medical Journal put out a call for papers concerning the extent, causes and consequences of unpublished evidence from clinical trials (not infrequently trials with negative or null results or showing too many adverse side effects). On January 3 and 4 of this year the results were published online. Lehman and Loder, in their editorial,9 review the highlights of this cluster of papers, prefacing their remarks with the comment that it may come as a shock to clinicians that the evidence from clinical trials they depend on for guidance is not necessarily relevant, reliable or properly disseminated. In fact, a large proportion of evidence from human trials is unreported, and much of what is reported is reported inadequately. One study incorporated unpublished data into existing meta-analyses of nine drugs approved by the FDA between 2001 and 2002. This reanalysis produced identical efficacy results in only 7% of studies, and the remainder were equally split between showing greater or lesser benefit.10 Lehman and Loder comment that most of the interventions currently in use and recommended in guidelines are based on trials carried out before mandatory pre-trial registration, and they describe the reported difficulties investigators have in acquiring a complete set of data, where searching for and obtaining data from unpublished trials can take several years. One paper examined the impact of the requirement that, as of 2005, prior trial registration be a condition of later publication in many journals, and of the additional US requirement that, for publicly funded studies, a summary report be published within 30 months of study completion. Ross et al11 found that, for publicly funded studies between 2005 and 2008, more than half of completed trials failed to report within the required time. Another study found that compliance with a 2007 regulation, which shortened this time to 12 months for a summary of completed studies, was a dismal 22%.12 The editorialists also comment on the interesting phenomenon that using the search term "randomized controlled trial" misses a large number of trial papers indexed by Medline (PubMed), which adds to the difficulties of searching for trials when doing systematic reviews and meta-analyses.
Their overall conclusion: "What is clear from the linked studies (this BMJ set) is that past failures to ensure proper regulation and registration of clinical trials, and a current culture of haphazard publication and incomplete data disclosure, make proper analysis of the harms and benefits of common interventions almost impossible for systematic reviewers. Our patients will have to live with the consequences of these failures for many years to come. The evidence we publish shows that the current situation is a disservice to research participants, patients, health systems, and the whole endeavour of clinical medicine." Not a good report card but consistent with a considerable body of earlier critical literature and highly relevant to the issue of the trust one can place in the evidence used in EBM.
REFERENCES