Learn to separate the wheat from the chaff and save valuable time.
Dr Grimes is a Clinical Professor in the Department of Obstetrics and Gynecology, UNC School of Medicine, Chapel Hill, North Carolina.
Dr Schulz is a Clinical Professor in the Department of Obstetrics and Gynecology, UNC School of Medicine, Chapel Hill, North Carolina.
Busy clinicians face the daunting challenge of keeping up with the medical literature. The sheer volume of published reports poses one obstacle; PubMed added about 1.2 million citations in 2014 alone. One group of investigators estimated that keeping up with the literature relevant to primary care would take 628 hours of reading per month1 (out of the 744 hours in a 31-day month). Another major obstacle is that medical school and residency often do not equip clinicians to be critical readers. Indeed, most physicians admit that they cannot read the literature intelligently.2 Here, we share some practical tips from our 4 decades of reading and contributing to the medical literature.
Of published articles on any topic, only a small minority are relevant to clinical practice.3 Some suggest triaging the literature with 3 questions: Is this article relevant to my practice? What was the aim of the study? Are the results credible?
Most published articles in obstetrics and gynecology journals are observational, not experimental.4 All observational studies have built-in bias,5 so readers should use caution in interpreting them. The 5-point checklist shown in Table 1 can be helpful. As described below, we recommend asking the first 3 questions about a study and pausing to reflect before considering the final 2 questions. An additional set of 5 questions, which we will also discuss, applies to randomized controlled trials (RCTs).
Question 1: Were the groups being compared similar at the study start?
In a cohort study (which progresses from exposure to outcome), the exposed and unexposed groups should be similar in all important respects except for having the exposure or not. In a case-control study (which looks back in time from outcome to exposure), the cases and controls should be similar except for having or not having the disease in question.
In the 1990s, observational studies consistently found estrogen therapy in menopause to be associated with an important reduction in the risk of cardiovascular disease. This was subsequently shown to be due to selection bias (“healthy-user” effect): Women who chose to use estrogen were healthier than those who did not. Only an RCT could avoid this bias.6
Question 2: Were the outcomes for the comparison groups determined in the same way?
Information bias arises when exposures or outcomes are measured differently in the groups being compared. In case-control studies that rely only on participant recall of past exposures, recall bias (a type of information bias) often is insurmountable. Cases (ie, people who are sick) are motivated to search their memories for potentially relevant exposures, while healthy people (controls) are not. Under-reporting of exposures among healthy controls is routine and spuriously inflates the apparent risk associated with the exposure. For example, when a case-control study of abortion and breast cancer was done in 2 ways (government health records vs personal interviews), statistically significant under-reporting of prior abortions was documented among the healthy controls, but not among women with breast cancer.7
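The arithmetic behind this spurious increase can be sketched in a few lines of Python. The counts are hypothetical (they are not from the Swedish study), and the assumption that controls report only 75% of their true exposures is chosen purely to show the direction of the bias:

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)


# Hypothetical study: 1,000 cases and 1,000 controls with a TRUE exposure
# prevalence of 20% in both groups, so the true odds ratio is 1.0.
cases, controls = 1000, 1000
true_exposed_cases = 200
true_exposed_controls = 200

# Assumption for illustration: cases recall every exposure, but controls
# report only 75% of theirs (differential under-reporting).
reported_exposed_cases = true_exposed_cases
reported_exposed_controls = round(true_exposed_controls * 0.75)  # 150

true_or = odds_ratio(true_exposed_cases, cases - true_exposed_cases,
                     true_exposed_controls, controls - true_exposed_controls)
reported_or = odds_ratio(reported_exposed_cases, cases - reported_exposed_cases,
                         reported_exposed_controls, controls - reported_exposed_controls)

print(f"True OR: {true_or:.2f}")          # 1.00
print(f"Reported OR: {reported_or:.2f}")  # 1.42
```

Although exposure is equally common in both groups, differential recall alone produces an odds ratio of about 1.4.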
Question 3: Has another factor caused a blurring of effects?
A confounding factor is one that is associated with both the exposure and the outcome but is not part of the causal pathway between them. Investigators set out to examine the relationship between an exposure and an outcome, but a third factor linked to both can distort the results. For example, in the 1970s the combined oral contraceptive was thought to be associated with the development of pituitary adenomas. More sophisticated studies found that the association was due to the confounding influence of irregular bleeding.8 Women with irregular bleeding were prescribed pills for cycle control, and the irregular bleeding itself stemmed from the elevated prolactin levels produced by the tumor. This phenomenon is termed “confounding by indication.”
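A stratified analysis is one standard way to unmask a confounder like this. The sketch below uses invented counts (not data from the studies cited) and the Mantel-Haenszel summary odds ratio, a common adjustment method chosen here for illustration; the text does not say which method the original investigators used. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Table2x2:
    """Case-control 2x2 table of exposure counts (hypothetical data)."""
    exposed_cases: int
    unexposed_cases: int
    exposed_controls: int
    unexposed_controls: int

    @property
    def total(self) -> int:
        return (self.exposed_cases + self.unexposed_cases
                + self.exposed_controls + self.unexposed_controls)

    def odds_ratio(self) -> float:
        # Cross-product ratio: (a * d) / (b * c)
        return (self.exposed_cases * self.unexposed_controls) / (
            self.unexposed_cases * self.exposed_controls)


def collapse(strata):
    """Pool the strata into one table, ignoring the third factor."""
    return Table2x2(
        sum(s.exposed_cases for s in strata),
        sum(s.unexposed_cases for s in strata),
        sum(s.exposed_controls for s in strata),
        sum(s.unexposed_controls for s in strata),
    )


def mantel_haenszel_or(strata):
    """Summary odds ratio adjusted for the stratifying factor."""
    numerator = sum(s.exposed_cases * s.unexposed_controls / s.total for s in strata)
    denominator = sum(s.unexposed_cases * s.exposed_controls / s.total for s in strata)
    return numerator / denominator


# Hypothetical strata: exposure = pill use, stratified by irregular bleeding.
with_bleeding = Table2x2(60, 20, 30, 10)
without_bleeding = Table2x2(10, 30, 40, 120)
strata = [with_bleeding, without_bleeding]

crude_or = collapse(strata).odds_ratio()
adjusted_or = mantel_haenszel_or(strata)
print(f"Crude OR: {crude_or:.2f}")  # 2.60
print(f"Stratum ORs: {with_bleeding.odds_ratio():.2f}, "
      f"{without_bleeding.odds_ratio():.2f}")  # 1.00, 1.00
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")  # 1.00
```

In the pooled table, pill users look more than twice as likely to be cases, yet within each bleeding stratum the odds ratio is 1.0: in this made-up example the apparent association comes entirely from the third factor.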
Pause and assess
Frequently, the results of an observational study can be explained by one or more of the biases described above: selection, information, and confounding. If so, we recommend ignoring the report and moving on. If not, then consider the role of chance and the strength of the association.
Question 4: Do the results reflect chance?
If a study reports no statistically significant difference between groups, a reader should ask whether the sample size (and thus the statistical power) was large enough to detect a difference if one existed. The opposite problem occurs when the study size is huge. As has been wryly noted, a tiny sample size can prove nothing, while a massive sample size can prove anything.
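To judge whether a "negative" study was large enough, it helps to see roughly how many participants a modest difference requires. This sketch uses the common normal-approximation formula for comparing two proportions (two-sided test, no continuity correction); the function name and the 10% vs 5% outcome rates are assumptions for illustration only.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate participants needed per group to compare two proportions.

    Normal-approximation formula for a two-sided test, without
    continuity correction.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical example: detect a drop in an outcome rate from 10% to 5%.
n_per_group = sample_size_two_proportions(0.10, 0.05)
print(n_per_group)  # 435
```

A trial with, say, 50 women per group would therefore have little chance of detecting a halving of a 10% outcome rate, and its "no difference" finding would mean little.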
Indeed, the growing use of administrative databases for epidemiologic research has created pseudo-epidemics and spawned product liability litigation.9 When the number of participants is large, even trivial differences become statistically significant with tiny P values.10 Weak associations are often due to bias, even if the difference is statistically significant with a tiny P value. Stated a different way, large administrative database studies can give the wrong answers with precision, that is, they can be precisely wrong. Being correct with some imprecision is preferable.
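The effect of sheer size on the P value can be sketched with a pooled two-proportion z test. The event rates (1.1 vs 1.0 per 1,000, a relative risk of only 1.1) and both sample sizes are hypothetical, and the helper function is written for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events1, n1, events2, n2):
    """Two-sided P value from a pooled two-proportion z test."""
    pooled = (events1 + events2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (events1 / n1 - events2 / n2) / se
    # cdf(-|z|) keeps precision in the far tail better than 1 - cdf(|z|)
    return 2 * NormalDist().cdf(-abs(z))


# Same hypothetical rates (relative risk 1.1) at two very different sizes.
p_small = two_proportion_p_value(11, 10_000, 10, 10_000)
p_huge = two_proportion_p_value(5_500, 5_000_000, 5_000, 5_000_000)
print(f"10,000 per group:    P = {p_small:.2g}")  # well above 0.05
print(f"5,000,000 per group: P = {p_huge:.2g}")   # far below 0.001
```

The relative risk is the same weak 1.1 in both scenarios; only the sample size changed. A tiny P value therefore says nothing about whether residual bias explains the association.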
Question 5: How strong is the association?
In a cohort study, the usual measure of association is the relative risk (risk ratio): the outcome rate in the exposed group divided by that in the unexposed group. In a case-control study, the measure is the odds ratio: the odds of exposure among cases divided by the odds of exposure among controls. When the outcome is rare (less than 5% or so), the odds ratio is a good approximation of the relative risk.
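The rare-outcome approximation can be checked directly. This sketch computes both measures from hypothetical cohort counts; for a given 2x2 table, the exposure odds ratio used in case-control studies equals the outcome odds ratio computed here, so the comparison carries over. Function names and counts are illustrative assumptions.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Outcome risk in the exposed divided by outcome risk in the unexposed."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


def odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Odds of the outcome in the exposed divided by odds in the unexposed."""
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_unexposed = events_unexposed / (n_unexposed - events_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical cohorts of 1,000 exposed and 1,000 unexposed women,
# both with a true relative risk of 2.0.
rare = (20, 1000, 10, 1000)      # outcome rates 2% vs 1%
common = (400, 1000, 200, 1000)  # outcome rates 40% vs 20%

for label, counts in (("rare", rare), ("common", common)):
    print(f"{label}: RR = {relative_risk(*counts):.2f}, "
          f"OR = {odds_ratio(*counts):.2f}")
# rare: RR = 2.00, OR = 2.02
# common: RR = 2.00, OR = 2.67
```

With a rare outcome the two measures nearly coincide; with a common outcome the odds ratio overstates the relative risk.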
Strong associations are more credible than weak ones. Residual bias in observational studies can easily produce weak associations. Hence, a rule of thumb for cohort studies is that the relative risk should be > 2.0 (or its reciprocal, < 0.5) to make the study worthy of attention.11 Relative risks from 0.5 to 2.0 often are considered “noise” and not “signal.” Because case-control designs tend to be more vulnerable to bias, more caution is needed. Consensus suggests that the odds ratio for case-control studies should be > 3.0 (or its reciprocal, < 0.33) to make the results worthy of attention. Using these rules of thumb, a reader will quickly discover that weak associations abound.
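These rules of thumb amount to a simple screen, sketched below. The function name and the list of reported estimates are hypothetical; the cutoffs are the ones given above (1/3 is used for the case-control reciprocal, which the text rounds to 0.33). Passing the screen makes a result worth attention, not true.

```python
# Rule-of-thumb cutoffs by design; the lower cutoff is the reciprocal.
THRESHOLDS = {
    "cohort": 2.0,        # RR > 2.0 or < 0.5
    "case-control": 3.0,  # OR > 3.0 or < 1/3 (about 0.33)
}


def worth_attention(estimate: float, design: str) -> bool:
    """Return True if a ratio estimate clears the rule-of-thumb cutoff."""
    cutoff = THRESHOLDS[design]
    return estimate > cutoff or estimate < 1 / cutoff


# Hypothetical reported estimates.
reported = [(1.3, "cohort"), (2.4, "cohort"), (0.45, "cohort"),
            (2.4, "case-control"), (0.25, "case-control")]
for estimate, design in reported:
    print(f"{design:12} {estimate:5.2f} -> {worth_attention(estimate, design)}")
```

Note that the same estimate of 2.4 clears the bar for a cohort study but not for a case-control study, reflecting the extra caution the latter design warrants.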
At the pinnacle of the clinical research hierarchy stand RCTs. If well-designed and implemented, they provide the best chance for finding the truth in the population. Regrettably, many reported trials do not follow the rules of conduct, which allows bias to creep in.12 When reading a trial report, you should consider 5 questions.
Question 1: Was the allocation to treatment groups truly random?
Examples of proper randomization techniques include a table of random numbers or a computer random number generator (eg, www.randomization.com and www.random.org). Non-random techniques (sometimes claimed to be random) include alternating assignments and use of hospital chart number, day of the week, or birth date to make assignments.
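As one illustration of a computer-generated sequence, the sketch below builds an allocation list from randomly permuted blocks of varying size using Python's `random` module. Permuted blocks are an assumption here (one common approach, not one the text prescribes), as are the function name, arm labels, block sizes, and seed. In a real trial the list would be generated and held out of sight of recruiters.

```python
import random


def permuted_block_allocation(n_participants, arms=("A", "B"),
                              block_sizes=(4, 6), seed=None):
    """Return an allocation list built from randomly permuted blocks.

    Each block holds equal numbers of every arm in random order, which keeps
    group sizes balanced; drawing the block size at random makes the next
    assignment harder to guess.
    """
    if any(size % len(arms) for size in block_sizes):
        raise ValueError("each block size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded so the list can be reproduced and audited
    allocations = []
    while len(allocations) < n_participants:
        size = rng.choice(block_sizes)
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_participants]


schedule = permuted_block_allocation(40, seed=2015)
print(schedule[:10])
print(schedule.count("A"), schedule.count("B"))
```

By contrast, alternating assignment or rules based on chart number, weekday, or birth date yield a sequence that anyone can predict.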
Question 2: Were participants and those involved with recruitment unaware of the upcoming treatment assignments?
Foreknowledge of the next assignment invites subversion of the randomization.5 For instance, if a concerned clinician knows that the next patient will be assigned to a treatment the clinician does not favor, he or she might circumvent the randomization process by sending the patient to the laboratory. By the time the patient returns, another participant has enrolled in the trial, and now the patient gets the clinician’s preferred assignment. Selection bias results.
Question 3: Was every participant enrolled in the trial accounted for?
This should be evident in the trial's flow diagram.13 Was there differential loss to follow-up between the 2 groups? Was the loss to follow-up excessive? Some researchers have proposed a “5-and-20” guideline: losses of 5% or less are unlikely to invalidate results, whereas losses higher than 20% undermine a trial’s credibility. More recently, we have suggested that the proportion lost to follow-up should not exceed the outcome event rate; for a trial with a 3% outcome rate, losses should not exceed 3%.5
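Both rules can be applied mechanically, as in this sketch. The function name and trial numbers are hypothetical, and the wording for losses between 5% and 20% is an assumption, since the 5-and-20 guideline does not classify that zone. The same check could be run separately on each arm to look for differential loss.

```python
def check_loss_to_follow_up(lost: int, randomized: int, outcome_rate: float) -> dict:
    """Apply the 5-and-20 guideline and the outcome-rate rule to one trial."""
    loss = lost / randomized
    if loss <= 0.05:
        five_and_twenty = "5% or less: unlikely to invalidate results"
    elif loss > 0.20:
        five_and_twenty = "over 20%: undermines credibility"
    else:
        # Assumption: the guideline leaves this middle zone to judgment.
        five_and_twenty = "between 5% and 20%: judge case by case"
    return {
        "loss": loss,
        "five_and_twenty": five_and_twenty,
        "exceeds_outcome_rate": loss > outcome_rate,
    }


# Hypothetical trial: 1,000 randomized, 40 lost, 3% outcome rate.
print(check_loss_to_follow_up(40, 1000, 0.03))
```

Here a 4% loss passes the 5-and-20 guideline yet exceeds the 3% outcome rate, so the missing participants could plausibly account for all of the observed events in one arm.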
Question 4: Was every participant analyzed in the group to which she was initially assigned, even if she did not comply with the intended treatment?
Investigators commonly violate this key principle. The only truly randomized comparison is the one made at the starting blocks. Other analyses, such as “per protocol” or “as treated,” are in reality observational cohort studies, because randomization no longer holds. A common error in trials funded by pharmaceutical companies is defining the “intention-to-treat” population as only those participants who took at least one dose of the drug and had follow-up.
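How far these analyses can diverge is easy to see with made-up data. In this sketch, the trial, its counts, and the assumption that women who never take the drug are at higher risk are all hypothetical; the drug itself has no effect, so only intention to treat gets the right answer.

```python
from collections import namedtuple

Participant = namedtuple("Participant", ["assigned", "received", "event"])


def cohort(assigned, received, n, events):
    """Build n hypothetical participants, the first `events` of whom have the outcome."""
    return [Participant(assigned, received, i < events) for i in range(n)]


# Hypothetical trial: 20 of 100 women assigned to the drug never take it,
# and (by assumption) these non-compliers are at higher risk.
trial = (cohort("drug", "drug", 80, 8)
         + cohort("drug", "none", 20, 6)
         + cohort("placebo", "placebo", 100, 14))


def risk(group):
    return sum(p.event for p in group) / len(group)


def rr(group1, group2):
    return risk(group1) / risk(group2)


assigned_drug = [p for p in trial if p.assigned == "drug"]
assigned_placebo = [p for p in trial if p.assigned == "placebo"]

# Intention to treat: everyone analyzed in the group to which she was randomized.
itt = rr(assigned_drug, assigned_placebo)
# Per protocol: drops non-compliers, breaking the randomized comparison.
per_protocol = rr([p for p in assigned_drug if p.received == "drug"],
                  assigned_placebo)
# As treated: regroups participants by the treatment actually received.
as_treated = rr([p for p in trial if p.received == "drug"],
                [p for p in trial if p.received != "drug"])

print(f"ITT RR: {itt:.2f}")                    # 1.00
print(f"Per-protocol RR: {per_protocol:.2f}")  # 0.71
print(f"As-treated RR: {as_treated:.2f}")      # 0.60
```

The per-protocol and as-treated comparisons suggest a 29% to 40% risk reduction that is produced entirely by who chose to comply.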
Question 5: Did the report follow the internationally accepted guidelines for reporting?
Researchers who are aware of proper methods for conducting and reporting studies produce better science. The EQUATOR Network14 (www.equator-network.org) has compiled established reporting guidelines for researchers around the world. For observational reports, the STROBE guidelines15 apply; they include a checklist of information deemed fundamentally important. For RCTs, the new SPIRIT guidelines16 describe the necessary elements of the study protocol, and the CONSORT guidelines13 provide a checklist of information needed for the trial report. For studies of diagnostic accuracy, the STARD guidelines17 apply. Again, a checklist guides authors through the analysis and reporting.
For systematic reviews, the PRISMA guidelines18 should be used. While most journals endorse these guidelines, enforcement is not uniform. Readers who find reports citing and using the relevant guidelines can assume that the authors were aware of prevailing research standards (Table 2).
Physicians get behind in their reading on the first day of medical school and never get caught up. However, most published reports are irrelevant to day-to-day practice. Most published findings are false.19 Most true associations are exaggerated.20 In addition, a small but growing proportion of all publications is fraudulent.21 Reading both selectively and critically will help you through the “information jungle.”
Read the medical literature carefully.22 If you want to hone your critical reading skills, the books in Table 3 are useful guides. Much of obstetrics and gynecology today rests on a shaky scientific foundation, including potentially dangerous, unproved surrogate endpoints.23 Caution is especially important for new treatments and technologies such as robotic surgery.24 Often, despite initial enthusiasm, the emperor is later found to have been strutting around naked.25
1. Alper BS, Hand JA, Elliott SG, et al. How much effort is needed to keep up with the literature relevant for primary care? J Med Libr Assoc. 2004;92(4):429-437.
2. Olatunbosun OA, Edouard L, Pierson RA. Physicians' attitudes toward evidence based obstetric practice: a questionnaire survey. BMJ. 1998;316(7128):365-366.
3. Miser WF. Finding truth from the medical literature: how to critically evaluate an article. Prim Care. 2006;33(4):839-862, vi.
4. Funai EF, Rosenbush EJ, Lee MJ, Del Priore G. Distribution of study designs in four major US journals of obstetrics and gynecology. Gynecol Obstet Invest. 2001;51(1):8-11.
5. Schulz KF, Grimes DA. The Lancet handbook of essential concepts in clinical research. London: Elsevier; 2006.
6. Writing Group for the Women's Health Initiative Investigators. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial. JAMA. 2002;288(3):321-333.
7. Lindefors-Harris BM, Eklund G, Adami HO, Meirik O. Response bias in a case-control study: analysis utilizing comparative data concerning legal abortions from two independent Swedish studies. Am J Epidemiol. 1991;134(9):1003-1008.
8. Shy KK, McTiernan AM, Daling JR, Weiss NS. Oral contraceptive use and the occurrence of pituitary prolactinoma. JAMA. 1983;249(16):2204-2207.
9. Grimes DA. Epidemiologic research using administrative databases: garbage in, garbage out. Obstet Gynecol. 2010;116(5):1018-1019.
10. Lidegaard O, Nielsen LH, Skovlund CW, Skjeldestad FE, Lokkegaard E. Risk of venous thromboembolism from use of oral contraceptives containing different progestogens and oestrogen doses: Danish cohort study, 2001-9. BMJ. 2011;343:d6423.
11. Grimes DA, Schulz KF. False alarms and pseudo-epidemics: the limitations of observational epidemiology. Obstet Gynecol. 2012;120(4):920-927.
12. Schulz KF, Chalmers I, Grimes DA, Altman DG. Assessing the quality of randomization from reports of controlled trials published in obstetrics and gynecology journals. JAMA. 1994;272(2):125-128.
13. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332.
14. Simera I, Moher D, Hirst A, Hoey J, Schulz KF, Altman DG. Transparent and accurate reporting increases reliability, utility, and impact of your research: reporting guidelines and the EQUATOR Network. BMC Med. 2010;8:24.
15. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453-1457.
16. Chan AW, Tetzlaff JM, Altman DG, et al. SPIRIT 2013 statement: defining standard protocol items for clinical trials. Ann Intern Med. 2013;158(3):200-207.
17. Bossuyt PM, Reitsma JB, Bruns DE, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Radiol. 2003;58(8):575-580.
18. Moher D, Liberati A, Tetzlaff J, Altman DG. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
19. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2(8):e124.
20. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008;19(5):640-648.
21. Steen RG. Retractions in the scientific literature: is the incidence of research fraud increasing? J Med Ethics. 2011;37(4):249-253.
22. von Elm E, Egger M. The scandal of poor epidemiological research. BMJ. 2004;329(7471):868-869.
23. Grimes DA, Schulz KF, Raymond EG. Surrogate end points in women's health research: science, protoscience, and pseudoscience. Fertil Steril. 2010;93(6):1731-1734.
24. Liu H, Lawrie TA, Lu D, Song H, Wang L, Shi G. Robot-assisted surgery in gynaecology. Cochrane Database Syst Rev. 2014;12:CD011422.
25. Grimes DA. Technology follies. The uncritical acceptance of medical innovation. JAMA. 1993;269(23):3030-3033.