Important medical questions are typically studied more than once, often by different research teams in different locations.
A systematic review is a comprehensive survey of a topic in which all of the
primary studies of the highest level of evidence have been systematically
identified, appraised and then summarized according to an explicit and
rigorous method.
A meta-analysis is a survey in which the results of all of the included studies are similar enough statistically that they can be combined and analyzed as if they were one study. In general, a good systematic review or meta-analysis will be a better guide to practice than an individual article.
Pitfalls specific to meta-analysis include:
A randomized controlled study is one in which:
Having a control group allows us to compare the treatment with alternative choices. For instance, the statement that a particular medication cures 40% of cases tells us very little unless we also know how many cases get better on their own! (Or with a different treatment).
With certain research questions, randomized controlled studies cannot be done for ethical reasons. For instance, it would be unethical to attempt to measure the effect of smoking on health by asking one group to smoke two packs a day and another group to abstain, since the smoking group would be subject to unnecessary harm.
Randomized controlled trials are the standard method of answering questions
about the effectiveness of different therapies. If you have a therapy question,
first look for a randomized controlled trial, and only go on to look for other
types of studies if you don't find one.
A double blind study is one in which neither the patient nor the physician knows whether the patient is receiving the treatment of interest or the control treatment.
For example, studies of treatments that consist essentially of taking pills are very easy to do double blind - the patient takes one of two pills of identical size, shape, and color, and neither the patient nor the physician needs to know which is which.
A double blind study is the most rigorous clinical research design because, in addition to the randomization of subjects, which reduces the risk of bias, it can eliminate the placebo effect, a further challenge to the validity of a study.
The placebo effect could be thought of in this way:
A Cohort Study is a study in which patients who presently have a certain condition and/or receive a particular treatment are followed over time and compared with another group who are not affected by the condition under investigation.
For instance, since a randomized controlled study to test the effect of smoking on health would be unethical, a reasonable alternative would be a study that identifies two groups, a group of people who smoke and a group of people who do not, and follows them forward through time to see what health problems they develop.
Cohort studies are not as reliable as randomized controlled studies, since the two groups may differ in ways other than in the variable under study. For example, if the subjects who smoke tend to have less money than the non-smokers, and thus have less access to health care, that would exaggerate the difference between the two groups.
The main problem with cohort studies, however, is that they can end up
taking a very long time, since the researchers have to wait for the conditions
of interest to develop. Physicians are, of course, anxious to have meaningful
results as soon as possible, but another disadvantage with long studies is
that things tend to change over the course of the study. People die, move
away, or develop other conditions, new and promising treatments arise, and so
on. Even so, cohort studies are generally preferred to
case control studies,
since they involve far fewer statistical problems and generally produce more
reliable results.
Case control studies are studies in which patients who already have a certain condition are compared with people who do not.
For example: a study in which lung cancer patients are asked how much they smoked in the past, and the answers are compared with those from a sample of the general population, would be a case control study.
Case control studies are less reliable than either randomized controlled trials or cohort studies. Just because there is a statistical relationship between two conditions does not mean that one condition actually caused the other. For instance, lung cancer rates are higher for people without a college education (who tend to smoke more), but that does not mean that someone can reduce his or her cancer risk just by getting a college education.
The main advantages of case control studies are:
Case series and case reports consist either of collections of reports on the treatment of individual patients, or of reports on a single patient.
For example: one of your patients has a condition that you have never seen
or heard of before and you are uncertain what to do. A search for case series
or case reports may reveal information that will assist in a diagnosis.
However, for any reasonably well-known condition you should be able to get
better evidence. Case series and case reports, since they use no control group
with which to compare outcomes, have no statistical validity.
The statistical analysis consists of mathematical calculations of how likely or unlikely it would be for the results of the study to come about purely by chance. In the coin example, the chance of a coin coming down heads 10 times in a row is 1 in 2x2x2x2x2x2x2x2x2x2, or 1 in 1024 (less than 1 in a thousand). This is quite a small number, which is why we conclude that the coin is probably biased.
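The arithmetic behind the coin example can be checked in a couple of lines of Python:

```python
# Probability that a fair coin lands heads 10 times in a row:
# each flip halves the probability, so it is (1/2)^10.
p_all_heads = 0.5 ** 10
print(p_all_heads)      # 1/1024, i.e. less than 1 in a thousand
print(1 / p_all_heads)  # 1024 equally likely sequences of 10 flips
```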
The more data you have, the more weight the conclusions carry. For instance, if a coin comes down heads 7 times out of 10, you really have no idea whether it is biased or not. If it comes down heads 700 times out of 1000, you can be pretty sure it's biased. This is one of the main reasons why meta-analyses are done: by adding together data from different studies you can get a large enough sample to come up with a fairly definite conclusion.
In general, an effect that shows up in a study is regarded as being
significant (i.e. not just random) if the probability of it happening by
chance is less than 0.05 (1 chance in 20). This probability of a result
arising by chance is called p in statistics, and a study will generally
give a value of p for any conclusions it draws. For instance, if the
article says that the treatment is better than the control with p < 0.00001,
that is a very strong conclusion indeed. If the conclusion is with p < 0.25,
that would be regarded by most people as too weak to rely on.
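To make the coin illustration concrete, here is a sketch of an exact binomial calculation in Python. It computes the chance of getting at least k heads in n flips of a fair coin, which is the kind of quantity behind a p value (the figures are the coin examples from above, not from any study):

```python
from math import comb

def upper_tail_p(k, n):
    """P(at least k heads in n flips of a fair coin).

    There are comb(n, i) sequences with exactly i heads, out of
    2**n equally likely sequences in total.
    """
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 7 heads out of 10: easily explained by chance (p well above 0.05).
print(upper_tail_p(7, 10))      # about 0.17

# 700 heads out of 1000: essentially impossible for a fair coin.
print(upper_tail_p(700, 1000))  # far below 0.05
```

With 7 heads in 10 flips the probability of such a result by chance is about 0.17, so we cannot call the coin biased; with 700 in 1000 the probability is vanishingly small.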
When computing results for a clinical trial, researchers compare the number of people who experienced the studied outcome with the number who avoided it in both arms of the study. They often set up a table like the following to keep track of the results. For example, if the outcome is death at 12 months, the table would look like this:

                         Dead at 12 months   Alive at 12 months   Total
  Exposed to therapy            300                  700           1000
  Not exposed                   800                  200           1000
The risk of the studied outcome is the number of people who experienced the outcome as a percentage of all people in that group. For those exposed to the therapy, the risk of death at 12 months is 300/(300+700) or 30%. The risk of death at 12 months for those not exposed to the therapy is 800/(800+200) or 80%.
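The risk calculation is a one-line formula; here it is sketched in Python with the numbers from the example:

```python
def risk(events, non_events):
    """Risk = people with the outcome as a fraction of the whole group."""
    return events / (events + non_events)

risk_treated = risk(300, 700)    # exposed to therapy: 0.30, i.e. 30%
risk_untreated = risk(800, 200)  # not exposed:        0.80, i.e. 80%
```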
Relative Risk Reduction
The Relative Risk Reduction (RRR) achieved by a particular therapy is the difference between the two risks as a percentage of the risk of those not exposed to the therapy. In the example above, the relative risk reduction would be (80%-30%)/80% or 62.5%.
Absolute Risk Reduction
The Absolute Risk Reduction (ARR) achieved by a particular therapy is simply the difference in the two risks. In the above example, the absolute risk reduction is 80%-30% or 50%.
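The two definitions above differ only in whether the difference in risks is divided by the control-group risk. A minimal sketch, using the example's figures:

```python
risk_untreated = 0.80  # risk in the group not exposed to the therapy
risk_treated = 0.30    # risk in the group exposed to the therapy

# Absolute Risk Reduction: the plain difference between the two risks.
arr = risk_untreated - risk_treated          # 50 percentage points

# Relative Risk Reduction: that difference as a fraction of the
# risk in the untreated group.
rrr = arr / risk_untreated                   # 62.5%
```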
Relative vs. Absolute Risk Reduction
Because it is presented as a percentage of the control group's risk, Relative Risk Reduction does not take into account the size of the initial risk or of the actual reduction. In the example below, while the actual reductions vary greatly, the Relative Risk Reduction remains the same.
Although the Absolute Risk Reduction is a more accurate report of the effect of the intervention, many people find the distinction between the two statistics hard to grasp. So EBM practitioners came up with another statistic that represents more clearly what the numbers mean.
Number Needed to Treat
The Number Needed to Treat (NNT) is the reciprocal of the Absolute Risk Reduction. If the ARR is expressed as a decimal, the Number Needed to Treat is 1/ARR; if the ARR is expressed as a percentage, the Number Needed to Treat is 100/ARR. In the above example, the Number Needed to Treat is 100%/50% or 2. The Number Needed to Treat tells you how many people you need to treat in order to see the desired outcome in one additional patient. In this example, you would need to treat 2 patients to prevent one additional patient from dying at 12 months. The NNT gives you a clear number to use to balance benefits against risks. If you only need to treat 2 patients to see a benefit, you may feel comfortable ignoring extremely mild side effects. However, if you need to treat 2000 patients to see a benefit with a drug that has very severe side effects, you may not feel comfortable recommending this treatment.
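The NNT calculation from the example, sketched in Python:

```python
arr = 0.50      # Absolute Risk Reduction from the example above (50%)

# NNT is the reciprocal of the ARR (expressed as a decimal).
nnt = 1 / arr
print(nnt)      # 2.0: treat 2 patients to prevent 1 additional death
```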
You will often see the results of a study reported in a format like 45% (CI: 40% - 50%). The first number is the point estimate of the result; the numbers in parentheses represent the confidence interval. A 95% confidence interval is a range constructed so that it would include the true value 95% of the time. The best estimate here is, for example, a 45% reduction in the risk of mortality. If we were to conduct the same study an infinite number of times, we can expect that 95% of the intervals so constructed would contain the true reduction in mortality, which in this case the data place somewhere between 40% and 50%.
There are two qualities of a confidence interval that could bring the results into question. The first is an interval that is particularly wide. If the point estimate is, again, a 45% reduction in the risk of mortality but the confidence interval runs from 1% to 99%, this tells us that the researchers had a very small sample size and thus the results have a much wider spread. A fairly large sample produces a much narrower interval.
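The link between sample size and interval width can be illustrated with the standard normal-approximation confidence interval for a proportion. This is a simplified sketch with made-up counts, not the method or the data of any particular trial:

```python
from math import sqrt

def approx_ci_95(successes, n):
    """Normal-approximation 95% confidence interval for a proportion.

    The half-width is 1.96 standard errors, and the standard error
    shrinks as the square root of the sample size grows.
    """
    p = successes / n
    half_width = 1.96 * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Same observed proportion (45%), very different sample sizes:
print(approx_ci_95(9, 20))      # small study -> wide interval
print(approx_ci_95(450, 1000))  # large study -> narrow interval
```

With 20 subjects the interval spans tens of percentage points; with 1000 it is only a few points wide, even though the point estimate is identical.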
In daily conversation, we say that a penny, when flipped, has a 50% chance of landing heads-side up. However, if asked to bet on whether a single penny would in fact land heads-side up exactly 5 times out of 10, you'd be a little more hesitant to back up that statistic. 10 flips simply aren't enough to reveal the "true" 50% likelihood. However, if the penny were flipped 1,000 times, the observed proportion would very likely come close to 50%, falling somewhere in the neighborhood of 45% to 55%. The larger the number of flips, the closer the observed proportion comes to the real likelihood.
When there is a wide confidence interval, you will need to take into consideration the characteristics of your specific patient and determine if your patient is more likely to fall closer to the lower end of the interval or the higher end.
It may also occur that the confidence interval crosses 0. With a 45% mean reduction in mortality, you may see a confidence interval that runs from -2% to 53%. In addition to being a fairly wide confidence interval (and thus reflecting a small sample), the crossing of 0 tells us that the results are not statistically significant. While this is interesting to biostatisticians, medical practitioners are more interested in results that are clinically significant, and results can be clinically significant without being statistically significant. The negative end of the confidence interval tells us that the data are also consistent with an effect opposite to the expected outcome: alongside the estimated 45% reduction in mortality, the interval allows for as much as a 2% increase in mortality for some patients.
Again, it falls to your own clinical judgment to determine whether your specific patient's characteristics lean more towards the increased mortality or decreased mortality as described in the study.
To put it another way, the higher the sensitivity of a test, the lower the proportion of false negative results it gets, and the more of its intended target population it catches.
The exact combination of sensitivity and specificity required of a test will depend on the relative disadvantages of false positive results (unnecessary anxiety, unneeded or harmful treatment or further tests, etc.) and false negative results (treatable disease going untreated for lack of diagnosis).
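Sensitivity and specificity are simple proportions of the test's correct calls in the diseased and disease-free groups. A minimal sketch with hypothetical counts (not from any real test):

```python
def sensitivity(true_pos, false_neg):
    """Fraction of people with the disease whom the test catches.

    Higher sensitivity means fewer false negatives.
    """
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of disease-free people whom the test correctly rules out.

    Higher specificity means fewer false positives.
    """
    return true_neg / (true_neg + false_pos)

# Hypothetical screening of 1000 people, 100 of whom have the disease:
print(sensitivity(true_pos=90, false_neg=10))   # 0.9: misses 10% of cases
print(specificity(true_neg=810, false_pos=90))  # 0.9: 10% false alarms
```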