Cohort Study

A cohort study is a study in which researchers compare two groups over a period of time. At the start of the study, one of the groups has a particular condition or receives a particular treatment, and the other does not. At the end of a certain amount of time, researchers compare the two groups to see how they did. 

Control Group

The control group of a study is a group that receives a treatment other than the one being studied (for instance, a placebo pill that looks identical to the medication being studied but that has no active ingredients). Control groups are necessary because we need to be able to compare the results of the treatment being studied to available alternatives. For instance, the fact that 90% of all patients taking treatment A for condition B recovered within one year tells us nothing unless we know the percentage of patients who recover from condition B within one year with no treatment at all! Where placebos cannot be used, the control group instead receives "standard" therapy or another intervention.

 

Systematic Reviews and Meta-Analyses

Important medical questions are typically studied more than once, often by different research teams in different locations.

A systematic review is a comprehensive survey of a topic in which all of the primary studies of the highest level of evidence have been systematically identified, appraised and then summarized according to an explicit and reproducible methodology.
A meta-analysis is a survey in which the results of all of the included studies are similar enough statistically that they can be combined and analyzed as if they came from one study. In general, a good systematic review or meta-analysis will be a better guide to practice than an individual article.

Pitfalls specific to meta-analysis include:

  1. It's rare that the results of the different studies precisely agree, and often the number of patients in a single study is not large enough to come up with a decisive conclusion.
  2. If the authors are interested in supporting a particular conclusion, they can include studies that support that conclusion and omit studies that do not. Do the authors explain in their paper exactly on what basis they included studies, and do their reasons make sense?
  3. Studies that show some kind of positive effect tend to be published more often than those that do not. This means that if the authors include only published studies, several weak positive studies may seem to add up to a strong positive result. Do weak negative studies exist? This effect is known as publication bias.

Randomized Controlled Studies

A randomized controlled study is one in which:

  1. There are two groups, one treatment group and one control group. The treatment group receives the treatment under investigation, and the control group receives either no treatment or some standard default treatment.
  2. Patients are randomly assigned to one of the two groups.
Assigning patients at random reduces the risk of bias and increases the probability that differences between the groups can be attributed to the treatment.

Having a control group allows us to compare the treatment with alternative choices. For instance, the statement that a particular medication cures 40% of cases tells us very little unless we also know how many cases get better on their own (or with a different treatment)!

 With certain research questions, randomized controlled studies cannot be done for ethical reasons. For instance, it would be unethical to attempt to measure the effect of smoking on health by asking one group to smoke two packs a day and another group to abstain, since the smoking group would be subject to unnecessary harm.

 Randomized controlled trials are the standard method of answering questions about the effectiveness of different therapies. If you have a therapy question, first look for a randomized controlled trial, and only go on to look for other types of studies if you don't find one.
 


The Randomized Controlled Double Blind Method

 

The Double Blind Method

A double blind study is one in which neither the patient nor the physician knows whether the patient is receiving the treatment of interest or the control treatment.

 For example, studies of treatments that consist essentially of taking pills are very easy to do double blind - the patient takes one of two pills of identical size, shape, and color, and neither the patient nor the physician needs to know which is which.

 A double blind study is the most rigorous clinical research design because, in addition to the randomization of subjects, which reduces the risk of bias, it controls for the placebo effect, which is a further challenge to the validity of a study.

 The placebo effect could be thought of in this way:

 

  1. Patients who believe they are receiving a new experimental treatment tend to be more optimistic about the outcome. This means that, when asked, they tend to minimize health problems and give more weight to positive effects. They also tend to take better care of themselves and comply better with the conditions of the experiment. There is also substantial evidence that, independent of all this, patients who have positive beliefs about their treatment do better than patients who do not. In many situations, the placebo effect is at least as strong as any objective effects of the treatment!

     

  2. Doctors who believe that a patient is receiving a new experimental treatment tend to be more optimistic about that patient's chances, evaluate their state of health more favorably, and communicate positive expectations to the patients, who in turn try to get better so as to prove their doctor right! 

     

    Cohort Studies

    A cohort study is a study in which patients who presently have a certain condition and/or receive a particular treatment are followed over time and compared with another group who are not affected by the condition under investigation.

    For instance, since a randomized controlled study to test the effect of smoking on health would be unethical, a reasonable alternative would be a study that identifies two groups, a group of people who smoke and a group of people who do not, and follows them forward through time to see what health problems they develop.

     Cohort studies are not as reliable as randomized controlled studies, since the two groups may differ in ways other than in the variable under study. For example, if the subjects who smoke tend to have less money than the non-smokers, and thus have less access to health care, that would exaggerate the difference between the two groups.

     The main problem with cohort studies, however, is that they can end up taking a very long time, since the researchers have to wait for the conditions of interest to develop. Physicians are, of course, anxious to have meaningful results as soon as possible, but another disadvantage with long studies is that things tend to change over the course of the study. People die, move away, or develop other conditions, new and promising treatments arise, and so on. Even so, cohort studies are generally preferred to case control studies, since they involve far fewer statistical problems and generally produce more reliable answers.
     



     

    Case Control Studies

    Case control studies are studies in which patients who already have a certain condition are compared with people who do not.

    For example: a study in which lung cancer patients are asked how much they smoked in the past, and the answers are compared with those of a sample of the general population, would be a case control study.

     Case control studies are less reliable than either randomized controlled trials or cohort studies. Just because there is a statistical relationship between two conditions does not mean that one condition actually caused the other. For instance, lung cancer rates are higher for people without a college education (who tend to smoke more), but that does not mean that someone can reduce his or her cancer risk just by getting a college education.

    The main advantages of case control studies are:

     

    The first study to suggest a new medical conclusion will often be a case control study, perhaps designed to check on a hypothesis suggested by a case series. If possible, researchers will generally try to confirm the results with a randomized controlled trial or a cohort study.
     

     

    Case Series and Case Reports

    Case series and case reports consist either of collections of reports on the treatment of individual patients, or of reports on a single patient.

    For example: one of your patients has a condition that you have never seen or heard of before and you are uncertain what to do. A search for case series or case reports may reveal information that will assist in a diagnosis. However, for any reasonably well-known condition you should be able to get better evidence. Because case series and case reports use no control group with which to compare outcomes, they have no statistical validity.
     

     

    Some Simple Statistics

    One of the major problems in evaluating studies is that of deciding whether the results are definite enough to indicate an effect other than chance. Things don't always come out exactly even: for example, if you tossed a coin 10 times and it came down heads 6 times or even 7, you would probably still regard it as a fair coin. However, if it came down heads 10 times out of 10, or even 9 out of 10, you would probably suspect it of being biased. It's the same thing with studies: you want to find out if the results are far enough away from those you'd expect to get by chance alone to suggest that there is some real effect.

     The statistical analysis consists of mathematical calculations of how likely or unlikely it would be for the results of the study to come about purely by chance. In the coin example, the chance of a coin coming down heads 10 times in a row is 1 in 2x2x2x2x2x2x2x2x2x2, or 1 in 1024 (less than 1 in a thousand). This is quite a small number, which is why we conclude that the coin is probably biased.

     The more data you have, the more weight they can give to the conclusions. For instance, if a coin comes down heads 7 times out of 10, you really have no idea whether it is biased or not. If it comes down heads 700 times out of 1000, you can be pretty sure it's biased. This is one of the main reasons why meta-analyses are done: by adding together data from different studies you can get a large enough sample to come up with a fairly definite conclusion.
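The coin arithmetic above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the original tutorial; the function name `prob_at_least` is made up for this illustration, and it assumes a perfectly fair coin. It computes how likely it is to see a given number of heads or more purely by chance.

```python
from math import comb

def prob_at_least(heads, tosses):
    """Chance of getting `heads` or more heads in `tosses` tosses of a fair coin."""
    # Count every sequence with k heads, for k = heads .. tosses,
    # and divide by the total number of equally likely sequences.
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses

# 10 heads out of 10: the "1 in 1024" case from the text.
print(f"10 of 10:      {prob_at_least(10, 10):.6f}")
# 7 or more heads out of 10: common enough that we cannot call the coin biased.
print(f">= 7 of 10:    {prob_at_least(7, 10):.4f}")
# 700 or more heads out of 1000: so unlikely that bias is the better explanation.
print(f">= 700 of 1000: {prob_at_least(700, 1000):.3e}")
```

The same proportion of heads (70%) gives a very different answer at 10 tosses than at 1000, which is the point of pooling data from several studies.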

    In general, an effect that shows up in a study is regarded as being significant (i.e. not just random) if the probability of it happening by chance is less than 0.05 (1 chance in 20). This probability is called p in statistics, and a study will generally give a value of p for any conclusions it draws. For instance, if the article says that the treatment is better than the control with p < 0.00001, that is a very strong conclusion indeed. If the conclusion is reported with p < 0.25, most people would regard it as too weak to rely on.
     


    IMPORTANT CONCEPTS

    THERAPY




    When computing results for a clinical trial, researchers compare the number of people who experience the studied outcome against those who avoided it in both arms of the study.  They often set up a table like the following to keep track of the results.  For example, if the outcome is death at 12 months, the table would look like this:


     

     
                              Dead at 12 months    Alive at 12 months
    Exposed to therapy        300                  700
    Not exposed to therapy    800                  200
     


    Risk
    The risk of the studied outcome is the number of people who experienced the outcome as a percentage of all people in that group. For those exposed to the therapy, the risk of death at 12 months is 300/(300+700) or 30%. The risk of death at 12 months for those not exposed to the therapy is 800/(800+200) or 80%.


    Relative Risk Reduction
    The Relative Risk Reduction (RRR) achieved by a particular therapy is the difference between the two risks as a percentage of the risk of those not exposed to the therapy.  In the example above, the relative risk reduction would be (80%-30%)/80% or 62.5%.


    Absolute Risk Reduction
    The Absolute Risk Reduction (ARR) achieved by a particular therapy is simply the difference in the two risks.  In the above example, the absolute risk reduction is 80%-30% or 50%.  
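As a worked check of the three definitions above, here is a minimal Python sketch using the counts from the example table (300/700 with therapy, 800/200 without). The function and variable names are chosen for this illustration only.

```python
def risk(events, non_events):
    """Risk = people with the outcome divided by everyone in that group."""
    return events / (events + non_events)

# Counts from the example table: (dead, alive) at 12 months.
treated_risk = risk(300, 700)   # exposed to therapy
control_risk = risk(800, 200)   # not exposed to therapy

# Absolute Risk Reduction: the plain difference between the two risks.
arr = control_risk - treated_risk
# Relative Risk Reduction: that difference as a fraction of the control risk.
rrr = arr / control_risk

print(f"Risk with therapy:       {treated_risk:.1%}")
print(f"Risk without therapy:    {control_risk:.1%}")
print(f"Absolute Risk Reduction: {arr:.1%}")
print(f"Relative Risk Reduction: {rrr:.1%}")
```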


    Relative vs. Absolute Risk Reduction
    Because it is presented as a percentage of the risk in the control group, Relative Risk Reduction does not take into account the size of the initial risk and the actual reduction. In the example below, while the actual reductions vary greatly, the Relative Risk Reduction remains the same.

     
    Risk in Control Group    Risk in Experimental Group    Relative Risk Reduction    Absolute Risk Reduction
    70%                      35%                           50%                        35%
    7%                       3.5%                          50%                        3.5%
    0.7%                     0.35%                         50%                        0.35%
     
     
    Although the Absolute Risk Reduction is a more accurate report of the effect of the intervention, it's harder for people to understand the distinction between the two. So EBM practitioners came up with another statistic to represent more clearly what the numbers mean.


    Number Needed to Treat
    The Number Needed to Treat (NNT) is the reciprocal of the Absolute Risk Reduction. If the ARR is expressed as a decimal, the Number Needed to Treat is 1/ARR; if the ARR is expressed as a percentage, the Number Needed to Treat is 100/ARR. In the above example, the Number Needed to Treat is 100%/50% or 2. The Number Needed to Treat tells how many people you need to treat in order to see the desired outcome in one additional patient. In this example, you would need to treat 2 patients to prevent one additional patient from dying at 12 months. The NNT gives you a clear number to use to balance benefits against risks. If you only need to treat 2 patients to see a benefit, you may feel comfortable ignoring extremely mild side effects. However, if you need to treat 2000 patients to see a benefit with a drug that has very severe side effects, you may not feel comfortable recommending this treatment.
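To see how the NNT separates cases that share the same Relative Risk Reduction, the sketch below applies the 1/ARR formula to the 80% vs. 30% example and to the three rows of the comparison table. This is an illustrative Python sketch; the function name `number_needed_to_treat` is an assumption of this example, and the values are left unrounded.

```python
def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / ARR, with both risks given as decimals (0.80 for 80%)."""
    arr = control_risk - treated_risk
    if arr <= 0:
        # No risk reduction means there is no meaningful NNT.
        raise ValueError("therapy does not reduce risk; NNT is undefined")
    return 1 / arr

# (risk in control group, risk in experimental group)
examples = [(0.80, 0.30), (0.70, 0.35), (0.07, 0.035), (0.007, 0.0035)]

for control, treated in examples:
    rrr = (control - treated) / control
    nnt = number_needed_to_treat(control, treated)
    print(f"control {control:.2%}, treated {treated:.2%}: "
          f"RRR {rrr:.1%}, NNT {nnt:.1f}")
```

The last three rows all have a Relative Risk Reduction of 50%, yet the number of patients who must be treated to prevent one outcome grows tenfold with each row.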
     

     

    CONFIDENCE INTERVALS




    You will often see the results of a study reported in a format such as 45% (CI: 40% - 50%). The first number is the mean of the results. The numbers in parentheses represent the confidence interval. A 95% confidence interval gives a range that is expected to include the true value, here the true relative risk reduction, 95% of the time. The mean result we can expect is, for example, a 45% reduction in risk of mortality. If we were to conduct the same study a very large number of times and compute the interval each time, we could expect about 95% of those intervals to contain the true reduction in mortality; in this study, that interval runs from 40% to 50%.


    There are two qualities that a confidence interval may have that could bring the results into question. The first is an interval that is particularly wide. If the mean, again, is a 45% reduction in the risk of mortality but the confidence interval runs from 1% to 99%, this tells us that the researchers had a very small sample size and thus the results had a much wider spread. A fairly large sample produces a much narrower interval.

    In daily conversation, we say that a penny, when flipped, has a 50% chance of landing heads-side up. However, if asked to bet that a penny would in fact land heads-side up exactly 5 times out of 10, you'd be a little more hesitant to back up that statistic. Ten flips simply aren't enough to reliably produce the "true" 50% proportion. If the penny were flipped 1,000 times, however, the proportion of heads would come much closer to 50%: there is roughly a 95% chance that it would fall between about 47% and 53%. The larger the number of flips, the closer the observed proportion comes to the real likelihood.
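The effect of sample size on the width of the interval can be sketched numerically. The Python example below uses the simple normal-approximation ("Wald") formula for a 95% interval around an observed proportion. That formula is an assumption of this sketch, not something the text above specifies, and it is only a rough guide for very small samples such as 10 flips.

```python
from math import sqrt

def approx_95_interval(heads, flips):
    """Rough 95% interval for the true proportion of heads (normal approximation)."""
    p = heads / flips
    # 1.96 standard errors on either side covers about 95% of a normal curve.
    half_width = 1.96 * sqrt(p * (1 - p) / flips)
    # A proportion cannot go below 0 or above 1, so clip the ends.
    return max(0.0, p - half_width), min(1.0, p + half_width)

for heads, flips in [(5, 10), (50, 100), (500, 1000)]:
    low, high = approx_95_interval(heads, flips)
    print(f"{heads} heads in {flips} flips: {low:.1%} to {high:.1%}")
```

Each run observes exactly 50% heads, but the interval narrows sharply as the number of flips grows.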

    When there is a wide confidence interval, you will need to take into consideration the characteristics of your specific patient and determine if  your patient is more likely to fall closer to the lower end of the interval or the higher end.

    It may occur that the confidence interval crosses 0. With a 45% mean reduction in mortality, you may encounter a confidence interval that runs from -2% to 53%. In addition to being a fairly wide confidence interval (and thus suggesting a small sample), the crossing of 0 tells us that the results are not statistically significant. While this is interesting to biostatisticians, medical practitioners are more interested in results that are clinically significant. Results can be clinically significant without being statistically significant. A negative lower bound tells us that the data are also compatible with a result opposite the expected outcome: while the best estimate is a 45% reduction in mortality, and the reduction is unlikely to be greater than 53%, the data do not rule out an increase in mortality of up to 2%.

    Again, it falls to your own clinical judgment to determine whether your specific patient's characteristics lean more towards the increased mortality or decreased mortality as described in the study.
     

     

    Concepts Used in Diagnosis Studies

     

    Sensitivity

    The sensitivity of a test is the proportion of patients having the condition who will test positive on the test. So if an HIV antibody test has a sensitivity of 99%, out of every hundred patients who take the test and have HIV antibodies, 99 will test positive and one will test negative (a false negative).

     To put it another way, the higher the sensitivity of a test, the lower the proportion of false negative results it gets, and the more of its intended target population it catches.

    Specificity

    The specificity of a test is the proportion of patients who are disease free that test negative for the disease. In other words, if a test has a high specificity (negativity in health), it produces few false positives, so a positive result rules in the diagnosis.

    A good test is both highly sensitive and highly specific

    That is to say, it finds almost all patients who have the condition and falsely identifies hardly any patients who do not have it.

     The exact combination of sensitivity and specificity required of a test will depend on the relative disadvantages of false positive results (unnecessary anxiety, unneeded or harmful treatment or further tests, etc.) and false negative results (treatable disease going untreated for lack of diagnosis).
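Both measures come straight from counting test results against the patients' true status. The Python sketch below uses the 99-in-100 figure from the HIV antibody example for sensitivity. The counts for disease-free patients (950 true negatives, 50 false positives) are hypothetical numbers invented for this illustration.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of patients WITH the condition who test positive."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of patients WITHOUT the condition who test negative."""
    return true_negatives / (true_negatives + false_positives)

# From the text: of 100 patients with antibodies, 99 test positive, 1 tests negative.
sens = sensitivity(true_positives=99, false_negatives=1)

# Hypothetical: of 1000 disease-free patients, 950 test negative, 50 test positive.
spec = specificity(true_negatives=950, false_positives=50)

print(f"Sensitivity: {sens:.1%}")
print(f"Specificity: {spec:.1%}")
```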