Chapter 16 Causation

16.1 You’ve Found a Relationship. Now What?

By this point in the book, you have learned to explore relationships between variables and test whether those relationships are statistically significant. You can produce a p-value, a correlation coefficient, a regression slope. You can report that variable A and variable B are associated — that knowing one helps predict the other.

Here is the hardest question in statistics: does A cause B?

Not “are they associated.” Not “do they move together.” But: if you changed A, would B change as a result? That is the question of causation, and answering it requires more than a p-value. It requires careful thinking about how the data was collected and what other forces might be at work.

This chapter is about that thinking. You will not need to write any R code here. What you will do is learn to look at a statistical result and ask: is this relationship causal, or is something else going on?

16.2 Association Is Not Causation

The most important sentence in this chapter: association does not imply causation. Just because two things happen together does not mean one causes the other. This is not a technical subtlety. It is the difference between understanding the world and being misled by it.

Consider three classic examples:

Ice cream sales and drowning deaths. In a given city, the more ice cream people buy, the more people drown. Does ice cream cause drowning? Of course not. A third variable — warm weather — drives both. In summer, people buy more ice cream and more people go swimming, so more drownings occur. The association is real. The causation is not.

Shoe size and reading ability. Among elementary school students, children with larger feet tend to read better. Does foot size cause reading skill? No. Older children have bigger feet and more years of reading instruction. Age is the hidden driver. If you gave a first-grader larger shoes, their reading would not improve.

Fire truck presence and fire damage. The more fire trucks that arrive at a scene, the more damage the fire causes. Do fire trucks make fires worse? No. Larger fires attract more fire trucks. The direction of causation is reversed: damage causes more trucks, not the other way around.

Each of these examples sounds silly once you stop to think about it, and that is exactly the point. The kinds of relationships you explore in your own research — between health behaviors, academic outcomes, social attitudes — are susceptible to the same logic. An association is the starting line, not the finish line.

16.3 Confounding: The Third Variable Problem

When a third variable influences both your explanatory variable and your response variable, producing a misleading association, that third variable is called a confounder (or lurking variable).

Here is the structure:

         Confounding Variable (Z)
               /        \
              v          v
    Explanatory (X)    Response (Y)
              \        /
               v      v
           (Spurious) Association

Z influences X. Z influences Y. The result looks like X influences Y — but the real driver is Z.

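In the ice cream example, the confounder is temperature. In the shoe size example, the confounder is age. The fire truck example can be read the same way: fire size drives both the number of trucks dispatched and the amount of damage, which is why more trucks go along with more damage even though the trucks do not cause it.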

How do you know if confounding might be present? Ask yourself: could some other variable reasonably explain both X and Y? If the answer is yes, you cannot conclude causation from the association alone. You need a design that controls for that confounder — either by measuring it and including it in your model (the topic of Chapter 10, Multiple Regression) or by using a study design that prevents confounding from the start.
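The confounding structure above can be made concrete with a short, optional simulation. This is a sketch on invented data, not a real analysis: the variable names, coefficients, and sample size are all assumptions chosen for illustration. It builds a world in which temperature drives both ice cream sales and drownings while ice cream has no effect on drownings at all, then compares a model that ignores temperature with one that includes it.

```r
# Simulated data only: every number below is an assumption made for illustration.
set.seed(16)
n <- 1000

temperature <- rnorm(n, mean = 20, sd = 8)                  # confounder Z
ice_cream   <- 50 + 3.0 * temperature + rnorm(n, sd = 10)   # X depends on Z
drownings   <- 2 + 0.2 * temperature + rnorm(n, sd = 1)     # Y depends on Z, not on X

# Ignoring Z: X and Y appear to be related
naive_cor   <- cor(ice_cream, drownings)
naive_slope <- coef(lm(drownings ~ ice_cream))[["ice_cream"]]

# Including Z in the model (the Chapter 10 approach): the slope for X shrinks toward 0
adjusted_slope <- coef(lm(drownings ~ ice_cream + temperature))[["ice_cream"]]

print(round(c(naive_cor = naive_cor,
              naive_slope = naive_slope,
              adjusted_slope = adjusted_slope), 3))
```

Under these assumptions the naive correlation and slope should come out clearly positive, while the adjusted slope should sit close to zero. Adjustment works here only because the confounder was measured and the model matches how the data were generated. With real data, you rarely know that you have measured every confounder.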

16.4 Experiments: The Gold Standard for Causation

The most powerful tool for establishing causation is the randomized experiment. Here is what makes it work:

In an experiment, the researcher does not just observe the explanatory variable — they control it. Participants are randomly assigned to different conditions: one group receives the treatment, another group receives a placebo or alternative. Because assignment is random, the groups should be roughly equal on every other variable — age, gender, motivation, health status, everything — before the treatment is applied. The only systematic difference between groups is the treatment itself.

If the groups then differ on the response variable, you can attribute that difference to the treatment. Random assignment breaks the link between the treatment and any potential confounders, because those confounders are distributed roughly equally across groups by chance.

Example: Smoking cessation methods. Researchers want to know which method helps smokers quit most effectively: nicotine drugs, behavioral therapy, a combination, or quitting “cold turkey” with no assistance.

In an observational study, smokers choose their own method. The problem: people who choose the drug+therapy combination might be more motivated to quit, have more resources, or face different life circumstances than those who choose cold turkey. Any difference in quit rates could be due to the method — or to those pre-existing differences. You cannot tell.
In an experiment, smokers are randomly assigned to one of the four methods (even if they would have preferred a different one). Because assignment is random, motivation, resources, and all other personal characteristics are balanced across groups, on average. If the drug+therapy group has the highest quit rate, you can credibly attribute that to the method — not to the people who chose it.

This is why randomized experiments are the gold standard for causation. Once random assignment has removed confounders as an explanation, a difference in outcomes can only come from the treatment or from chance, and a significance test tells you how plausible chance is.
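The contrast between the observational and experimental versions of the smoking study can be sketched as an optional simulation. Everything in it is a made-up assumption: a hypothetical motivation score, a treatment that is deliberately given zero true effect, and arbitrary coefficients. In the observational version, more motivated smokers are more likely to choose the treatment. In the experimental version, a coin flip decides.

```r
# Simulated data only; the treatment is given NO real effect on purpose.
set.seed(2024)
n <- 20000

motivation <- rnorm(n)                        # hypothetical confounder
p_quit     <- plogis(-1 + 1.0 * motivation)   # quitting depends on motivation only
quit       <- rbinom(n, size = 1, prob = p_quit)

# Observational study: motivated smokers are more likely to choose the treatment
chose_treatment <- rbinom(n, size = 1, prob = plogis(1.5 * motivation))

# Experiment: a coin flip decides, regardless of motivation
assigned_treatment <- rbinom(n, size = 1, prob = 0.5)

# Mean of a variable in the treated group minus its mean in the untreated group
mean_diff <- function(values, group) {
  mean(values[group == 1]) - mean(values[group == 0])
}

# Apparent treatment effect on quit rates under each design
obs_diff <- mean_diff(quit, chose_treatment)
exp_diff <- mean_diff(quit, assigned_treatment)

# Balance check: gap in average motivation between groups under each design
obs_gap <- mean_diff(motivation, chose_treatment)
exp_gap <- mean_diff(motivation, assigned_treatment)

print(round(c(obs_diff = obs_diff, exp_diff = exp_diff,
              obs_gap = obs_gap, exp_gap = exp_gap), 3))
```

Under these assumptions, the self-selected groups should show a sizeable gap in both motivation and quit rates even though the treatment does nothing, while the randomized groups should show gaps near zero on both. The small differences that remain after randomization are the chance variation that a significance test accounts for.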

Key terms for experiments:

Term	Meaning
Random assignment	Participants are assigned to conditions by chance — a coin flip, not a choice.
Treatment group	Receives the intervention being tested.
Control group	Receives no treatment, a placebo, or the standard treatment.
Placebo	A fake treatment that looks real but has no active ingredient. Controls for the psychological effect of believing you are being treated.
Blinding	Participants do not know which group they are in (single-blind); neither participants nor researchers know (double-blind). Prevents expectation bias.

When experiments are not possible. You cannot randomly assign people to smoke or not smoke, to be a certain age, to live in a certain neighborhood, or to experience childhood trauma. For many important questions — especially in public health, sociology, and education — experiments are either unethical or impossible. That does not mean you give up on learning. It means you work with observational data and think critically about confounding.

16.5 Observational Studies: Proceed with Caution

Most of the data you work with in this book — and most data in the social and health sciences — comes from observational studies. In an observational study, the researcher records variables as they naturally occur, without intervening.

Observational studies can reveal important patterns, generate hypotheses, and (when carefully designed) provide evidence consistent with causation. But they cannot, on their own, prove causation, because the groups being compared are not created by random assignment — they exist in the world, shaped by countless forces the researcher does not control.

Types of observational studies:

Cross-sectional: Measure everything at one point in time. Example: survey 1,000 people today about their exercise habits and stress levels. You can report that exercisers report lower stress, but you cannot say exercise causes lower stress — perhaps less-stressed people simply have more energy to exercise.
Prospective (longitudinal): Follow people forward in time. Example: measure exercise habits now, then track stress levels over the next two years. If exercisers develop lower stress over time, the evidence is stronger, because the proposed cause preceded the effect. Temporal order alone does not rule out confounding, so prospective designs are often paired with Chapter 10’s approach of adding potential confounders to a regression model.
Retrospective: Look backward. Example: ask people with lung cancer about their past smoking habits. Useful for generating hypotheses, but susceptible to recall bias — people may not accurately remember past behaviors.

The arrow of time matters. If X is measured before Y, you have a stronger case that X might cause Y than if both are measured simultaneously. But even then, confounders remain a threat. A third variable could have influenced X at the start and Y later on.

16.6 Criteria for Evaluating Causation

So how do you evaluate whether an observed relationship might be causal, especially when you only have observational data? Epidemiologists use a set of guidelines called the Bradford Hill criteria. The table below covers seven of Hill’s nine original criteria. You do not need to memorize them, but understanding the logic will sharpen how you think about your own results.

Criterion	Question to Ask
Strength	How large is the effect? A strong association (large slope, large difference in means) is harder to explain away by confounding than a weak one.
Consistency	Has the same relationship been found in different studies, with different populations, by different researchers? One study is suggestive; many studies are persuasive.
Temporality	Does the cause come before the effect? If X is measured before Y, causation is possible. If they are measured at the same time, it is ambiguous.
Dose-response	Does more X produce more Y? If people who exercise 5 hours a week report less stress than those who exercise 1 hour, who in turn report less stress than those who do not exercise at all, the gradient strengthens the causal argument.
Plausibility	Is there a reasonable mechanism? Does biology, psychology, or common sense offer an explanation for how X could cause Y?
Experiment	Has any experimental evidence supported the relationship? Even if your study is observational, results from related experiments bolster the case.
Specificity	Does X predict Y specifically, or does it predict everything? A variable that predicts many unrelated outcomes may be a marker of something broader (like socioeconomic status) rather than a specific cause.

None of these criteria, by itself, proves causation. But the more criteria a relationship satisfies, the more confident you can be that the relationship is causal and not just a byproduct of confounding or chance.

16.7 Applying This to Your Project

When you write up your research project (Chapter 12, Appendix B), your discussion of causation belongs in the Discussion or Limitations section. Here is what you should address:

State your study design clearly. Was it an experiment with random assignment? Or an observational study? If observational, was it cross-sectional, prospective, or retrospective? Your design determines what kind of causal claims you can make.
Acknowledge potential confounders. If you studied whether sleep duration predicts academic performance, what confounders might explain both? Stress? Parental involvement? Screen time? List the most plausible ones and explain why you did (or did not) control for them.
Use cautious language. Instead of “X causes Y,” write “X is associated with Y,” “X predicts Y,” or “the results are consistent with the hypothesis that X influences Y.” These phrases are honest about what your data can and cannot show.
Suggest what a future experiment might look like. Even if you cannot run it, describing what a randomized experiment would look like shows that you understand the difference between association and causation. For example: “A randomized trial could assign students to different sleep schedules and measure their subsequent test performance, controlling for baseline ability and motivation.”
Apply the Bradford Hill lens. Which criteria does your relationship satisfy? Temporality? Strength? Dose-response? Mention the ones that apply to your findings.

Example write-up (for an observational study):

This study found a significant association between sleep duration and GPA among high school students: students who reported sleeping 8+ hours per night had GPAs approximately 0.4 points higher than those sleeping fewer than 6 hours (p < 0.01). Because the study is cross-sectional, causation cannot be established — students with higher GPAs may experience less academic stress and therefore sleep better, rather than sleep causing higher grades. Potential confounders include stress, parental involvement, and extracurricular commitments. Future research using a prospective design and controlling for these confounders would strengthen the evidence for a causal relationship.

Notice the careful language: “association,” not “causes”; “cannot be established,” not “is false”; suggestions for improvement, not excuses. This is how scientists write — honest about what the data shows and what it does not.

16.8 Key Takeaways

Association is not causation. A p-value, a correlation, or a regression slope tells you two variables are related — it does not tell you one causes the other.
Confounding is the most common alternative explanation. Whenever you see a relationship, ask: could a third variable drive both X and Y?
Randomized experiments are the gold standard for causation. Random assignment ensures that, on average, the groups being compared differ only in the treatment they receive, not in pre-existing characteristics.
Observational studies can provide evidence consistent with causation, but they cannot prove it on their own. Be honest about the limitations in your write-up.
The Bradford Hill criteria give you a framework for evaluating evidence: strength, consistency, temporality, dose-response, plausibility, experimental support, and specificity. The more of them a relationship satisfies, the stronger the case for causation.
Language matters. Use “associated with,” “predicts,” or “is consistent with” rather than “causes” when working with observational data. Reserve causal language for conclusions supported by experimental evidence.

Statistics gives you tools to discover patterns in data. Thinking clearly about causation gives you the wisdom to interpret those patterns honestly. Both skills together are what make you a capable researcher.