Chapter 12 Epilogue: Write Up Your Analysis
You made it. Twelve chapters ago, you typed your first line of R code: 1 + 1. Since then, you have wrangled messy survey data from over 43,000 adults, built frequency tables, charted every kind of variable relationship imaginable, run statistical tests from t-tests to multiple regression, and interpreted dozens of p-values along the way. If that does not deserve a moment of recognition, nothing does.
But you have done something more important than learning R commands and statistical tests. You have learned to move through the entire statistical pipeline — from a question, through exploration, to a defensible conclusion — with confidence. That is a skill you will use long after you close this book.
This epilogue is not about new R code or another statistical test. It is about the final step of any statistical project: writing it up. Because an analysis that stays on your computer is invisible. An analysis you can explain, in plain language, to anyone — a classmate, a teacher, a parent — is where the real value lives.
12.1 From Question to Write-Up: The Full Journey
Let us retrace the path you have walked. It started with a simple question:
Is there a relationship between depression and smoking behavior among young adults?
That question drove everything. Every chapter brought you one step closer to an answer:
| Chapter | What You Did | What It Meant for Your Project |
|---|---|---|
| 1 & 3 | Framed your question and built the nesarc dataset | You defined what you were studying and who was in your sample |
| 2 & 4 | Described your variables one at a time, then two at a time | You learned what your data looked like — its shape, center, spread, patterns |
| 5 | Learned the logic of hypothesis testing | You moved from describing patterns to deciding whether they were real |
| 6–8 | Ran ANOVA, Chi-Square, and correlation tests | You tested specific relationships and quantified their strength |
| 9–10 | Built regression models | You moved from “are these related?” to “how much does one predict another?” |
| 11 | Explored moderation and logistic regression | You asked richer questions — “does the relationship change depending on context?” |
Each chapter was a brick. Together, they form a complete analysis. If you were working with your own project question — maybe about sleep and academic performance, or reaction times and handedness, or GDP and life expectancy — the same bricks apply. The dataset changes, the variable names change, but the reasoning stays exactly the same.
12.2 Structuring Your Write-Up
A written analysis has a job: to take someone who has never seen your data and walk them from your question to your conclusion in a clear, logical path. Every sentence should serve that goal.
Here is a structure that works. It mirrors the three-stage pipeline you learned in the Preface — collect, summarize, interpret — and the chapter-by-chapter progression you just completed.
12.2.1 1. Introduction and Research Question
Start by telling your reader what you are investigating and why it matters. No data yet — just the question and the motivation.
What to include:
- Your research question, stated clearly and directly
- Why the question is worth asking — what would the answer tell us?
- Your hypothesis: what relationship do you expect, and why?
- A brief note about your dataset — where it came from and who is in it
Example opening paragraph:
This analysis investigates whether young adult daily smokers with a lifetime diagnosis of major depression smoke more cigarettes per day than those without depression. Understanding this relationship matters because depression and smoking are both major public health concerns — if they are connected, treatment approaches might need to address both at once. I hypothesize that smokers with depression will report higher daily cigarette consumption, based on previous research suggesting nicotine is used as a form of self-medication. Data come from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC), a U.S. government survey of over 43,000 adults.
Notice what this paragraph does: it states the question, explains why it matters, makes a prediction, and tells the reader where the data came from. Any reader can now follow the rest of the analysis.
12.2.2 2. Data Description
Before you analyze anything, your reader needs to know what you are working with.
What to include:
- Sample size and key demographics (age range, gender split, ethnicity breakdown if relevant)
- How the sample was constructed (inclusion and exclusion criteria)
- Variable definitions: what each variable measures and whether it is categorical (C) or quantitative (Q)
- A note on missing data and how you handled it
Use a table and a sentence. Something like:
The analysis sample included 1,320 young adults (ages 18–25) who were daily smokers with valid nicotine dependence data. The sample was 55% female and 45% male. About 16% had a lifetime diagnosis of major depression. The primary outcome variable — daily cigarettes smoked — had a mean of 14.2 (SD = 8.4), with 3.8% of values missing due to “Unknown” responses in the original survey.
Short, factual, no interpretation yet. Just the facts your reader needs to trust your analysis.
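The numbers in a data description paragraph can all be computed with a few lines of base R. The sketch below uses a small simulated data frame as a stand-in; the name `smokers`, its columns (`sex`, `depression`, `daily_cigs`), and every value in it are hypothetical and are not the book's nesarc variables.

```r
# Simulated stand-in data: column names and values are hypothetical
set.seed(12)
n <- 200
smokers <- data.frame(
  sex        = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  depression = factor(sample(c("No", "Yes"), n, replace = TRUE,
                             prob = c(0.84, 0.16))),
  daily_cigs = round(rgamma(n, shape = 3, scale = 5))
)
# Set 8 responses to NA, the way an "Unknown" survey code would be recoded
smokers$daily_cigs[sample(n, 8)] <- NA

# Sample size
n_total <- nrow(smokers)

# Percentage breakdowns for the categorical variables
sex_pct <- round(100 * prop.table(table(smokers$sex)), 1)
dep_pct <- round(100 * prop.table(table(smokers$depression)), 1)

# Center, spread, and missingness for the quantitative outcome
cigs_mean   <- mean(smokers$daily_cigs, na.rm = TRUE)
cigs_sd     <- sd(smokers$daily_cigs, na.rm = TRUE)
pct_missing <- 100 * mean(is.na(smokers$daily_cigs))

n_total
sex_pct
dep_pct
round(c(mean = cigs_mean, sd = cigs_sd, pct_missing = pct_missing), 1)
```

With your own data, you would replace the simulation step with your cleaned data frame and keep the summary lines as they are.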
12.2.3 3. Exploratory Data Analysis
Now show your reader what the data looks like. This is where your graphs and summary statistics come in.
What to include:
- Univariate summaries: describe each key variable individually (center, spread, shape)
- Bivariate visualizations: show the relationship you are investigating
- A table or graph for each important finding — but not every single thing you tried
- Interpretation: do not just present a graph; tell the reader what to notice
A good EDA sentence does two things at once: it states the finding and connects it to your research question.
Figure 1 shows the distribution of daily cigarette consumption among young adult smokers. The distribution is right-skewed, with most smokers reporting 10–20 cigarettes per day and a smaller group reporting 30 or more. When split by depression status (Figure 2), smokers with depression show a slightly higher mean (14.8 vs. 14.1 cigarettes per day), but the distributions overlap considerably.
You are not drawing conclusions yet — you are guiding the reader’s eye toward the pattern you will test formally in the next section.
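A figure pair like the one described above could be produced roughly as follows. This is a minimal sketch using base R graphics on simulated data; the data frame `smokers` and its columns are hypothetical placeholders, and the plots will not match the numbers quoted in the example paragraph.

```r
# Simulated stand-in data: names and values are hypothetical
set.seed(34)
n <- 300
smokers <- data.frame(
  depression = factor(sample(c("No", "Yes"), n, replace = TRUE,
                             prob = c(0.8, 0.2))),
  daily_cigs = round(rgamma(n, shape = 3, scale = 5))  # right-skewed values
)

# Univariate view: the shape of the outcome variable
hist(smokers$daily_cigs,
     main = "Daily cigarette consumption",
     xlab = "Cigarettes per day")

# Bivariate view: the outcome split by the explanatory variable
boxplot(daily_cigs ~ depression, data = smokers,
        main = "Daily cigarettes by depression status",
        xlab = "Lifetime major depression",
        ylab = "Cigarettes per day")

# Group means to quote in the sentence that accompanies the figure
group_means <- tapply(smokers$daily_cigs, smokers$depression, mean)
round(group_means, 1)
```

Titles and axis labels are set explicitly so that each figure can be read without the surrounding text.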
12.2.4 4. Statistical Analysis
This is the core of your write-up: the formal tests and models that address your research question.
What to include:
- The test(s) you chose and why (remember the variable-type framework: C→Q means t-test or ANOVA, etc.)
- The null and alternative hypotheses, stated in plain language
- The test statistic, degrees of freedom, and p-value
- Your decision: reject or fail to reject the null hypothesis
- A sentence interpreting the result in the context of your question
Example:
An independent samples t-test was conducted to compare daily cigarettes smoked between smokers with and without depression. The null hypothesis stated that there is no difference in mean cigarette consumption between the two groups (\(H_0: \mu_{\text{depressed}} = \mu_{\text{non-depressed}}\)). The test yielded \(t(1315) = 0.94\), \(p = 0.35\). Since \(p > 0.05\), we fail to reject the null hypothesis. The observed difference of approximately 0.7 cigarettes per day is small enough to be explained by sampling variation alone. There is not sufficient evidence to conclude that depression is associated with higher smoking quantity among young adult daily smokers.
Notice the flow: test chosen → hypotheses → numbers → decision → interpretation. Every inferential result you present should follow this pattern.
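The numbers in a paragraph like the example can be pulled straight from the object that `t.test()` returns. The sketch below runs on simulated data with hypothetical column names, so its output will not match the values in the example. It assumes a pooled-variance test (`var.equal = TRUE`), which reports whole-number degrees of freedom; R's default is Welch's test, which usually reports fractional ones.

```r
# Simulated stand-in data: names and values are hypothetical
set.seed(56)
smokers <- data.frame(
  depression = factor(rep(c("No", "Yes"), times = c(120, 40))),
  daily_cigs = c(rgamma(120, shape = 3, scale = 4.7),
                 rgamma(40,  shape = 3, scale = 5.0))
)

# Pooled-variance t-test (assumption); omit var.equal for Welch's test
result <- t.test(daily_cigs ~ depression, data = smokers, var.equal = TRUE)

# Pieces needed for the write-up
t_stat   <- unname(result$statistic)
df       <- unname(result$parameter)
p_value  <- result$p.value
decision <- if (p_value < 0.05) "reject H0" else "fail to reject H0"

cat(sprintf("t(%.0f) = %.2f, p = %.3f: %s\n", df, t_stat, p_value, decision))
```

Extracting the values by name, rather than copying them from printed output, reduces the chance of a transcription error in the final report.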
For modeling chapters (9–11), the structure is similar but includes effect sizes and model fit:
- Regression coefficient, standard error, t-value, p-value
- \(R^2\) or pseudo-\(R^2\) — how much of the variability does the model explain?
- A plain-language interpretation of the slope: “For each additional cigarette smoked per day, the predicted odds of nicotine dependence increase by a factor of…”
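For a logistic model, the quantities in this list can be gathered as sketched below. The data are simulated and the names `daily_cigs` and `dependent` are hypothetical. The pseudo-\(R^2\) shown is McFadden's, which is one common choice and is assumed here rather than taken from the earlier chapters. Note that `glm()` reports a z value for each coefficient where `lm()` reports a t value.

```r
# Simulated stand-in data: names and values are hypothetical
set.seed(78)
n <- 250
daily_cigs <- round(rgamma(n, shape = 3, scale = 5))
# Binary outcome whose log-odds rise with cigarettes per day
dependent <- rbinom(n, size = 1, prob = plogis(-1 + 0.08 * daily_cigs))
smokers <- data.frame(daily_cigs, dependent)

fit <- glm(dependent ~ daily_cigs, data = smokers, family = binomial)

# Estimate, standard error, z value, and p-value for each term
coef_table <- coef(summary(fit))
round(coef_table, 3)

# Exponentiating the slope gives the odds ratio per additional cigarette
odds_ratio <- exp(coef(fit)["daily_cigs"])
round(odds_ratio, 2)

# McFadden's pseudo-R^2 for 0/1 outcome data (assumed choice of measure)
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance
round(pseudo_r2, 3)
```

The odds ratio is the number that completes a sentence of the form "the predicted odds increase by a factor of", so it is usually the value to report in plain language.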
12.2.5 5. Discussion
The discussion is where you step back and think about what your results actually mean. It answers three questions:
- What did you find? Summarize your key result in one or two sentences — not the test statistics, but the bottom line.
- What does it mean? Connect your result back to your original question. If you found an association, what does that suggest? If you did not, why might that be?
- What are the limitations? Every study has them. Acknowledge yours honestly — it shows you understand the limits of your analysis.
A good discussion is honest and measured. Do not overclaim. “The data suggest…” is better than “This proves…” If your p-value was 0.049, do not call it “highly significant”: it falls only just below the 0.05 threshold. If your sample was limited to young adults, say so and note that the findings may not apply to older populations.
Common limitations to consider:
- Sample size (smaller samples give less precise estimates)
- Generalizability (your sample may not represent the broader population well)
- Missing data (if many values were missing, your results might be biased)
- Confounding variables (other factors you did not measure might explain the relationship)
- Observational design (you can detect associations, not causation)
12.2.6 6. Conclusion
Your conclusion is short — a paragraph or two. It brings the reader back to where you started:
This analysis explored whether depression is associated with smoking quantity among young adult daily smokers, using data from the NESARC survey. The results suggest no meaningful difference in cigarette consumption between smokers with and without depression. While this finding does not support the self-medication hypothesis for smoking quantity in this age group, it raises new questions: might depression be related to smoking frequency or dependence rather than quantity? Future research with larger samples or longitudinal designs could help clarify these relationships.
Notice how this conclusion:
- Restates the question and the answer
- Acknowledges what was not found
- Suggests what might be explored next
- Does not introduce new numbers or tests
12.3 What Makes a Write-Up Strong
Five principles to keep in mind as you write:
Lead with the question, not the method. Your reader cares about what you discovered, not which R function you used. Name the test (t-test, ANOVA, regression) but keep the focus on the finding.
One idea per paragraph. A paragraph that mixes data description, hypothesis testing, and interpretation is hard to follow. Structure your paragraphs so each one does exactly one thing.
Interpret in words, not just numbers. A reader who does not know statistics should still understand your conclusions. After reporting \(p = 0.03\), add: “This means that if there were truly no difference between the groups, we would see a difference at least this large only 3% of the time — which is rare enough to suggest a real effect.”
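One way to see what the "only 3% of the time" sentence means is to simulate a world with no true difference. The sketch below does this with a simple permutation approach: it shuffles the group labels many times and counts how often the shuffled difference is at least as large as the observed one. This is an illustration on simulated data with hypothetical names, not the t-test procedure itself, and the two methods will generally give similar but not identical p-values.

```r
# Simulated stand-in data: names and values are hypothetical
set.seed(90)
group   <- rep(c("A", "B"), each = 30)
outcome <- c(rnorm(30, mean = 10, sd = 3), rnorm(30, mean = 12, sd = 3))

# Observed absolute difference in group means
observed_diff <- abs(mean(outcome[group == "B"]) - mean(outcome[group == "A"]))

# Shuffling the labels mimics a world where group membership does not matter
shuffled_diffs <- replicate(5000, {
  shuffled <- sample(group)
  abs(mean(outcome[shuffled == "B"]) - mean(outcome[shuffled == "A"]))
})

# Share of shuffles with a difference at least as large as the observed one
perm_p <- mean(shuffled_diffs >= observed_diff)
perm_p
```

The final proportion plays the role of the p-value: the smaller it is, the harder it becomes to explain the observed difference by chance alone.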
Use tables and figures purposefully. Every table and figure in your write-up should earn its place. If a graph does not advance your argument, cut it. Label everything clearly — axis labels, figure captions, table titles — so a reader can understand the visual without hunting through the text.
Be honest about uncertainty. Statistics is the science of uncertainty. Your p-values, confidence intervals, and model fit statistics all acknowledge that conclusions drawn from samples are never certain. Embrace that. A write-up that says “the data are consistent with no effect” is more trustworthy than one that claims certainty where none exists.
12.4 Where to Go From Here
You now have the foundation to conduct and write up a complete statistical analysis. The appendices at the end of this book cover research methods in more depth if you want to take your projects further:
- Appendix A — Literature Review: How to find, read, and synthesize existing research on your topic. A strong literature review strengthens your introduction and helps you form better hypotheses.
- Appendix B — Writing Empirical Research: A deeper guide to the structure and style of empirical research papers, including how to cite sources and build an argument.
- Appendix C — Sampling & Study Design: How researchers choose samples and design studies to answer specific questions. Essential reading if you plan to collect your own data.
- Appendix D — Causation: The difference between association and causation, and what it takes to make causal claims.
- Appendix E — Poster Presentation: How to present your analysis visually — for a science fair, a class presentation, or any audience beyond your teacher.
Beyond this book, the R community is vast and welcoming. Thousands of free packages, tutorials, and forums are waiting when you are ready to go further. The skills you have built — reading data, managing it, visualizing it, testing it, modeling it — transfer directly to real-world work in science, business, journalism, policy, and beyond.
12.5 A Final Word
When you started this book, statistics might have seemed like a collection of intimidating formulas and cryptic software. By now, I hope you see it differently: statistics is a way of thinking. It is the habit of asking “how do we know?” instead of just accepting what we are told. It is the discipline of measuring uncertainty instead of pretending it does not exist.
You learned to do this with R, but the thinking travels with you. The next time you see a news headline that says “study finds X causes Y,” you will instinctively ask: How big was the sample? Was it random? What was the p-value? Could there be confounding variables? That critical lens — that refusal to take data at face value — is the real gift of this book.
Take your project, write it up, share it. Your question matters, your analysis matters, and your voice — explaining what you found in clear, honest language — matters most of all.