Chapter 14 Writing Empirical Research

The goal of research is to share what you find. You can run the most sophisticated analysis in the world, but if no one can understand what you did or why it matters, your work stops at your own desk. Writing is how your data becomes knowledge that other people can use. This chapter walks you through how to structure and write an empirical research paper, the kind of paper you would write to report the analysis you have been building throughout this book.

14.1 The Big Picture: The Hourglass Shape

An empirical research paper has six sections:

Title and Abstract
Introduction
Methods
Results
Discussion
References

These sections follow what writers call an hourglass shape. The top of the hourglass is wide — you start with broad statements that anyone can understand, introducing your topic and why it matters. Then you narrow down to the specific: exactly what you measured, exactly what you found. Finally, you broaden back out — what do your results mean for the bigger picture?

Your introduction is the wide top. Your methods and results are the narrow middle. Your discussion is the wide bottom. Every section has a job to do, and knowing those jobs makes writing much easier than staring at a blank page.

14.2 The Title

A title should summarize your research question in a single phrase. After reading your title, someone should know your main variables and what relationship you investigated, even if they never read the rest of the paper.

A good title names the variables and hints at the relationship:

The Association Between Nicotine Dependence and Major Depression at Different Levels of Smoking Exposure

This title tells you the two main variables (nicotine dependence, major depression), the type of relationship (association), and the twist (different levels of smoking exposure). It is specific without being wordy.

What to avoid:

Words that serve no purpose: “A Study of…”, “An Investigation of…”, “Research on…” — these phrases add length without meaning. The reader already knows it’s a study.
Causal language: “The Effect of…”, “The Impact of…” — unless you ran a true experiment, you cannot claim causation. We discussed this in Chapter 10.
Undefined abbreviations: Spell out names in full in the title. In the body of the paper, define each abbreviation on first use, and after that write it without periods (NESARC, not N.E.S.A.R.C.).

Your turn: Try writing your title as “The Association Between [Variable 1] and [Variable 2]” and then refine from there. Replace “Association” with whatever statistical relationship you tested — correlation, difference, prediction — and specify any subgroups or conditions.

14.3 The Introduction

The introduction has one job: answer the question “Why should I care about this research?” It does this in three parts.

14.3.1 Opening Statements

The first few sentences of your introduction should be accessible to anyone — your classmates, your parents, someone who has never taken a statistics course. Start with something concrete and relatable.

For example, if your research is about smoking and mental health, you might begin: “Smoking remains one of the leading preventable causes of death worldwide. Yet despite decades of public health campaigns, millions of young adults continue to smoke. Understanding what makes some people more vulnerable to nicotine dependence than others is a pressing public health question.”

Notice: no numbers yet, no technical terms, no jargon. Just a clear statement of why the topic matters. Define any crucial terms the first time you use them, even if the definition seems obvious to you.

14.3.2 The Literature Review

The literature review is not a long list of everything ever written about your topic. It is an argument. You are building a case that your research question fills a gap in what we currently know.

Each paragraph in your literature review should:

Summarize what previous research has found about your topic
Identify a pattern, trend, or inconsistency in those findings
Point toward the gap that your own research will fill

Here is an example of what a literature review paragraph looks like:

Through the mid-1990s, most research suggested that academic censorship reduced college students’ respect for authority. However, results were inconsistent. In a landmark two-year case study, Jones (1996) found that college students’ respect for authority declined significantly after censorship was imposed. Jones relied exclusively on objective measures rather than self-reported measures of respect for authority.

The first sentence identifies a trend. The second notes inconsistency. The third and fourth present a key finding and explain why the methodology was strong. This paragraph does not just list studies — it builds toward a conclusion.

A critical rule: The main evidence in an empirical paper is data. Opinions, paraphrased statements, or “common knowledge” are not evidence unless they are backed by empirical findings. When you cite a source, you are citing its data and its analysis, not its author’s opinion.

Finding sources: Use primary sources — original journal articles where researchers report their own data and analysis. Avoid relying on textbooks, encyclopedias, or news articles that summarize other people’s work. When you find one good primary source, check its reference list to discover more. Look for articles that have been cited by many other researchers — that is a signal that the work is considered important in the field.

Remember: the literature review is an argument that sets the stage for your research question. It is not an exhaustive review of every detail. Assume your reader is basically knowledgeable about your topic. You do not need to explain that smoking is bad for health — your reader knows that. You need to explain what researchers have found about the specific relationship you are studying, and what they have not yet answered.

14.3.3 Research Questions

Your introduction should build toward and end with your research questions. These are the specific questions your analysis will answer. They should follow naturally from the gap you identified in your literature review.

For a project using the NESARC dataset, your research questions might look like this:

The present study will examine young adults from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). The goals of the analysis are: (1) establishing the relationship between major depression and nicotine dependence; and (2) determining whether the relationship between major depression and nicotine dependence exists above and beyond smoking quantity.

Notice how these questions are specific and answerable. They name the dataset, the variables, and the relationships being tested. A reader who stops at the end of your introduction should know exactly what you are going to do and why.

14.4 The Methods Section

The methods section answers the question “What did you actually do?” A reader should be able to replicate your study from this section alone. It has three parts.

14.4.1 Sample

Describe who or what you studied. Be specific.

Who were the participants? How many? Where were they from?
What was the level of analysis — individuals, groups, schools?
What are the key demographic characteristics?

Use meaningful group names (“Young adult daily smokers”) rather than arbitrary labels (“Group A”) or numerical codes. Your reader should be able to picture your sample.

Here is an example:

The sample of 1,203 pregnant women was drawn from two public prenatal clinics in Texas and Maryland. The ethnic composition was African American (n = 414, 34.4%), Hispanic, primarily Mexican American (n = 412, 34.2%), and White (n = 377, 31.3%). Most women were between the ages of 20 and 29 years; 30% were teenagers. All were urban residents, and most (94%) had incomes below the poverty level.

This description works because it answers the fundamental questions: who, how many, from where, and what characteristics define them.
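The counts and percentages in a description like this are easy to get wrong by hand. As a minimal sketch, the group counts from the example above can be turned into the "n = ..., ...%" format with a few lines of Python. The function name `describe_groups` is made up for illustration, and the labels are shortened from the example text.

```python
# Group counts taken from the example sample description above.
counts = {
    "African American": 414,
    "Hispanic": 412,
    "White": 377,
}


def describe_groups(counts):
    """Return the total N and one 'name (n = ..., ...%)' string per group."""
    total = sum(counts.values())
    lines = [
        f"{name} (n = {n}, {100 * n / total:.1f}%)"
        for name, n in counts.items()
    ]
    return total, lines


total, lines = describe_groups(counts)
print(f"N = {total:,}")
for line in lines:
    print(line)
```

Computing the percentages from the counts, rather than typing them, also confirms that the groups add up to the total you report.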

For a NESARC-based project, your sample description might read:

The sample from the first wave of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) represents the civilian, non-institutionalized adult population of the United States, including persons living in households, military personnel living off base, and persons residing in group quarters. The NESARC included oversampling of Black, Hispanic, and young adult (aged 18 to 24) populations. The full sample included 43,093 participants. For the present analysis, the sample was restricted to young adults aged 18 to 25 who reported daily smoking in the past year, resulting in a final analytic sample of approximately 1,320 individuals.

14.4.2 Procedures

Explain how the data were collected. What method was used — survey, interview, direct measurement, observation? Where and when were data collected? Were any observations discarded, and if so, why?

Here is an example:

Random sampling was used to recruit participants for this study. Surveyors went to considerable lengths to secure a high completion rate, including up to four call-backs, letters, and in some cases monetary incentives. Trained research assistants conducted face-to-face interviews with all study participants.

For NESARC:

One adult was selected for interview in each household. Face-to-face computer-assisted interviews were conducted in respondents’ homes following informed consent procedures.

If any observations were removed during data collection (for example, incomplete surveys), mention that here. If you removed observations during analysis (for example, outliers or missing data), mention that in the Results section instead.

14.4.3 Measures

Describe the specific questions or measurements used to capture each variable. For each variable, explain:

How was it measured? (Survey question, physical measurement, test score, etc.)
What type of variable is it? (Quantitative or categorical.)
If categorical, what do the values mean?
Where did the measure come from? (A validated questionnaire, a national survey, etc.)

For NESARC:

Lifetime major depression was assessed using the NIAAA Alcohol Use Disorder and Associated Disabilities Interview Schedule — DSM-IV (AUDADIS-IV). The tobacco module of the AUDADIS-IV contains detailed questions on the frequency, quantity, and patterning of tobacco use as well as symptom criteria for DSM-IV nicotine dependence. Current smoking was evaluated through both smoking frequency (“About how often did you usually smoke in the past year?”) coded dichotomously in terms of the presence or absence of daily smoking, and quantity (“On the days that you smoked in the last year, about how many cigarettes did you usually smoke?”).

Notice how specific this is. A reader knows exactly which instrument was used, what it measures, and how the answers were coded. That is the goal.

14.5 The Results Section

The results section answers “What did you find?” This is where your statistical analysis goes. You have already learned how to produce these results — every technique in this book, from frequency tables in Chapter 2 to multiple regression in Chapter 10, is a tool for your results section.

When writing your results:

Report numbers, not just words. Instead of “there was a significant difference,” write “F(1, 1318) = 12.34, p = .0004.” Your reader should be able to evaluate the evidence themselves.
Use tables and figures. A well-constructed table (using the techniques from Chapters 2–4 and the modelsummary package) communicates your findings faster than a paragraph of prose.
Describe what the numbers mean in plain language. After you report the test statistic and p-value, translate: “In other words, young adults with major depression smoked significantly more cigarettes per day (M = 14.2, SD = 8.3) than those without major depression (M = 11.6, SD = 7.9).”
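Report effect sizes, not just p-values. A p-value tells you how likely a result at least as extreme as yours would be if there were no real relationship in the population. It does not tell you how large the relationship is. An effect size tells you how big the result is. Both matter. Report Cohen’s d, R², or η² alongside your p-values.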
Discuss which observations were excluded during analysis. If you removed outliers or recoded missing values, explain what you did and why.
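To make the effect-size advice concrete, here is a rough sketch that computes two common effect sizes from the illustrative numbers used above. It rests on assumptions that are not in the text: Cohen's d is computed with a pooled standard deviation that treats the two groups as equal in size (the example does not report group sizes), and η² is recovered from F using the identity that holds for a one-way ANOVA. The function names are made up for illustration. The means and the F statistic are separate illustrations in this chapter, so the two effect sizes are not expected to agree with each other.

```python
import math


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d with a pooled SD that ASSUMES equal group sizes."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared from a one-way ANOVA F statistic.

    Uses SS_between / SS_total = (F * df1) / (F * df1 + df2).
    In designs with more than one factor this gives PARTIAL eta squared.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Illustrative values from the bullet points above.
d = cohens_d(14.2, 8.3, 11.6, 7.9)
eta_sq = eta_squared_from_f(12.34, 1, 1318)

print(f"Cohen's d = {d:.2f}")
print(f"eta squared = {eta_sq:.3f}")
```

If your group sizes differ, weight each variance by its degrees of freedom when pooling, or let your statistics software report the effect size directly.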

Do not interpret your results in this section. Save the “so what” for the Discussion. The results section is for facts — the Discussion is for meaning.

14.6 The Discussion

The discussion answers “What does it mean?” This is where you connect your findings back to the big picture you set up in your introduction.

A strong discussion does the following:

Restate your main findings in plain language. Start with your most important result, not with background.
Connect your findings to the literature. Do your results support or contradict what previous researchers found? If they contradict, why might that be — different sample, different measures, different time period?
Acknowledge limitations honestly. Every study has limitations. Maybe your sample was only young adults, so you cannot generalize to older populations. Maybe your measure of depression only captures lifetime diagnosis, not current symptoms. Maybe your data are cross-sectional, so you cannot determine the direction of the relationship. Acknowledging limitations is not weakness — it is intellectual honesty. It shows you understand your research deeply.
Discuss implications. If your findings are correct, what should change? Should public health programs target a specific group? Should future research investigate a particular question? Connect your results to real-world decisions.
Suggest future directions. What should the next study do? End with the question your research opens up, not the one it closes.

14.7 The Abstract

Write your abstract last, after the rest of the paper is complete. An abstract is a miniature version of your paper — typically 150–250 words — that covers the key points from every section: your research question, your methods, your main results, and your conclusion. A reader should be able to read only your abstract and understand what you did and what you found.

14.8 References

Every citation in your paper must appear in the reference list, and every reference in your list must be cited in the paper. These two must match perfectly.
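The two-way match described above is a comparison of two sets. The sketch below assumes you have already collected a short key for each in-text citation and each reference-list entry; how you extract those keys depends on your tools and is not shown. The function name and the keys are hypothetical, chosen only for illustration.

```python
def check_references(cited, listed):
    """Compare in-text citation keys with reference-list keys.

    Returns two sorted lists:
      missing_from_list - cited in the text but absent from the reference list
      never_cited       - in the reference list but never cited in the text
    """
    cited_set, listed_set = set(cited), set(listed)
    missing_from_list = sorted(cited_set - listed_set)
    never_cited = sorted(listed_set - cited_set)
    return missing_from_list, never_cited


# Hypothetical keys for illustration only.
cited = ["breslau1998", "jones1996"]
listed = ["breslau1998", "placeholder_uncited"]

missing, uncited = check_references(cited, listed)
print("Cited but not listed:", missing)
print("Listed but never cited:", uncited)
```

Both lists should be empty before you submit the paper.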

Use a reference manager like Zotero (free, at zotero.org) to organize your sources and generate properly formatted citations. Zotero can insert in-text citations and build your reference list automatically in whatever style you need. For empirical research, APA style (American Psychological Association) is the most common.

An APA-style reference entry looks like this:

Breslau, N., Peterson, E. L., Schultz, L. R., Chilcoat, H. D., & Andreski, P. (1998). Major depression and stages of smoking: A longitudinal investigation. Archives of General Psychiatry, 55(2), 161–166.

The basic pattern is: Author last name, initials. (Year). Title of article. Journal Name, Volume(Issue), Page numbers.
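As an illustration of that pattern, the sketch below assembles a journal-article reference from its parts. It is not a full APA implementation: the function name is made up, authors are assumed to be supplied already in "Last, F. M." form, italics are not represented in plain text, and special cases such as very long author lists or missing issue numbers are not handled.

```python
def format_apa_article(authors, year, title, journal, volume, issue, pages):
    """Build 'Authors (Year). Title. Journal, Volume(Issue), Pages.'

    `authors` is a list of strings already written as "Last, F. M.".
    """
    if len(authors) == 1:
        author_text = authors[0]
    else:
        # Separate authors with commas and place "&" before the last one.
        author_text = ", ".join(authors[:-1]) + f", & {authors[-1]}"
    return f"{author_text} ({year}). {title}. {journal}, {volume}({issue}), {pages}."


reference = format_apa_article(
    authors=[
        "Breslau, N.",
        "Peterson, E. L.",
        "Schultz, L. R.",
        "Chilcoat, H. D.",
        "Andreski, P.",
    ],
    year=1998,
    title="Major depression and stages of smoking: A longitudinal investigation",
    journal="Archives of General Psychiatry",
    volume=55,
    issue=2,
    pages="161–166",
)
print(reference)
```

Run on the example entry, this reproduces the reference shown above, which is a quick way to see which piece of punctuation belongs to which element.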

Learning to format references by hand is a good exercise, but in practice, use software. Your time is better spent analyzing data and writing clearly than formatting commas and italics.

14.9 Writing Conventions

Good empirical writing is not about sounding smart. It is about being understood. Here are the most important conventions.

14.9.1 Lead Your Reader

Do not make your reader guess where you are going. At the end of each section, hint at what comes next. In your introduction, tell the reader: “The present study will examine…” In your methods, describe your measures in the order you will present results. In your discussion, start with the most important finding. A well-organized paper is a guided tour, not a treasure hunt.

14.9.2 Avoid Direct Quotations

In empirical writing, you almost never quote another author word-for-word. Instead, summarize their findings in your own words and cite them. For example:

Breslau and colleagues (1998) found that major depression increased the risk of later smoking initiation in a longitudinal sample of young adults.

This is better than copying their exact sentence and putting it in quotation marks. Summarizing shows that you understand the work, not just that you can copy and paste.

14.9.3 Avoid Language Bias

Refer to people the way they refer to themselves. Use “participants” rather than “subjects” — the people in your study are human beings who agreed to participate, not objects being experimented on. If your study includes specific ethnic or cultural groups, use the terminology those groups prefer. When in doubt, check the American Psychological Association’s guidelines on bias-free language.

14.9.4 Be Succinct

After you finish your first draft, go back and remove every word that does not earn its place. Look for:

“It is important to note that…” → Delete. If it is important, the reader will know.
“Interestingly…” → Delete. Let the reader decide what is interesting.
“In order to…” → Replace with “To…”
“Due to the fact that…” → Replace with “Because…”

Every sentence should say something new. If two sentences make the same point, keep the stronger one and delete the other.
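A simple script can help you spot the filler phrases listed above in a draft. The sketch below only flags them and suggests a replacement; it does not rewrite your text, because fixing capitalization and punctuation after a deletion still needs a human eye. The function name `flag_wordy_phrases` and the sample draft are made up for illustration.

```python
# Phrases and suggestions taken from the list above.
# An empty suggestion means the phrase should simply be deleted.
SUGGESTIONS = {
    "it is important to note that": "",
    "interestingly": "",
    "in order to": "to",
    "due to the fact that": "because",
}


def flag_wordy_phrases(text):
    """Return (phrase, suggestion) pairs for each listed phrase found in text."""
    lowered = text.lower()
    return [
        (phrase, suggestion or "(delete)")
        for phrase, suggestion in SUGGESTIONS.items()
        if phrase in lowered
    ]


draft = "In order to test this, a model was fit. Interestingly, it fit well."
for phrase, suggestion in flag_wordy_phrases(draft):
    print(f'Found "{phrase}": suggest {suggestion}')
```

Treat the output as a checklist for your next revision pass, and keep a phrase wherever it reads better than the replacement.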

14.9.5 Avoid Jargon

Use technical terms only when they communicate more precisely than plain English. When you must use a technical term, define it the first time — either explicitly (“nicotine dependence, defined as…”) or by example. Imagine you are explaining your research to a classmate who is taking a different science course. They are smart, but they do not know your specific vocabulary.

14.9.6 Voice

In most academic writing, avoid “I” and “we.” Instead of “I ran a t-test,” write “A t-test was conducted.” Instead of “We found that…,” write “The results showed that…” This keeps the focus on the research, not on the researcher. There are exceptions — some journals and fields prefer first-person — but when you are learning, the impersonal style is safer and more widely accepted.

14.10 From This Book to Your Paper

You have been building a research project as you worked through this book. By now, you have:

A research question you chose (Chapter 1)
A dataset you understand (Chapters 1–3)
Descriptive statistics that characterize your sample (Chapters 2–4)
A hypothesis you tested (Chapter 5)
Statistical results from at least one analytical technique (Chapters 6–11)

You already have the raw materials for every section of an empirical paper. The challenge now is to assemble them into a coherent narrative — one that leads a reader from “why should I care?” through “what did you do?” and “what did you find?” to “what does it mean?”

The best way to learn empirical writing is to read empirical papers. Every primary source article you find for your literature review is a model of the kind of writing you are trying to produce. Pay attention to how those authors structure their introductions, how they describe their methods, how they present their results. Imitate what works.

Writing is revision. Your first draft is for getting ideas onto the page. Your second draft is for organizing those ideas. Your third draft is for making every sentence clear. Give yourself time between drafts — stepping away for a day or two lets you see your writing with fresh eyes. And always, always have someone else read your work before you consider it finished. What seems clear to you may be confusing to a reader who does not live inside your head.