Causal Inference cheatsheet

Causal Inference cheatsheet based on Matheus Facure's book on Causal Inference. Created by ChatGPT.

Overview of Causal Inference Concepts

| Concept | Description | Example |
|---|---|---|
| Causal Inference | Determining the cause-and-effect relationship between variables. | Assessing the impact of a new drug on patient recovery. |
| Treatment/Intervention | The variable or action being studied for its effect on an outcome. | A new teaching method. |
| Outcome | The variable or result that is influenced by the treatment/intervention. | Student test scores. |
| Confounder | A variable that influences both the treatment and the outcome, potentially biasing the estimated effect. | Age in a study linking exercise to heart health. |
| Randomized Controlled Trial (RCT) | Participants are randomly assigned to treatment or control groups to ensure comparability. | Testing a new medication by randomly assigning patients to receive either the medication or a placebo. |
| Observational Study | The researcher observes the effect of treatments without random assignment. | Studying the effect of smoking on lung cancer through observational data. |
| Counterfactual | The hypothetical scenario of what would have happened to the same units under a different treatment condition. | What would be the unemployment rate if a stimulus package had not been implemented? |
| Selection Bias | Bias introduced when the subjects studied are not representative of the general population. | Studying only healthy volunteers for a new drug might overestimate its effectiveness. |
| Instrumental Variables (IV) | Variables that affect the treatment but do not directly affect the outcome, used to estimate causal relationships when controlled experiments are not feasible. | Using distance to the nearest college as an instrument for education level in earnings studies. |
| Difference-in-Differences (DiD) | Compares the changes in outcomes over time between a treatment group and a control group. | Evaluating the impact of a new law by comparing regions before and after the law is implemented. |
| Regression Discontinuity (RD) | Uses a cutoff or threshold to assign treatment and compares those just above and below the cutoff to estimate causal effects. | Estimating the effect of a scholarship program on student performance by comparing students around the eligibility cutoff. |
| Propensity Score Matching | Matches treated and untreated units with similar propensity scores (the probability of receiving the treatment) to estimate the treatment effect. | Comparing outcomes of patients receiving different treatments by matching on demographic and clinical characteristics. |
| Synthetic Control Method | Constructs a weighted combination of control units to create a synthetic control group for comparison with the treated unit. | Evaluating the impact of a policy change in one country by comparing it to a synthetic control group constructed from other countries. |
| Mediation Analysis | Examines how an intermediate variable mediates the relationship between an independent variable and a dependent variable. | Analyzing how stress reduction mediates the relationship between exercise and mental health. |
| Natural Experiment | Uses naturally occurring events or circumstances that mimic random assignment to estimate causal effects. | Studying the impact of a natural disaster on economic outcomes. |
| Heterogeneous Treatment Effects | Analysis that examines how treatment effects vary across different subgroups or contexts. | Investigating whether a job training program has different effects based on participants' age or education level. |
| Panel Data and Fixed Effects | Uses data collected over time on the same units to control for unobserved variables that do not change over time. | Evaluating the impact of education policies by analyzing student performance data over multiple years. |
| Synthetic Difference-in-Differences (SDID) | Combines synthetic control and difference-in-differences methods to estimate treatment effects. | Evaluating the impact of a new law by comparing the treated region to a synthetic control region over time. |
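
The counterfactual and confounder concepts can be made concrete with a toy example. The sketch below uses made-up numbers and lists both potential outcomes for every unit, something real data never provides; the names `units`, `ate`, `naive`, and `bias` are illustrative only. It shows a naive treated-versus-control comparison missing the true effect because sicker patients are more likely to be treated.

```python
from statistics import mean

# Hypothetical toy data: (y0, y1, treated).
# y0 = outcome without treatment, y1 = outcome with treatment.
# In real data only one of the two is ever observed per unit.
units = [
    (10, 12, 0), (10, 12, 0), (10, 12, 1),  # healthier patients, mostly untreated
    (4, 6, 1), (4, 6, 1), (4, 6, 0),        # sicker patients, mostly treated
]

# True average treatment effect: needs both potential outcomes.
ate = mean(y1 - y0 for y0, y1, _ in units)

# Naive comparison: uses only what would actually be observed.
naive = (mean(y1 for _, y1, t in units if t == 1)
         - mean(y0 for y0, _, t in units if t == 0))

# Effect on the treated, and the bias from non-comparable groups.
att = mean(y1 - y0 for y0, y1, t in units if t == 1)
bias = (mean(y0 for y0, _, t in units if t == 1)
        - mean(y0 for y0, _, t in units if t == 0))

print(ate, naive, att, bias)  # 2 0 2 -2
```

The naive difference equals the effect on the treated plus the bias term, so here a true effect of 2 is fully masked by a bias of -2.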

Key Assumptions

| Assumption | Description | Example |
|---|---|---|
| Ignorability/Exchangeability | Given a set of observed covariates, the potential outcomes are independent of the treatment assignment. | Assuming no unmeasured confounders in a study linking diet to heart disease. |
| Stable Unit Treatment Value Assumption (SUTVA) | There are no interference effects between units, and each unit has a single version of treatment. | One person's vaccination does not directly affect another's health outcome in the study. |
| Common Support/Overlap | There is a sufficient overlap in covariate distributions between the treatment and control groups to make comparisons possible. | In a study comparing different teaching methods, students in all groups have similar background characteristics. |
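
For readers who prefer notation, the three assumptions can be written in standard potential-outcomes form. The symbols are assumptions of this sketch rather than notation taken from the cheatsheet: Y_i(0) and Y_i(1) are unit i's potential outcomes, T_i is a binary treatment, and X_i are the observed covariates. The last line states what the assumptions jointly buy: the average treatment effect becomes a function of observable quantities.

```latex
% Potential outcomes Y_i(0), Y_i(1); binary treatment T_i; covariates X_i
\begin{align*}
\text{SUTVA (consistency):}\quad & Y_i = T_i\,Y_i(1) + (1 - T_i)\,Y_i(0) \\
\text{Ignorability:}\quad & \bigl(Y_i(0),\, Y_i(1)\bigr) \perp T_i \mid X_i \\
\text{Overlap:}\quad & 0 < P(T_i = 1 \mid X_i = x) < 1 \quad \text{for all } x \\
\text{Identification:}\quad & \mathrm{ATE} = E_X\bigl[\,E[Y \mid T = 1, X] - E[Y \mid T = 0, X]\,\bigr]
\end{align*}
```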

Important Methods

Randomized Controlled Trials (RCTs)

  • Purpose: Establish causal relationships by randomly assigning treatment.
  • Example: Testing the effectiveness of a new drug.
  • Key Point: Randomization makes the treatment and control groups comparable in expectation, on both observed and unobserved characteristics.
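
A minimal simulation sketch of why randomization works, using only the standard library. The data-generating process, the constant `TRUE_EFFECT`, and all variable names are assumptions made for illustration. Because assignment is a coin flip that ignores each unit's baseline, a plain difference in means recovers the effect.

```python
import random
from statistics import mean

random.seed(0)
TRUE_EFFECT = 2.0  # assumed effect size, chosen for the simulation

treated, control = [], []
for _ in range(10_000):
    baseline = random.gauss(50, 2)   # unit-level trait that drives the outcome
    t = random.random() < 0.5        # coin-flip assignment, independent of baseline
    y = baseline + TRUE_EFFECT * t + random.gauss(0, 1)
    (treated if t else control).append(y)

# With random assignment, the difference in group means estimates the ATE.
ate_hat = mean(treated) - mean(control)
print(f"estimated effect: {ate_hat:.2f} (simulated truth: {TRUE_EFFECT})")
```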

Instrumental Variables (IV)

  • Purpose: Estimate causal relationships when controlled experiments are not feasible.
  • Example: Using proximity to a college as an instrument for education in earnings studies.
  • Key Point: The instrument must be correlated with the treatment (relevance) and must affect the outcome only through the treatment (exclusion restriction).
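
The idea can be sketched with the simplest IV estimator, the ratio cov(Z, Y) / cov(Z, T), on simulated data. Everything here is an assumption for illustration: the binary instrument `z`, the unobserved confounder `u`, and the coefficients. The sketch contrasts a confounded regression slope with the IV estimate.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(1)
TRUE_EFFECT = 1.5  # assumed effect of treatment on outcome

z, t, y = [], [], []
for _ in range(20_000):
    u = random.gauss(0, 1)                        # unobserved confounder
    zi = 1 if random.random() < 0.5 else 0        # instrument, independent of u
    ti = 0.8 * zi + u + random.gauss(0, 1)        # treatment: moved by z and by u
    yi = TRUE_EFFECT * ti + 2.0 * u + random.gauss(0, 1)  # z enters only via t
    z.append(zi)
    t.append(ti)
    y.append(yi)

ols_slope = cov(t, y) / cov(t, t)    # biased: u drives both t and y
iv_estimate = cov(z, y) / cov(z, t)  # reduced form divided by first stage

print(f"regression slope: {ols_slope:.2f}, IV estimate: {iv_estimate:.2f}")
```

In this setup the regression slope is pulled well above 1.5 by the confounder, while the IV ratio stays close to it because the instrument is independent of `u` by construction.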

Difference-in-Differences (DiD)

  • Purpose: Compare changes in outcomes over time between a treatment group and a control group.
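  • Example: Evaluating the impact of a policy change by comparing the before-and-after change in regions that adopted the policy with the change in regions that did not.
  • Key Point: Assumes parallel trends: absent the intervention, outcomes in the treatment group would have moved in parallel with those in the control group. Similar pre-intervention trends support this assumption but cannot prove it.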
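
A minimal sketch of the two-group, two-period calculation, with made-up outcome values and hypothetical names (`outcomes`, `did_estimate`). The control group's change stands in for what would have happened to the treated group without the intervention.

```python
from statistics import mean

# Hypothetical outcomes by (group, period).
outcomes = {
    ("treated", "pre"): [13, 14, 15],
    ("treated", "post"): [18, 19, 20],
    ("control", "pre"): [9, 10, 11],
    ("control", "post"): [11, 12, 13],
}

def change(group):
    """Average post-period outcome minus average pre-period outcome."""
    return mean(outcomes[(group, "post")]) - mean(outcomes[(group, "pre")])

# Treated change (5) minus control change (2).
did_estimate = change("treated") - change("control")
print(did_estimate)  # 3
```

Note that the groups start at different levels (14 versus 10); DiD only requires that their trends, not their levels, would have matched.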

Regression Discontinuity (RD)

  • Purpose: Estimate causal effects using a cutoff or threshold for treatment assignment.
  • Example: Assessing the effect of a scholarship program by comparing students just above and below the eligibility cutoff.
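  • Key Point: Units just above and below the threshold are assumed to be comparable, which requires that they cannot precisely manipulate which side they fall on. The estimate is local to the cutoff.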
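
One common way to implement the comparison is to fit a separate line on each side of the cutoff within a bandwidth and take the gap between the two fitted values at the cutoff. The sketch below does this on simulated data; `CUTOFF`, `BANDWIDTH`, `TRUE_JUMP`, and the data-generating process are all assumptions for illustration, and the bandwidth is picked by hand rather than by a data-driven rule.

```python
import random

random.seed(3)
CUTOFF, BANDWIDTH, TRUE_JUMP = 60.0, 10.0, 5.0  # all assumed for the simulation

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

left_x, left_y, right_x, right_y = [], [], [], []
for _ in range(20_000):
    score = random.uniform(0, 100)   # running variable, e.g. a test score
    treated = score >= CUTOFF        # treatment is assigned by the cutoff alone
    outcome = 20 + 0.3 * score + TRUE_JUMP * treated + random.gauss(0, 2)
    if abs(score - CUTOFF) <= BANDWIDTH:
        xs, ys = (right_x, right_y) if treated else (left_x, left_y)
        xs.append(score - CUTOFF)    # center so each intercept is the value at the cutoff
        ys.append(outcome)

left_at_cutoff, _ = fit_line(left_x, left_y)
right_at_cutoff, _ = fit_line(right_x, right_y)
rd_estimate = right_at_cutoff - left_at_cutoff
print(f"estimated jump at the cutoff: {rd_estimate:.2f}")
```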

Propensity Score Matching

  • Purpose: Estimate treatment effects by matching treated and untreated units with similar propensity scores.
  • Example: Comparing outcomes of patients receiving different treatments by matching on demographic and clinical characteristics.
  • Key Point: Reduces bias from observed confounders by balancing covariates across the matched groups; it does not correct for unobserved confounders.
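
A rough sketch of the two steps, estimating scores and then matching on them, using only the standard library. The simulated data, the single covariate `x`, the hand-rolled logistic fit, and one-nearest-neighbor matching with replacement are all assumptions of this example; in practice a statistics library would fit the propensity model.

```python
import bisect
import math
import random

random.seed(2)
TRUE_EFFECT = 2.0  # assumed, for the simulation only

# Simulated data: x raises both the chance of treatment and the outcome.
data = []
for _ in range(2000):
    x = random.gauss(0, 1)
    t = 1 if random.random() < 1 / (1 + math.exp(-x)) else 0
    y = 3.0 * x + TRUE_EFFECT * t + random.gauss(0, 1)
    data.append((x, t, y))

# Step 1: fit a logistic model for P(T=1 | x) by gradient ascent.
b0 = b1 = 0.0
for _ in range(200):
    g0 = g1 = 0.0
    for x, t, _ in data:
        resid = t - 1 / (1 + math.exp(-(b0 + b1 * x)))
        g0 += resid
        g1 += resid * x
    b0 += 0.5 * g0 / len(data)
    b1 += 0.5 * g1 / len(data)

def score(x):
    """Estimated propensity score for covariate value x."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

# Step 2: match each treated unit to the control with the nearest score.
controls = sorted((score(x), y) for x, t, y in data if t == 0)
control_scores = [s for s, _ in controls]

def nearest_control_outcome(s):
    i = bisect.bisect_left(control_scores, s)
    candidates = controls[max(i - 1, 0):i + 1]  # neighbors on either side of s
    return min(candidates, key=lambda c: abs(c[0] - s))[1]

treated = [(score(x), y) for x, t, y in data if t == 1]
att_hat = sum(y - nearest_control_outcome(s) for s, y in treated) / len(treated)

control_y = [y for _, t, y in data if t == 0]
naive = sum(y for _, y in treated) / len(treated) - sum(control_y) / len(control_y)
print(f"naive difference: {naive:.2f}, matched estimate: {att_hat:.2f}")
```

The naive difference is inflated because treated units have higher `x` on average; matching on the score compares units with similar `x`.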

Synthetic Control Method

  • Purpose: Create a synthetic control group for comparison with the treated unit.
  • Example: Evaluating the impact of a policy change by comparing it to a synthetic control group constructed from other regions or countries.
  • Key Point: Constructs a weighted combination of control units, with weights chosen so that the combination tracks the treated unit's pre-intervention outcomes.
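
A deliberately tiny sketch of the weighting idea with two hypothetical donor units and made-up numbers. With only two donors, the non-negative weights that sum to one reduce to a single number `w`, so a grid search is enough; real applications have many donors and use a constrained optimizer instead.

```python
# Hypothetical outcome series, split at the intervention date.
donor_a = {"pre": [10, 12, 14, 16], "post": [18, 20]}
donor_b = {"pre": [20, 20, 22, 22], "post": [24, 24]}
treated = {"pre": [17.5, 18.0, 20.0, 20.5], "post": [25.5, 26.0]}

def synthetic(w, period):
    """Weighted donor average: weight w on donor A, 1 - w on donor B."""
    return [w * a + (1 - w) * b
            for a, b in zip(donor_a[period], donor_b[period])]

def pre_fit_error(w):
    """Squared distance between treated and synthetic pre-period outcomes."""
    return sum((t - s) ** 2 for t, s in zip(treated["pre"], synthetic(w, "pre")))

# Weights are restricted to [0, 1] and sum to one by construction.
best_w = min((i / 100 for i in range(101)), key=pre_fit_error)

# Estimated effect: treated outcome minus synthetic outcome, post-intervention.
effects = [t - s for t, s in zip(treated["post"], synthetic(best_w, "post"))]
print(best_w, effects)  # 0.25 [3.0, 3.0]
```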

Mediation Analysis

  • Purpose: Examine how an intermediate variable mediates the relationship between an independent variable and a dependent variable.
  • Example: Analyzing how stress reduction mediates the relationship between exercise and mental health.
  • Key Point: Identifies pathways through which the treatment affects the outcome.
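
One simple way to quantify such pathways is the linear product-of-coefficients approach, sketched here on simulated data. The approach is an assumption of this example, not something the cheatsheet prescribes, and it is only valid when the relationships are linear and nothing unobserved confounds the mediator and the outcome. Variable names follow the exercise example; the coefficients 0.8, 0.5, and 0.3 are made up.

```python
import random

random.seed(5)
exercise, stress_reduction, mental_health = [], [], []
for _ in range(20_000):
    x = random.gauss(0, 1)                      # treatment, as if randomized
    m = 0.8 * x + random.gauss(0, 1)            # mediator responds to treatment
    y = 0.3 * x + 0.5 * m + random.gauss(0, 1)  # direct path plus mediated path
    exercise.append(x)
    stress_reduction.append(m)
    mental_health.append(y)

def centered(values):
    avg = sum(values) / len(values)
    return [v - avg for v in values]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

xc, mc, yc = centered(exercise), centered(stress_reduction), centered(mental_health)
sxx, smm, sxm = dot(xc, xc), dot(mc, mc), dot(xc, mc)
sxy, smy = dot(xc, yc), dot(mc, yc)

a = sxm / sxx                           # treatment -> mediator
det = sxx * smm - sxm ** 2              # determinant of the 2x2 normal equations
b = (sxx * smy - sxm * sxy) / det       # mediator -> outcome, holding treatment fixed
direct = (smm * sxy - sxm * smy) / det  # treatment -> outcome, holding mediator fixed
indirect = a * b                        # effect transmitted through the mediator
total = sxy / sxx                       # slope of outcome on treatment alone

print(f"direct {direct:.2f} + indirect {indirect:.2f} = total {total:.2f}")
```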

Panel Data and Fixed Effects

  • Purpose: Control for unobserved variables that do not change over time by using data collected over multiple time periods.
  • Example: Evaluating the impact of education policies by analyzing student performance data over multiple years.
  • Key Point: Removes bias from time-invariant unobserved variables.
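
A sketch of the within (demeaning) transformation, one standard way to estimate a fixed-effects model, on a simulated panel. The data-generating process and the names `ability`, `pooled_slope`, and `fe_slope` are assumptions for illustration. Subtracting each unit's own averages removes anything about the unit that is constant over time.

```python
import random

random.seed(4)
TRUE_EFFECT = 2.0  # assumed effect of x on y

def slope(xs, ys):
    """OLS slope of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Simulated panel: 200 units observed for 5 periods each.
panel = {}
for unit in range(200):
    ability = random.gauss(0, 1)  # time-invariant and unobserved
    rows = []
    for _ in range(5):
        x = ability + random.gauss(0, 1)  # treatment correlated with ability
        y = TRUE_EFFECT * x + 5.0 * ability + random.gauss(0, 1)
        rows.append((x, y))
    panel[unit] = rows

# Pooled regression ignores the unit effect and is biased.
all_rows = [row for rows in panel.values() for row in rows]
pooled_slope = slope([x for x, _ in all_rows], [y for _, y in all_rows])

# Within transformation: subtract each unit's own means from x and y.
x_within, y_within = [], []
for rows in panel.values():
    mx = sum(x for x, _ in rows) / len(rows)
    my = sum(y for _, y in rows) / len(rows)
    x_within += [x - mx for x, _ in rows]
    y_within += [y - my for _, y in rows]
fe_slope = slope(x_within, y_within)

print(f"pooled: {pooled_slope:.2f}, fixed effects: {fe_slope:.2f}")
```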

Synthetic Difference-in-Differences (SDID)

  • Purpose: Combine synthetic control and difference-in-differences methods to estimate treatment effects.
  • Example: Evaluating the impact of a new law by comparing the treated region to a synthetic control region over time.
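  • Key Point: Reweights control units (and pre-treatment periods) as in synthetic control, then applies a difference-in-differences comparison, so it relies less on parallel trends holding in the unweighted data.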

Practical Implementation Tips

  • Data Quality: Ensure high-quality, accurate, and relevant data.
  • Model Validation: Validate models using out-of-sample tests and robustness checks.
  • Assumption Testing: Test key assumptions such as common support and no interference between units.
  • Sensitivity Analysis: Conduct sensitivity analyses to check the robustness of the results to different assumptions and specifications.
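
As one example of assumption testing, a crude common-support check compares the range of propensity scores in the two groups and flags units outside the shared range. The scores below are made up, and the min/max rule is only one of several conventions (fixed trimming thresholds are another), so treat this as a sketch rather than a recommended default.

```python
def common_support(treated_scores, control_scores):
    """Return the score range covered by both groups."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return low, high

# Hypothetical estimated propensity scores.
treated_scores = [0.35, 0.50, 0.62, 0.80, 0.95]
control_scores = [0.05, 0.20, 0.35, 0.55, 0.70]

low, high = common_support(treated_scores, control_scores)

# Units outside the shared range have no comparable counterpart.
off_support = [s for s in treated_scores + control_scores
               if not low <= s <= high]

print((low, high))   # (0.35, 0.7)
print(off_support)   # [0.8, 0.95, 0.05, 0.2]
```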