Causal Inference cheatsheet based on Matheus Facure's book on Causal Inference. Created by ChatGPT.
Overview of Causal Inference Concepts
| Concept | Description | Example |
|---|---|---|
| Causal Inference | Determining the cause-and-effect relationship between variables. | Assessing the impact of a new drug on patient recovery. |
| Treatment/Intervention | The variable or action being studied for its effect on an outcome. | A new teaching method. |
| Outcome | The variable or result that is influenced by the treatment/intervention. | Student test scores. |
| Confounder | A variable that influences both the treatment and the outcome, potentially biasing the estimated effect. | Age in a study linking exercise to heart health. |
| Randomized Controlled Trial (RCT) | Participants are randomly assigned to treatment or control groups to ensure comparability. | Testing a new medication by randomly assigning patients to receive either the medication or a placebo. |
| Observational Study | The researcher observes the effect of treatments without random assignment. | Studying the effect of smoking on lung cancer through observational data. |
| Counterfactual | The hypothetical scenario of what would have happened to the same units under a different treatment condition. | What would be the unemployment rate if a stimulus package had not been implemented? |
| Selection Bias | Bias introduced when the subjects studied are not representative of the general population. | Studying only healthy volunteers for a new drug might overestimate its effectiveness. |
| Instrumental Variables (IV) | Variables that affect the treatment but do not directly affect the outcome, used to estimate causal relationships when controlled experiments are not feasible. | Using distance to the nearest college as an instrument for education level in earnings studies. |
| Difference-in-Differences (DiD) | Compares the changes in outcomes over time between a treatment group and a control group. | Evaluating the impact of a new law by comparing regions before and after the law is implemented. |
| Regression Discontinuity (RD) | Uses a cutoff or threshold to assign treatment and compares those just above and below the cutoff to estimate causal effects. | Estimating the effect of a scholarship program on student performance by comparing students around the eligibility cutoff. |
| Propensity Score Matching | Matches treated and untreated units with similar propensity scores (the probability of receiving the treatment) to estimate the treatment effect. | Comparing outcomes of patients receiving different treatments by matching on demographic and clinical characteristics. |
| Synthetic Control Method | Constructs a weighted combination of control units to create a synthetic control group for comparison with the treated unit. | Evaluating the impact of a policy change in one country by comparing it to a synthetic control group constructed from other countries. |
| Mediation Analysis | Examines how an intermediate variable mediates the relationship between an independent variable and a dependent variable. | Analyzing how stress reduction mediates the relationship between exercise and mental health. |
| Natural Experiment | Uses naturally occurring events or circumstances that mimic random assignment to estimate causal effects. | Studying the impact of a natural disaster on economic outcomes. |
| Heterogeneous Treatment Effects | Analysis that examines how treatment effects vary across different subgroups or contexts. | Investigating whether a job training program has different effects based on participants' age or education level. |
| Panel Data and Fixed Effects | Uses data collected over time on the same units to control for unobserved variables that do not change over time. | Evaluating the impact of education policies by analyzing student performance data over multiple years. |
| Synthetic Difference-in-Differences (SDID) | Combines synthetic control and difference-in-differences methods to estimate treatment effects. | Evaluating the impact of a new law by comparing the treated region to a synthetic control region over time. |
Key Assumptions
| Assumption | Description | Example |
|---|---|---|
| Ignorability/Exchangeability | Given a set of observed covariates, the potential outcomes are independent of the treatment assignment. | Assuming no unmeasured confounders in a study linking diet to heart disease. |
| Stable Unit Treatment Value Assumption (SUTVA) | There are no interference effects between units, and each unit has a single version of treatment. | One person's vaccination does not directly affect another's health outcome in the study. |
| Common Support/Overlap | There is sufficient overlap in covariate distributions between the treatment and control groups to make comparisons possible. | In a study comparing different teaching methods, students in all groups have similar background characteristics. |
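Overlap is an assumption you can probe directly in the data. A minimal sketch on simulated data (the column names `x1`, `x2`, `treated` are illustrative): estimate propensity scores and compare their distributions across groups.

```python
# Quick common-support check: compare estimated propensity score ranges across groups.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n = 2000
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
p = 1 / (1 + np.exp(-df["x1"].to_numpy()))          # treatment depends on x1
df["treated"] = rng.binomial(1, p)

ps_model = LogisticRegression().fit(df[["x1", "x2"]], df["treated"])
ps = ps_model.predict_proba(df[["x1", "x2"]])[:, 1]
print(pd.Series(ps).groupby(df["treated"]).describe()[["min", "mean", "max"]])
# Scores piling up near 0 or 1, or ranges that barely overlap, signal a common-support problem.
```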
Important Methods
Randomized Controlled Trials (RCTs)
- Purpose: Establish causal relationships by randomly assigning treatment.
- Example: Testing the effectiveness of a new drug.
- Key Point: Randomization ensures comparability between treatment and control groups.
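A minimal sketch of an RCT analysis on simulated data (the column names `treated` and `outcome` are illustrative): under randomization, regressing the outcome on the treatment dummy recovers the average treatment effect.

```python
# Minimal RCT sketch: the coefficient on the treatment dummy is the ATE.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({"treated": rng.integers(0, 2, n)})
df["outcome"] = 1.0 + 0.5 * df["treated"] + rng.normal(size=n)  # true effect is 0.5

ate_model = smf.ols("outcome ~ treated", data=df).fit()
print(ate_model.params["treated"])           # point estimate, close to 0.5
print(ate_model.conf_int().loc["treated"])   # 95% confidence interval
```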
Instrumental Variables (IV)
- Purpose: Estimate causal relationships when controlled experiments are not feasible.
- Example: Using proximity to a college as an instrument for education in earnings studies.
- Key Point: The instrument must be correlated with the treatment (relevance) and affect the outcome only through the treatment (exclusion).
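A minimal two-stage least squares sketch on simulated data (`z` for the instrument, `d` for the treatment, `y` for the outcome are illustrative names). It is written as two explicit OLS stages to show the mechanics; dedicated IV estimators should be preferred in practice because they also correct the standard errors.

```python
# Two-stage least squares (2SLS) sketch with simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000
u = rng.normal(size=n)                       # unobserved confounder
z = rng.normal(size=n)                       # instrument: affects d, not y directly
d = 1.0 * z + u + rng.normal(size=n)         # treatment depends on instrument and confounder
y = 2.0 * d + 3.0 * u + rng.normal(size=n)   # true causal effect of d on y is 2.0
df = pd.DataFrame({"z": z, "d": d, "y": y})

# Stage 1: predict the treatment from the instrument.
df["d_hat"] = smf.ols("d ~ z", data=df).fit().fittedvalues
# Stage 2: regress the outcome on the predicted treatment.
second_stage = smf.ols("y ~ d_hat", data=df).fit()
print(second_stage.params["d_hat"])  # close to 2.0; naive OLS of y on d would be biased upward
# Note: standard errors from this manual two-step procedure are not valid;
# proper IV estimators adjust them.
```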
Difference-in-Differences (DiD)
- Purpose: Compare changes in outcomes over time between a treatment group and a control group.
- Example: Evaluating the impact of a policy change by comparing regions before and after the policy implementation.
- Key Point: Assumes parallel trends between the treatment and control groups before the intervention.
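A minimal two-group, two-period DiD sketch on simulated data (`treated_group`, `post`, `outcome` are illustrative column names); the coefficient on the interaction term is the DiD estimate.

```python
# Difference-in-differences sketch: two groups, two periods.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 4000
df = pd.DataFrame({
    "treated_group": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})
df["outcome"] = (
    2.0
    + 1.0 * df["treated_group"]                    # level difference between groups
    + 0.5 * df["post"]                             # common time trend
    + 1.5 * df["treated_group"] * df["post"]       # treatment effect after the intervention
    + rng.normal(size=n)
)

# Valid under the parallel-trends assumption.
did = smf.ols("outcome ~ treated_group * post", data=df).fit()
print(did.params["treated_group:post"])  # close to 1.5
```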
Regression Discontinuity (RD)
- Purpose: Estimate causal effects using a cutoff or threshold for treatment assignment.
- Example: Assessing the effect of a scholarship program by comparing students just above and below the eligibility cutoff.
- Key Point: Units just above and below the threshold are assumed to be comparable, so the jump in outcomes at the cutoff identifies a local treatment effect.
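A minimal sharp-RD sketch on simulated data (`score`, `treated`, `outcome` and the bandwidth value are illustrative): fit a local linear model on either side of the cutoff and read the treatment effect off the jump.

```python
# Sharp regression discontinuity sketch: local linear fit around the cutoff.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 5000
cutoff = 70.0
score = rng.uniform(40, 100, n)
treated = (score >= cutoff).astype(int)          # scholarship assigned at or above the cutoff
outcome = 10 + 0.1 * score + 2.0 * treated + rng.normal(size=n)
df = pd.DataFrame({"score": score, "treated": treated, "outcome": outcome})

# Keep observations within a bandwidth of the cutoff; let the slope differ on each side.
bandwidth = 10.0
local = df[(df["score"] - cutoff).abs() <= bandwidth].copy()
local["centered"] = local["score"] - cutoff
rd = smf.ols("outcome ~ treated * centered", data=local).fit()
print(rd.params["treated"])  # jump at the cutoff, close to 2.0
```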
Propensity Score Matching
- Purpose: Estimate treatment effects by matching treated and untreated units with similar propensity scores.
- Example: Comparing outcomes of patients receiving different treatments by matching on demographic and clinical characteristics.
- Key Point: Reduces bias from observed confounders by making the groups comparable on measured characteristics; it cannot correct for unmeasured confounding.
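A minimal 1-nearest-neighbor matching sketch on simulated data (the covariates `age` and `severity` are illustrative): estimate propensity scores, match each treated unit to the closest control, and compare mean outcomes.

```python
# Propensity score matching sketch: 1-nearest-neighbor matching on the estimated score.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
n = 3000
age = rng.normal(50, 10, n)
severity = rng.normal(0, 1, n)
p_treat = 1 / (1 + np.exp(-(-2 + 0.03 * age + 0.8 * severity)))   # confounded assignment
treated = rng.binomial(1, p_treat)
outcome = 5 + 1.0 * treated + 0.05 * age + 0.5 * severity + rng.normal(size=n)
df = pd.DataFrame({"age": age, "severity": severity, "treated": treated, "outcome": outcome})

# 1. Estimate propensity scores with a logistic regression on the covariates.
ps_model = LogisticRegression().fit(df[["age", "severity"]], df["treated"])
df["ps"] = ps_model.predict_proba(df[["age", "severity"]])[:, 1]

# 2. Match each treated unit to the control unit with the closest propensity score.
treated_df = df[df["treated"] == 1]
control_df = df[df["treated"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control_df[["ps"]])
_, idx = nn.kneighbors(treated_df[["ps"]])
matched_controls = control_df.iloc[idx.ravel()]

# 3. ATT estimate: mean outcome difference between treated units and their matches.
print(treated_df["outcome"].mean() - matched_controls["outcome"].mean())  # close to 1.0
```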
Synthetic Control Method
- Purpose: Create a synthetic control group for comparison with the treated unit.
- Example: Evaluating the impact of a policy change by comparing it to a synthetic control group constructed from other regions or countries.
- Key Point: Constructs a weighted combination of control units that matches the treated unit's pre-intervention outcomes.
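A minimal sketch of the weight-fitting step on simulated data: find nonnegative weights that sum to one and make the weighted controls track the treated unit's pre-intervention path. Real implementations add covariates, regularization, and placebo-based inference; this only illustrates the core idea.

```python
# Synthetic control sketch: constrained least squares for the unit weights.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n_pre, n_controls = 20, 8
Y0 = rng.normal(size=(n_pre, n_controls)).cumsum(axis=0)      # control units' pre-period paths
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (n_controls - 3))
y1 = Y0 @ true_w + rng.normal(scale=0.1, size=n_pre)          # treated unit tracks a mix of controls

def loss(w):
    return np.sum((y1 - Y0 @ w) ** 2)

w0 = np.full(n_controls, 1 / n_controls)
res = minimize(
    loss, w0, method="SLSQP",
    bounds=[(0, 1)] * n_controls,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},
)
print(np.round(res.x, 2))  # roughly recovers the mixing weights
# Post-intervention, the effect estimate is the gap between the treated unit's observed
# outcomes and the weighted controls' outcomes in the post-period.
```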
Mediation Analysis
- Purpose: Examine how an intermediate variable mediates the relationship between an independent variable and a dependent variable.
- Example: Analyzing how stress reduction mediates the relationship between exercise and mental health.
- Key Point: Identifies pathways through which the treatment affects the outcome.
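A minimal linear mediation sketch on simulated data (`exercise`, `stress`, `mental_health` are illustrative names), using the classic product-of-coefficients decomposition; it assumes no unmeasured confounding of any of the paths.

```python
# Linear mediation sketch: split the total effect into indirect (through the mediator) and direct parts.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 5000
exercise = rng.normal(size=n)
stress = -0.6 * exercise + rng.normal(size=n)                        # treatment -> mediator
mental_health = 0.3 * exercise - 0.5 * stress + rng.normal(size=n)   # direct path + mediator -> outcome
df = pd.DataFrame({"exercise": exercise, "stress": stress, "mental_health": mental_health})

# a: effect of the treatment on the mediator; b: effect of the mediator on the outcome
# holding the treatment fixed. a*b is the indirect effect; the remaining treatment
# coefficient is the direct effect.
a = smf.ols("stress ~ exercise", data=df).fit().params["exercise"]
outcome_model = smf.ols("mental_health ~ exercise + stress", data=df).fit()
b = outcome_model.params["stress"]
direct = outcome_model.params["exercise"]
print("indirect:", a * b, "direct:", direct, "total:", a * b + direct)
```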
Panel Data and Fixed Effects
- Purpose: Control for unobserved variables that do not change over time by using data collected over multiple time periods.
- Example: Evaluating the impact of education policies by analyzing student performance data over multiple years.
- Key Point: Removes bias from time-invariant unobserved variables.
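A minimal fixed-effects sketch on a simulated panel (`unit`, `year`, `treated`, `outcome` are illustrative names), using the within (de-meaning) transformation to wipe out time-invariant unit effects.

```python
# Fixed effects sketch: within transformation removes time-invariant unit heterogeneity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
units, years = 200, 10
df = pd.DataFrame({
    "unit": np.repeat(np.arange(units), years),
    "year": np.tile(np.arange(years), units),
})
unit_effect = rng.normal(size=units)                         # time-invariant unobserved heterogeneity
u = unit_effect[df["unit"].to_numpy()]
df["treated"] = rng.binomial(1, 0.3 + 0.1 * (u > 0))         # treatment correlated with the unit effect
df["outcome"] = 1.0 * df["treated"] + 2.0 * u + rng.normal(size=len(df))

# Subtract each unit's mean ("within transformation"); the unit effect drops out.
demeaned = df[["treated", "outcome"]] - df.groupby("unit")[["treated", "outcome"]].transform("mean")
fe = smf.ols("outcome ~ treated", data=demeaned).fit()
print(fe.params["treated"])  # close to 1.0, whereas pooled OLS on df would be biased
```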
Synthetic Difference-in-Differences (SDID)
- Purpose: Combine synthetic control and difference-in-differences methods to estimate treatment effects.
- Example: Evaluating the impact of a new law by comparing the treated region to a synthetic control region over time.
- Key Point: Reweights control units (as in synthetic control) and pre-treatment periods, then applies a difference-in-differences, making the estimate less sensitive to an imperfect pre-period fit or to violations of parallel trends.
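A deliberately simplified SDID-style sketch on simulated data: fit unit weights and time weights on the pre-period, then take a weighted difference-in-differences. The published estimator additionally uses an intercept and regularization when fitting the unit weights, plus a proper inference procedure, so treat this only as an illustration of the idea.

```python
# Simplified SDID-style sketch: unit weights + time weights + weighted DiD.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n_controls, T0, T1 = 10, 15, 5                   # controls, pre-periods, post-periods
trend = np.linspace(0, 2, T0 + T1)
Y0 = trend + rng.normal(scale=0.2, size=(n_controls, T0 + T1))   # control outcomes (units x time)
y_tr = trend + rng.normal(scale=0.2, size=T0 + T1)               # treated unit
y_tr[T0:] += 1.0                                                 # true effect of 1.0 after T0

def simplex_fit(A, b):
    """Least-squares weights constrained to the probability simplex."""
    k = A.shape[1]
    res = minimize(lambda w: np.sum((A @ w - b) ** 2), np.full(k, 1 / k),
                   method="SLSQP", bounds=[(0, 1)] * k,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
    return res.x

omega = simplex_fit(Y0[:, :T0].T, y_tr[:T0])             # unit weights: controls -> treated, pre-period
lam = simplex_fit(Y0[:, :T0], Y0[:, T0:].mean(axis=1))   # time weights: pre-periods -> post-period mean

# Weighted difference-in-differences between the treated unit and the weighted controls.
tau = (y_tr[T0:].mean() - y_tr[:T0] @ lam) - (Y0[:, T0:].mean(axis=1) - Y0[:, :T0] @ lam) @ omega
print(tau)  # close to 1.0
```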
Practical Implementation Tips
- Data Quality: Ensure high-quality, accurate, and relevant data.
- Model Validation: Validate models using out-of-sample tests and robustness checks.
- Assumption Testing: Test key assumptions such as common support and no interference between units.
- Sensitivity Analysis: Conduct sensitivity analyses to check the robustness of the results to different assumptions and specifications.
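One concrete robustness check from this list is a placebo test: rerun the estimator on a treatment assignment that, by construction, should have no effect, and verify the estimate is close to zero. A minimal sketch reusing the simulated two-period DiD setup from above:

```python
# Placebo check sketch: shuffle the treatment indicator and re-estimate.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n = 4000
df = pd.DataFrame({"treated_group": rng.integers(0, 2, n), "post": rng.integers(0, 2, n)})
df["outcome"] = (2.0 + df["treated_group"] + 0.5 * df["post"]
                 + 1.5 * df["treated_group"] * df["post"] + rng.normal(size=n))

df["placebo_group"] = rng.permutation(df["treated_group"].to_numpy())   # break the true assignment
placebo = smf.ols("outcome ~ placebo_group * post", data=df).fit()
print(placebo.params["placebo_group:post"])  # a well-specified design should give roughly zero
```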