load_data

causalpy.data.datasets.load_data(dataset)

Load example datasets for causal inference analysis.

This function loads the pre-packaged datasets used in CausalPy’s documentation and examples. The datasets illustrate a range of causal inference methods, including difference-in-differences, regression discontinuity, synthetic control, interrupted time series, instrumental variables, and propensity score analysis.

Parameters:

dataset (str) –

Name of the dataset to load. Available datasets are:

  • "banks" - Historic banking closures data for difference-in-differences

  • "brexit" - UK GDP data for estimating causal impact of Brexit

  • "covid" - Deaths and temperature data for England and Wales

  • "did" - Difference-in-differences example dataset

  • "drinking" - Minimum legal drinking age data for regression discontinuity

  • "its" - Interrupted time series example dataset

  • "its simple" - Simplified interrupted time series dataset

  • "rd" - Regression discontinuity example dataset

  • "sc" - Synthetic control example dataset

  • "anova1" - ANCOVA example with pre/post treatment nonequivalent groups

  • "geolift1" - Single treatment geo-lift dataset for synthetic control

  • "geolift_multi_cell" - Multi-cell geo-lift dataset for synthetic control

  • "risk" - Acemoglu, Johnson & Robinson (2001) data for instrumental variables

  • "nhefs" - NHANES I Epidemiologic Follow-up Study (NHEFS) data

  • "schoolReturns" - Schooling returns data for instrumental variable analysis

  • "pisa18" - PISA 2018 sample data

  • "nets" - National Supported Work Demonstration dataset

  • "lalonde" - LaLonde dataset for propensity score analysis
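Several of the names above are easy to mistype: "its simple" contains a space and "schoolReturns" is mixed case. The sketch below is one way to check a name before loading, assuming names must match the listed strings exactly. `DATASET_NAMES` and `suggest_dataset_name` are hypothetical helpers written for this illustration and are not part of CausalPy; the name list is copied by hand from the list above.

```python
import difflib
from typing import Optional

# Dataset names copied by hand from the parameter list above.
# This tuple is an illustration, not an object exported by CausalPy.
DATASET_NAMES = (
    "banks", "brexit", "covid", "did", "drinking", "its", "its simple",
    "rd", "sc", "anova1", "geolift1", "geolift_multi_cell", "risk",
    "nhefs", "schoolReturns", "pisa18", "nets", "lalonde",
)


def suggest_dataset_name(name: str) -> Optional[str]:
    """Return `name` if it is listed, else the closest listed name, else None.

    Hypothetical helper, not part of CausalPy.
    """
    if name in DATASET_NAMES:
        return name
    # get_close_matches returns the best matches above a similarity cutoff.
    matches = difflib.get_close_matches(name, DATASET_NAMES, n=1)
    return matches[0] if matches else None
```

For example, `suggest_dataset_name("its_simple")` returns `"its simple"`, and a string that resembles no listed name returns `None`.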

Returns:

The requested dataset as a pandas DataFrame.

Return type:

pd.DataFrame

Raises:

ValueError – If the requested dataset name is not found in the available datasets.

Examples

Load the difference-in-differences example dataset:

>>> import causalpy as cp
>>> df = cp.load_data("did")

Load the regression discontinuity dataset:

>>> df = cp.load_data("rd")