Examples gallery

This gallery gives one basic (minimal call) and one advanced (full options) worked example for every analytical function in expdpy. Each example runs live on the bundled kuznets country–year panel, so what you see is the actual output. The gallery mirrors the Function reference one entry at a time; for a narrative tour of a typical workflow, see the Quickstart instead.

Every example assumes the following setup:

import numpy as np
import expdpy as ex
from expdpy.data import load_kuznets

df = load_kuznets()

Tables

`prepare_descriptive_table`

N, mean, standard deviation and quantiles for the numeric variables.

Basic:

ex.prepare_descriptive_table(df).gt

	N	Mean	Std. dev.	Min.	25 %	Median	75 %	Max.
Descriptive Statistics
year	880	2,020.000	3.164	2,015.000	2,017.000	2,020.000	2,023.000	2,025.000
gini_regional	880	0.273	0.091	0.020	0.207	0.270	0.335	0.568
gdp_pc	880	25,523.391	34,213.190	531.027	2,512.157	11,199.795	33,444.169	150,000.000
population	880	19,699,168.035	47,939,729.166	209,936.000	1,759,924.250	4,936,045.500	19,361,656.000	368,704,165.000
resource_rents	880	14.962	8.634	0.341	9.022	13.075	18.813	44.025
arable_land	880	0.245	0.121	0.017	0.165	0.219	0.341	0.571
trade_share	880	0.611	0.236	0.198	0.443	0.555	0.725	1.421
fdi_share	834	0.023	0.057	−0.151	−0.019	0.021	0.062	0.184
area	880	450,980.746	1,018,019.939	1,224.257	30,587.671	125,400.969	494,298.458	8,189,082.989
gasoline_price	648	0.554	0.218	0.200	0.378	0.562	0.725	1.055
aid	815	351,203,123.507	505,348,429.052	−8,053,828.003	56,235,278.824	131,148,458.214	435,529,190.612	2,200,000,000.000
school_enrollment	735	55.507	26.120	6.000	34.218	56.896	75.026	125.014
gini_lights	747	0.283	0.132	0.010	0.194	0.282	0.375	0.668
polity2	788	0.547	4.766	−10.000	−3.000	1.000	4.000	10.000
federal	880	0.200	0.400	0.000	0.000	0.000	0.000	1.000
log_gdp_pc	880	9.205	1.490	6.275	7.829	9.324	10.418	11.918
log_gdp_pc_sq	880	86.958	27.476	39.373	61.292	86.930	108.527	142.048
log_gdp_pc_cu	880	841.264	388.431	247.060	479.845	810.509	1,130.595	1,692.984

Advanced — set the decimals per statistic (None drops that column) and a caption:

ex.prepare_descriptive_table(
    df,
    digits=(0, 2, 2, None, None, 2, None, None),
    caption="Kuznets panel",
).gt

	N	Mean	Std. dev.	Median
Kuznets panel
year	880	2,020.00	3.16	2,020.00
gini_regional	880	0.27	0.09	0.27
gdp_pc	880	25,523.39	34,213.19	11,199.80
population	880	19,699,168.04	47,939,729.17	4,936,045.50
resource_rents	880	14.96	8.63	13.08
arable_land	880	0.25	0.12	0.22
trade_share	880	0.61	0.24	0.55
fdi_share	834	0.02	0.06	0.02
area	880	450,980.75	1,018,019.94	125,400.97
gasoline_price	648	0.55	0.22	0.56
aid	815	351,203,123.51	505,348,429.05	131,148,458.21
school_enrollment	735	55.51	26.12	56.90
gini_lights	747	0.28	0.13	0.28
polity2	788	0.55	4.77	1.00
federal	880	0.20	0.40	0.00
log_gdp_pc	880	9.21	1.49	9.32
log_gdp_pc_sq	880	86.96	27.48	86.93
log_gdp_pc_cu	880	841.26	388.43	810.51
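
To make the columns concrete, the same statistics can be computed by hand with plain pandas. This is a minimal sketch on a made-up two-column frame (`toy`, `complete` and `with_gap` are illustrative names, not part of expdpy or the kuznets data); it does not show how expdpy computes its table internally. It also illustrates why N can differ between variables, as it does for `fdi_share` or `gasoline_price` above: missing values are not counted.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one complete column, one with missing values.
toy = pd.DataFrame(
    {
        "complete": [1.0, 2.0, 3.0, 4.0, 5.0],
        "with_gap": [10.0, np.nan, 30.0, 40.0, np.nan],
    }
)

# describe() returns count, mean, std, min, 25%, 50%, 75%, max per column;
# transposing puts one variable per row, like the tables above.
stats = toy.describe().T
print(stats)
```

The `count` for `with_gap` is 3 rather than 5, because pandas skips NaN when computing each statistic.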

`prepare_correlation_table`

Pearson correlations above the diagonal, Spearman below; significant cells in bold.

Basic:

ex.prepare_correlation_table(df[["gini_regional", "gdp_pc", "log_gdp_pc"]]).gt

	A	B	C
A: gini_regional		0.20	-0.09
B: gdp_pc	-0.19		0.82
C: log_gdp_pc	-0.19	1.00
This table reports Pearson correlations above and Spearman correlations below the diagonal. Number of observations: 880. Correlations with significance levels below 5% appear in bold.

Advanced — more decimals, a stricter bold threshold, and a caption:

ex.prepare_correlation_table(
    df[["gini_regional", "gdp_pc", "log_gdp_pc", "trade_share"]],
    digits=3,
    bold=0.01,
    caption="Correlations (kuznets)",
).gt

	A	B	C	D
Correlations (kuznets)
A: gini_regional		0.202	-0.088	-0.155
B: gdp_pc	-0.190		0.825	-0.082
C: log_gdp_pc	-0.190	1.000		0.028
D: trade_share	-0.148	0.058	0.058
This table reports Pearson correlations above and Spearman correlations below the diagonal. Number of observations: 880. Correlations with significance levels below 1% appear in bold.
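
The layout (Pearson above the diagonal, Spearman below) can be rebuilt with pandas and NumPy alone. The sketch below uses synthetic data and hypothetical column names; it is only an illustration of the layout, not expdpy's implementation. It also shows why `gdp_pc` and `log_gdp_pc` have a Spearman correlation of 1.00 in the tables above: the logarithm is monotonic, so it leaves the ranks unchanged.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic, strongly skewed variable and its log (names are illustrative).
x = rng.lognormal(mean=9.0, sigma=1.5, size=200)
toy = pd.DataFrame({"x": x, "log_x": np.log(x), "noise": rng.normal(size=200)})

pearson = toy.corr(method="pearson").to_numpy()
spearman = toy.corr(method="spearman").to_numpy()

# Upper triangle from Pearson, lower triangle from Spearman, empty diagonal.
combined = np.triu(pearson, k=1) + np.tril(spearman, k=-1)
np.fill_diagonal(combined, np.nan)

table = pd.DataFrame(combined, index=toy.columns, columns=toy.columns)
print(table.round(2))
```

In `table`, the cell in row `log_x`, column `x` (below the diagonal) is the rank correlation and equals 1, while the mirrored cell above the diagonal is the Pearson correlation and is smaller.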

`prepare_ext_obs_table`

The top and bottom n observations of a variable.

Basic:

ex.prepare_ext_obs_table(df, n=5).gt

/home/runner/work/expdpy/expdpy/src/expdpy/tables.py:336: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'nan' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  separator.loc[:, :] = np.nan

country	iso	year	continent	gini_regional	gdp_pc	population	resource_rents	arable_land	trade_share	fdi_share	area	gasoline_price	aid	school_enrollment	gini_lights	polity2	federal	log_gdp_pc	log_gdp_pc_sq	log_gdp_pc_cu
country 80	C80	2,025.000	Continent E	0.494	150,000.000	5,878,456.000	5.328	0.439	0.550	−0.009	262,807.424	0.822	66,326,822.935	109.834	0.270	9.000	0.000	11.918	142.048	1,692.984
country 80	C80	2,024.000	Continent E	0.539	150,000.000	5,733,636.000	4.602	0.444	0.509	−0.031	262,807.424	0.881	59,108,090.042	105.354	0.379	9.000	0.000	11.918	142.048	1,692.984
country 80	C80	2,022.000	Continent E	0.528	150,000.000	5,454,610.000	4.504	0.449	0.516	−0.081	262,807.424	0.927	57,205,742.488	113.707	0.619	10.000	0.000	11.918	142.048	1,692.984
country 80	C80	2,023.000	Continent E	0.568	150,000.000	5,592,383.000	7.703	0.439	0.467	0.009	262,807.424	0.984	61,028,630.174	115.897	0.517	10.000	0.000	11.918	142.048	1,692.984
country 80	C80	2,021.000	Continent E	0.491	150,000.000	5,320,231.000	8.007	0.437	0.490	−0.063	262,807.424	0.821	59,797,131.352	111.021	0.529	10.000	0.000	11.918	142.048	1,692.984
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
country 4	C04	2,022.000	Continent A	0.099	626.936	1,170,686.000	16.226	0.310	0.384	0.042	518,862.110	0.200	209,371,704.365	6.000	...	−10.000	0.000	6.441	41.484	267.195
country 4	C04	2,021.000	Continent A	0.033	611.293	1,153,163.000	13.580	0.312	0.479	0.056	518,862.110	0.302	180,576,084.351	6.000	0.182	−9.000	0.000	6.416	41.160	264.063
country 4	C04	2,023.000	Continent A	0.020	580.034	1,188,475.000	13.138	0.273	0.472	0.047	518,862.110	0.200	204,319,197.473	8.851	0.145	...	0.000	6.363	40.489	257.634
country 4	C04	2,024.000	Continent A	0.020	571.021	1,206,535.000	10.032	0.318	0.400	0.056	518,862.110	0.271	168,084,858.461	6.000	0.287	−7.000	0.000	6.347	40.290	255.737
country 4	C04	2,025.000	Continent A	0.020	531.027	1,224,868.000	10.675	0.305	0.432	0.022	518,862.110	0.200	172,810,451.647	9.809	0.360	−3.000	0.000	6.275	39.373	247.060

Advanced — the ten most extreme observations of a chosen variable, showing only the panel identifiers and that variable:

ex.prepare_ext_obs_table(
    df, n=10, cs_id=["country"], ts_id="year", var="gini_regional"
).gt

country	year	gini_regional
country 80	2,023.000	0.568
country 80	2,024.000	0.539
country 69	2,025.000	0.532
country 80	2,022.000	0.528
country 79	2,021.000	0.524
country 28	2,018.000	0.516
country 79	2,025.000	0.513
country 69	2,024.000	0.511
country 79	2,023.000	0.510
country 28	2,025.000	0.509
...	...	...
country 1	2,015.000	0.086
country 43	2,021.000	0.084
country 1	2,016.000	0.080
country 4	2,017.000	0.065
country 16	2,025.000	0.057
country 4	2,020.000	0.035
country 4	2,021.000	0.033
country 4	2,024.000	0.020
country 4	2,025.000	0.020
country 4	2,023.000	0.020
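
Selecting the top and bottom n rows is straightforward to sketch in pandas. The frame and column names below are made up for illustration, and the sketch makes no claim about how expdpy builds its table (for example, how it inserts the `...` separator row).

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {
        "unit": list("abcdefgh"),
        "value": [3.0, 9.5, 1.2, 7.7, 0.4, 5.1, 8.8, 2.6],
    }
)

n = 2
ranked = toy.sort_values("value", ascending=False)

# The n largest rows followed by the n smallest rows.
extremes = pd.concat([ranked.head(n), ranked.tail(n)])
print(extremes)
```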

Graphs

`prepare_correlation_graph`

Basic — a correlation heatmap:

ex.prepare_correlation_graph(df[["gini_regional", "gdp_pc", "log_gdp_pc"]]).fig

Advanced — the ellipse style (R corrplot look):

ex.prepare_correlation_graph(
    df[["gini_regional", "gdp_pc", "log_gdp_pc", "trade_share"]],
    style="ellipse",
).fig

`prepare_trend_graph`

Basic — the mean of one variable over time, with standard-error bars:

ex.prepare_trend_graph(df, ts_id="year", var=["gini_regional"]).fig

Advanced — several variables on one chart:

ex.prepare_trend_graph(df, ts_id="year", var=["gini_regional", "trade_share"]).fig
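
The quantities behind such a chart are a mean and a standard error per period. A minimal pandas sketch on made-up data follows; it assumes the standard error is the sample standard deviation divided by the square root of the group size, which is what pandas' `sem` computes. Whether expdpy uses exactly this definition is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: four observations in each of two years.
toy = pd.DataFrame(
    {
        "year": [2020] * 4 + [2021] * 4,
        "value": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0],
    }
)

# One row per year: the mean and its standard error (std with ddof=1 / sqrt(n)).
trend = toy.groupby("year")["value"].agg(["mean", "sem"])
print(trend)
```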

`prepare_quantile_trend_graph`

Basic — the default quantiles of a variable over time:

ex.prepare_quantile_trend_graph(df, ts_id="year", var="gini_regional").fig

Advanced — custom quantile levels and no per-observation points:

ex.prepare_quantile_trend_graph(
    df, ts_id="year", var="gini_regional", quantiles=(0.1, 0.5, 0.9), points=False
).fig
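
What the chart plots can be sketched as one row of quantiles per period. The example below uses made-up data and pandas' default linear interpolation between order statistics; the interpolation rule expdpy uses is not stated here, so treat that part as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: values 0..10 in 2020 and 0, 10, ..., 100 in 2021.
toy = pd.DataFrame(
    {
        "year": np.repeat([2020, 2021], 11),
        "value": np.concatenate([np.arange(11.0), 10.0 * np.arange(11.0)]),
    }
)

levels = [0.1, 0.5, 0.9]

# One row per year, one column per quantile level.
bands = toy.groupby("year")["value"].quantile(levels).unstack()
print(bands)
```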

`prepare_by_group_bar_graph`

Basic — the group mean of a variable:

ex.prepare_by_group_bar_graph(df, "continent", "gini_regional").fig

Advanced — a different statistic, bars ordered by it, and a custom color:

ex.prepare_by_group_bar_graph(
    df, "continent", "gini_regional",
    stat_fun=np.nanmedian, order_by_stat=True, color="#4682b4",
).fig
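
The choice of `np.nanmedian` over `np.median` matters when a group contains missing values. The NumPy-only sketch below uses made-up groups to show the difference and then orders the groups by the statistic. The descending order is an arbitrary choice for illustration; the direction expdpy uses for `order_by_stat` is not shown here.

```python
import numpy as np

# Hypothetical groups, two of them with missing values.
groups = {
    "north": np.array([0.30, np.nan, 0.10, 0.20]),
    "south": np.array([0.50, 0.40, np.nan, np.nan]),
    "west": np.array([0.25, 0.35, 0.30, 0.90]),
}

# np.median propagates NaN; np.nanmedian ignores it.
plain = {name: float(np.median(v)) for name, v in groups.items()}
robust = {name: float(np.nanmedian(v)) for name, v in groups.items()}

# Group names ordered by the statistic, largest first.
ordered = sorted(robust, key=robust.get, reverse=True)
print(plain)
print(robust)
print(ordered)
```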

`prepare_by_group_trend_graph`

Basic — one line per group over time:

ex.prepare_by_group_trend_graph(
    df, ts_id="year", group_var="continent", var="gini_regional"
).fig

Advanced — add standard-error bars and drop the markers:

ex.prepare_by_group_trend_graph(
    df, ts_id="year", group_var="continent", var="gini_regional",
    error_bars=True, points=False,
).fig

`prepare_by_group_violin_graph`

This function returns a Plotly figure directly (no .fig).

Basic — the distribution of a variable across groups:

ex.prepare_by_group_violin_graph(df, "continent", "gini_regional")

Advanced — order groups by mean and orient the violins vertically:

ex.prepare_by_group_violin_graph(
    df, "continent", "gini_regional", order_by_mean=True, group_on_y=False
)

`prepare_histogram`

Basic — a 30-bin histogram:

ex.prepare_histogram(df, "gini_regional").fig

Advanced — finer bins on another variable:

ex.prepare_histogram(df, "gdp_pc", bins=50).fig

`prepare_bar_chart`

Basic — category counts:

ex.prepare_bar_chart(df, "continent").fig

Advanced — order bars by descending count and set a custom color:

ex.prepare_bar_chart(df, "continent", order_by_count=True, color="red").fig

`prepare_missing_values_graph`

This function returns a Plotly figure directly (no .fig).

Basic — the fraction of missing values by variable and year:

ex.prepare_missing_values_graph(df, ts_id="year")

Advanced — restrict to numeric variables and show only whether values are missing:

ex.prepare_missing_values_graph(df, ts_id="year", no_factors=True, binary=True)
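
The numbers behind such a chart can be sketched in pandas as the share of missing cells per variable and year. The frame below is made up, and reading the binary view as "any value missing in that cell" is an interpretation for illustration, not a statement about expdpy's internals.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel with gaps.
toy = pd.DataFrame(
    {
        "year": [2020, 2020, 2021, 2021],
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [np.nan, np.nan, 1.0, np.nan],
    }
)

# Fraction of missing values per variable (columns) and year (rows).
missing_share = toy.drop(columns="year").isna().groupby(toy["year"]).mean()

# Binary view: True wherever at least one value is missing.
any_missing = missing_share > 0
print(missing_share)
print(any_missing)
```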

`prepare_scatter_plot`

This function returns a Plotly figure directly (no .fig).

Basic — a plain scatter of two variables:

ex.prepare_scatter_plot(df, x="log_gdp_pc", y="gini_regional")

Advanced — map color and size to other columns and add a size-weighted LOESS smoother (the N-shaped Kuznets curve):

ex.prepare_scatter_plot(
    df, x="log_gdp_pc", y="gini_regional",
    color="continent", size="population", loess=2, alpha=0.6,
)

Regression

`prepare_regression_table`

Basic — a pooled OLS regression of the cubic Kuznets curve:

ex.prepare_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
).etable

	gini_regional
	(1)
coef
log_gdp_pc	6.385*** (0.134)
log_gdp_pc_sq	-0.711*** (0.015)
log_gdp_pc_cu	0.026*** (0.001)
Intercept	-18.490*** (0.402)
stats
Observations	880
R²	0.744
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
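
"Pooled OLS of a cubic" means stacking all country-years into one sample and regressing the outcome on an intercept, the regressor, its square and its cube. The NumPy sketch below shows that setup on synthetic, noise-free data with made-up coefficients (they are not the estimates above), so the fit recovers them up to numerical error. It computes coefficients only, not standard errors.

```python
import numpy as np

# Synthetic regressor on roughly the range of log_gdp_pc.
x = np.linspace(6.0, 12.0, 60)

# Made-up coefficients: intercept, linear, quadratic, cubic.
true_coefs = np.array([-18.0, 6.0, -0.7, 0.025])

# Design matrix with the polynomial terms as separate columns,
# analogous to log_gdp_pc, log_gdp_pc_sq and log_gdp_pc_cu.
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
y = X @ true_coefs

# Ordinary least squares via a least-squares solve.
estimated, *_ = np.linalg.lstsq(X, y, rcond=None)
print(estimated)
```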

Advanced — absorb two-way (country + year) fixed effects with standard errors clustered by country:

ex.prepare_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
).etable

	gini_regional
	(1)
coef
log_gdp_pc	6.411*** (0.210)
log_gdp_pc_sq	-0.715*** (0.023)
log_gdp_pc_cu	0.026*** (0.001)
fe
year	x
country	x
stats
Observations	880
R²	0.874
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
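
Absorbing fixed effects means the unit and period effects are removed rather than reported, which is why the table lists them only under `fe`. For a balanced panel, this is equivalent to two-way demeaning. The NumPy sketch below checks that equivalence against an explicit dummy-variable regression on synthetic data; all names and numbers are made up, and it covers only the point estimate, not the clustered standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 6, 5

# Balanced panel identifiers, ordered unit by unit.
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)

# Regressor correlated with the unit effect; outcome with unit and period effects.
x = rng.normal(size=unit.size) + 0.5 * unit
y = 2.0 * unit - 1.0 * period + 0.7 * x + rng.normal(scale=0.1, size=unit.size)


def two_way_demean(v):
    """Subtract unit and period means and add back the grand mean (balanced panel)."""
    m = v.reshape(n_units, n_periods)
    centered = m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()
    return centered.ravel()


x_w, y_w = two_way_demean(x), two_way_demean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

# Same slope from a regression with explicit dummies (first period dropped).
d_unit = (unit[:, None] == np.arange(n_units)).astype(float)
d_period = (period[:, None] == np.arange(1, n_periods)).astype(float)
beta_dummy = np.linalg.lstsq(np.column_stack([x, d_unit, d_period]), y, rcond=None)[0][0]

print(beta_within, beta_dummy)
```

Both estimates coincide up to floating-point error. The shortcut via simple demeaning holds for balanced panels only.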

`prepare_fwl_plot`

Basic — the partial relationship of the outcome and a single regressor:

ex.prepare_fwl_plot(df, dv="gini_regional", var="log_gdp_pc").fig

Advanced — residualize on the other cubic terms and two-way fixed effects, with the reported standard error clustered by country:

ex.prepare_fwl_plot(
    df,
    dv="gini_regional",
    var="log_gdp_pc",
    controls=["log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
).fig
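
The plot rests on the Frisch–Waugh–Lovell theorem: regress both the outcome and the regressor of interest on the remaining controls, then regress the outcome residuals on the regressor residuals. The resulting slope equals the coefficient from the full regression. A NumPy sketch on synthetic data (all names and coefficients are made up) verifies this; it illustrates the theorem, not expdpy's code.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Controls (including a constant), a regressor correlated with them, and an outcome.
controls = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
x = controls @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
y = 0.3 * x + controls @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)


def residualize(v, z):
    """Return the residuals of a least-squares regression of v on z."""
    coef, *_ = np.linalg.lstsq(z, v, rcond=None)
    return v - z @ coef


x_res = residualize(x, controls)
y_res = residualize(y, controls)

# Slope of the residual-on-residual regression (the line in an FWL plot).
slope_fwl = (x_res @ y_res) / (x_res @ x_res)

# Coefficient on x from the full multiple regression.
slope_full = np.linalg.lstsq(np.column_stack([x, controls]), y, rcond=None)[0][0]

print(slope_fwl, slope_full)
```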

Data preparation

`treat_outliers`

Basic — winsorize a single variable at the 1st/99th percentile:

ex.treat_outliers(df["gdp_pc"], percentile=0.01).describe()

count       880.000000
mean      25524.001800
std       34212.744869
min         666.720522
25%        2512.157493
50%       11199.795097
75%       33444.168592
max      150000.000000
Name: gdp_pc, dtype: float64

Advanced — winsorize several columns at the 5th/95th percentile, with cut-offs computed within each continent:

ex.treat_outliers(
    df[["gini_regional", "gdp_pc"]], percentile=0.05, by=df["continent"]
).describe()

	gini_regional	gdp_pc
count	880.000000	880.000000
mean	0.273319	24703.010142
std	0.086432	32756.473008
min	0.080619	646.309044
25%	0.206811	2512.157493
50%	0.270045	11199.795097
75%	0.334933	32639.241341
max	0.503312	150000.000000
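
Winsorizing replaces values beyond a lower and an upper quantile with those quantiles instead of dropping them, so the number of observations stays the same. The pandas sketch below uses a hypothetical helper `winsorize` and made-up data to contrast pooled cut-offs with cut-offs computed within each group. It assumes pandas' default quantile interpolation and is not expdpy's implementation.

```python
import numpy as np
import pandas as pd


def winsorize(values: pd.Series, percentile: float) -> pd.Series:
    """Clip values to the [percentile, 1 - percentile] quantile range."""
    lower, upper = values.quantile([percentile, 1 - percentile])
    return values.clip(lower=lower, upper=upper)


# Hypothetical toy data: two groups on very different scales.
toy = pd.DataFrame(
    {
        "group": ["a"] * 101 + ["b"] * 101,
        "value": np.concatenate([np.arange(101.0), 1000.0 + np.arange(101.0)]),
    }
)

# Cut-offs from the pooled sample versus cut-offs within each group.
pooled = winsorize(toy["value"], 0.05)
within = toy.groupby("group")["value"].transform(lambda v: winsorize(v, 0.05))

print(pooled.describe())
print(within.groupby(toy["group"]).agg(["min", "max"]))
```

With pooled cut-offs, only the bottom of group `a` and the top of group `b` are clipped; with within-group cut-offs, both tails of each group are.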