Examples gallery

This gallery gives a worked basic (minimal call) and advanced (more options) example for every analytical function in expdpy. Each example is run live on the bundled kuznets country–year panel, so the output shown is the actual output. The gallery mirrors the Function reference one entry at a time; for a narrative tour of a typical workflow, see the Quickstart instead.

Every example assumes the following setup:

import numpy as np
import expdpy as ex
from expdpy.data import load_kuznets

df = load_kuznets()

Tables

prepare_descriptive_table

N, mean, standard deviation and quantiles for the numeric variables.

Basic:

ex.prepare_descriptive_table(df).gt
Descriptive Statistics
N Mean Std. dev. Min. 25 % Median 75 % Max.
year 880 2,020.000 3.164 2,015.000 2,017.000 2,020.000 2,023.000 2,025.000
gini_regional 880 0.273 0.091 0.020 0.207 0.270 0.335 0.568
gdp_pc 880 25,523.391 34,213.190 531.027 2,512.157 11,199.795 33,444.169 150,000.000
population 880 19,699,168.035 47,939,729.166 209,936.000 1,759,924.250 4,936,045.500 19,361,656.000 368,704,165.000
resource_rents 880 14.962 8.634 0.341 9.022 13.075 18.813 44.025
arable_land 880 0.245 0.121 0.017 0.165 0.219 0.341 0.571
trade_share 880 0.611 0.236 0.198 0.443 0.555 0.725 1.421
fdi_share 834 0.023 0.057 −0.151 −0.019 0.021 0.062 0.184
area 880 450,980.746 1,018,019.939 1,224.257 30,587.671 125,400.969 494,298.458 8,189,082.989
gasoline_price 648 0.554 0.218 0.200 0.378 0.562 0.725 1.055
aid 815 351,203,123.507 505,348,429.052 −8,053,828.003 56,235,278.824 131,148,458.214 435,529,190.612 2,200,000,000.000
school_enrollment 735 55.507 26.120 6.000 34.218 56.896 75.026 125.014
gini_lights 747 0.283 0.132 0.010 0.194 0.282 0.375 0.668
polity2 788 0.547 4.766 −10.000 −3.000 1.000 4.000 10.000
federal 880 0.200 0.400 0.000 0.000 0.000 0.000 1.000
log_gdp_pc 880 9.205 1.490 6.275 7.829 9.324 10.418 11.918
log_gdp_pc_sq 880 86.958 27.476 39.373 61.292 86.930 108.527 142.048
log_gdp_pc_cu 880 841.264 388.431 247.060 479.845 810.509 1,130.595 1,692.984

Advanced — set the decimals per statistic (None drops that column) and a caption:

ex.prepare_descriptive_table(
    df,
    digits=(0, 2, 2, None, None, 2, None, None),
    caption="Kuznets panel",
).gt
Kuznets panel
N Mean Std. dev. Median
year 880 2,020.00 3.16 2,020.00
gini_regional 880 0.27 0.09 0.27
gdp_pc 880 25,523.39 34,213.19 11,199.80
population 880 19,699,168.04 47,939,729.17 4,936,045.50
resource_rents 880 14.96 8.63 13.08
arable_land 880 0.25 0.12 0.22
trade_share 880 0.61 0.24 0.55
fdi_share 834 0.02 0.06 0.02
area 880 450,980.75 1,018,019.94 125,400.97
gasoline_price 648 0.55 0.22 0.56
aid 815 351,203,123.51 505,348,429.05 131,148,458.21
school_enrollment 735 55.51 26.12 56.90
gini_lights 747 0.28 0.13 0.28
polity2 788 0.55 4.77 1.00
federal 880 0.20 0.40 0.00
log_gdp_pc 880 9.21 1.49 9.32
log_gdp_pc_sq 880 86.96 27.48 86.93
log_gdp_pc_cu 880 841.26 388.43 810.51
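
The N column differs across variables in the tables above (for example 834 for fdi_share against 880 for year), which presumably reflects missing values being skipped per variable. As a rough point of comparison, and not a description of how expdpy computes its table, the same kind of statistics can be sketched with plain pandas on a small synthetic frame (the `toy` data and column names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Small synthetic frame (hypothetical data, not the kuznets panel).
toy = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, np.nan],
        "y": [10.0, 20.0, 30.0, 40.0, 50.0],
    }
)

# describe() skips missing values, so the count can differ by variable.
stats = toy.describe().T.rename(columns={"count": "N", "50%": "Median"})
print(stats[["N", "mean", "std", "Median"]])
```

Here `x` reports N = 4 and `y` reports N = 5 because one value of `x` is missing.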

prepare_correlation_table

Pearson correlations above the diagonal, Spearman below; significant cells in bold.

Basic:

ex.prepare_correlation_table(df[["gini_regional", "gdp_pc", "log_gdp_pc"]]).gt
                        A      B      C
A: gini_regional          0.20  -0.09
B: gdp_pc          -0.19          0.82
C: log_gdp_pc      -0.19   1.00
This table reports Pearson correlations above and Spearman correlations below the diagonal. Number of observations: 880. Correlations with significance levels below 5% appear in bold.

Advanced — more decimals, a stricter bold threshold, and a caption:

ex.prepare_correlation_table(
    df[["gini_regional", "gdp_pc", "log_gdp_pc", "trade_share"]],
    digits=3,
    bold=0.01,
    caption="Correlations (kuznets)",
).gt
Correlations (kuznets)
                         A       B       C       D
A: gini_regional           0.202  -0.088  -0.155
B: gdp_pc           -0.190           0.825  -0.082
C: log_gdp_pc       -0.190   1.000           0.028
D: trade_share      -0.148   0.058   0.058
This table reports Pearson correlations above and Spearman correlations below the diagonal. Number of observations: 880. Correlations with significance levels below 1% appear in bold.
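
In both tables the Spearman correlation between gdp_pc and log_gdp_pc is 1.00 while the Pearson correlation is about 0.82. That is what one would expect: the logarithm is a monotone transformation, so ranks are unchanged, but the relationship is not linear. The sketch below reproduces this pattern and the upper/lower layout with plain pandas and NumPy on synthetic data. It is an illustration under those assumptions, not expdpy's implementation, and all names in it are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic skewed variable and its log (hypothetical data).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)
toy = pd.DataFrame({"x": x, "log_x": np.log(x)})

pearson = toy.corr(method="pearson").to_numpy()
spearman = toy.corr(method="spearman").to_numpy()

# Pearson in the upper triangle, Spearman in the lower, blank diagonal.
k = len(toy.columns)
combined = np.full((k, k), np.nan)
upper = np.triu_indices(k, 1)
lower = np.tril_indices(k, -1)
combined[upper] = pearson[upper]
combined[lower] = spearman[lower]

table = pd.DataFrame(combined, index=toy.columns, columns=toy.columns)
print(table.round(2))
```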

prepare_ext_obs_table

The top and bottom n observations of a variable.

Basic:

ex.prepare_ext_obs_table(df, n=5).gt
/home/runner/work/expdpy/expdpy/src/expdpy/tables.py:336: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'nan' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  separator.loc[:, :] = np.nan
country iso year continent gini_regional gdp_pc population resource_rents arable_land trade_share fdi_share area gasoline_price aid school_enrollment gini_lights polity2 federal log_gdp_pc log_gdp_pc_sq log_gdp_pc_cu
country 80 C80 2,025.000 Continent E 0.494 150,000.000 5,878,456.000 5.328 0.439 0.550 −0.009 262,807.424 0.822 66,326,822.935 109.834 0.270 9.000 0.000 11.918 142.048 1,692.984
country 80 C80 2,024.000 Continent E 0.539 150,000.000 5,733,636.000 4.602 0.444 0.509 −0.031 262,807.424 0.881 59,108,090.042 105.354 0.379 9.000 0.000 11.918 142.048 1,692.984
country 80 C80 2,022.000 Continent E 0.528 150,000.000 5,454,610.000 4.504 0.449 0.516 −0.081 262,807.424 0.927 57,205,742.488 113.707 0.619 10.000 0.000 11.918 142.048 1,692.984
country 80 C80 2,023.000 Continent E 0.568 150,000.000 5,592,383.000 7.703 0.439 0.467 0.009 262,807.424 0.984 61,028,630.174 115.897 0.517 10.000 0.000 11.918 142.048 1,692.984
country 80 C80 2,021.000 Continent E 0.491 150,000.000 5,320,231.000 8.007 0.437 0.490 −0.063 262,807.424 0.821 59,797,131.352 111.021 0.529 10.000 0.000 11.918 142.048 1,692.984
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
country 4 C04 2,022.000 Continent A 0.099 626.936 1,170,686.000 16.226 0.310 0.384 0.042 518,862.110 0.200 209,371,704.365 6.000 ... −10.000 0.000 6.441 41.484 267.195
country 4 C04 2,021.000 Continent A 0.033 611.293 1,153,163.000 13.580 0.312 0.479 0.056 518,862.110 0.302 180,576,084.351 6.000 0.182 −9.000 0.000 6.416 41.160 264.063
country 4 C04 2,023.000 Continent A 0.020 580.034 1,188,475.000 13.138 0.273 0.472 0.047 518,862.110 0.200 204,319,197.473 8.851 0.145 ... 0.000 6.363 40.489 257.634
country 4 C04 2,024.000 Continent A 0.020 571.021 1,206,535.000 10.032 0.318 0.400 0.056 518,862.110 0.271 168,084,858.461 6.000 0.287 −7.000 0.000 6.347 40.290 255.737
country 4 C04 2,025.000 Continent A 0.020 531.027 1,224,868.000 10.675 0.305 0.432 0.022 518,862.110 0.200 172,810,451.647 9.809 0.360 −3.000 0.000 6.275 39.373 247.060

Advanced — the ten most extreme observations of a chosen variable, showing only the panel identifiers and that variable:

ex.prepare_ext_obs_table(
    df, n=10, cs_id=["country"], ts_id="year", var="gini_regional"
).gt
/home/runner/work/expdpy/expdpy/src/expdpy/tables.py:336: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value 'nan' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.
  separator.loc[:, :] = np.nan
country year gini_regional
country 80 2,023.000 0.568
country 80 2,024.000 0.539
country 69 2,025.000 0.532
country 80 2,022.000 0.528
country 79 2,021.000 0.524
country 28 2,018.000 0.516
country 79 2,025.000 0.513
country 69 2,024.000 0.511
country 79 2,023.000 0.510
country 28 2,025.000 0.509
... ... ...
country 1 2,015.000 0.086
country 43 2,021.000 0.084
country 1 2,016.000 0.080
country 4 2,017.000 0.065
country 16 2,025.000 0.057
country 4 2,020.000 0.035
country 4 2,021.000 0.033
country 4 2,024.000 0.020
country 4 2,025.000 0.020
country 4 2,023.000 0.020
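
The layout above (top n rows, a "..." row, bottom n rows) can be approximated with plain pandas. The sketch below is an assumption about one way to build such a table, not expdpy's code. It also shows one possible way to avoid the kind of FutureWarning printed in the output above: the columns are cast to object before a non-numeric separator row is added. The `toy` frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data, not the kuznets panel.
toy = pd.DataFrame(
    {
        "country": ["a", "b", "c", "d", "e", "f"],
        "year": [2020, 2020, 2021, 2021, 2022, 2022],
        "gini": [0.31, 0.12, 0.45, 0.27, 0.08, 0.52],
    }
)

n = 2
ordered = toy.sort_values("gini", ascending=False)
top, bottom = ordered.head(n), ordered.tail(n)

# Use object dtype throughout so integer columns such as year are never
# asked to hold a value of an incompatible dtype.
separator = pd.DataFrame(
    [["..."] * toy.shape[1]], columns=toy.columns, dtype=object
)
extremes = pd.concat(
    [top.astype(object), separator, bottom.astype(object)], ignore_index=True
)
print(extremes)
```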

Graphs

prepare_correlation_graph

Basic — a correlation heatmap:

ex.prepare_correlation_graph(df[["gini_regional", "gdp_pc", "log_gdp_pc"]]).fig

Advanced — the ellipse style (R corrplot look):

ex.prepare_correlation_graph(
    df[["gini_regional", "gdp_pc", "log_gdp_pc", "trade_share"]],
    style="ellipse",
).fig

prepare_trend_graph

Basic — the mean of one variable over time, with standard-error bars:

ex.prepare_trend_graph(df, ts_id="year", var=["gini_regional"]).fig

Advanced — several variables on one chart:

ex.prepare_trend_graph(df, ts_id="year", var=["gini_regional", "trade_share"]).fig
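
For readers who want the numbers behind such a chart, the per-period mean and its standard error can be sketched with a pandas groupby. This is an illustration on synthetic data. Whether expdpy draws bars of exactly one standard error is not stated here, so the `lower` and `upper` columns are an assumption.

```python
import pandas as pd

# Hypothetical data, not the kuznets panel.
toy = pd.DataFrame(
    {
        "year": [2020, 2020, 2020, 2021, 2021, 2021],
        "gini": [0.20, 0.30, 0.40, 0.25, 0.35, 0.45],
    }
)

# Mean, standard error of the mean, and group size per year.
trend = toy.groupby("year")["gini"].agg(["mean", "sem", "count"])

# Assumed bar extent: one standard error on either side of the mean.
trend["lower"] = trend["mean"] - trend["sem"]
trend["upper"] = trend["mean"] + trend["sem"]
print(trend)
```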

prepare_quantile_trend_graph

Basic — the default quantiles of a variable over time:

ex.prepare_quantile_trend_graph(df, ts_id="year", var="gini_regional").fig

Advanced — custom quantile levels and no per-observation points:

ex.prepare_quantile_trend_graph(
    df, ts_id="year", var="gini_regional", quantiles=(0.1, 0.5, 0.9), points=False
).fig
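
The quantile lines in such a graph correspond to per-period quantiles of the variable. A minimal sketch with pandas on synthetic data is shown below. The quantile interpolation rule used by expdpy is not documented here, so pandas' default (linear) is an assumption, and all names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 observations in each of three years.
rng = np.random.default_rng(1)
toy = pd.DataFrame(
    {
        "year": np.repeat([2020, 2021, 2022], 50),
        "gini": rng.uniform(0.0, 0.6, size=150),
    }
)

levels = [0.1, 0.5, 0.9]
# One row per year, one column per quantile level.
bands = toy.groupby("year")["gini"].quantile(levels).unstack()
print(bands.round(3))
```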

prepare_by_group_bar_graph

Basic — the group mean of a variable:

ex.prepare_by_group_bar_graph(df, "continent", "gini_regional").fig

Advanced — a different statistic, bars ordered by it, and a custom color:

ex.prepare_by_group_bar_graph(
    df, "continent", "gini_regional",
    stat_fun=np.nanmedian, order_by_stat=True, color="#4682b4",
).fig
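
Passing np.nanmedian rather than np.median matters when the variable has missing values, because np.median returns NaN for any group that contains one. The sketch below shows the group statistic and an ordering by it on synthetic data. It assumes a descending order, which may not match what order_by_stat does in expdpy.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in group A.
toy = pd.DataFrame(
    {
        "continent": ["A", "A", "A", "B", "B", "C", "C"],
        "gini": [0.10, np.nan, 0.30, 0.50, 0.40, 0.25, 0.20],
    }
)

# np.nanmedian ignores the missing value in group A.
by_group = toy.groupby("continent")["gini"].apply(
    lambda s: np.nanmedian(s.to_numpy())
)

# Assumed ordering: largest statistic first.
ordered = by_group.sort_values(ascending=False)
print(ordered)
```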

prepare_by_group_trend_graph

Basic — one line per group over time:

ex.prepare_by_group_trend_graph(
    df, ts_id="year", group_var="continent", var="gini_regional"
).fig

Advanced — add standard-error bars and drop the markers:

ex.prepare_by_group_trend_graph(
    df, ts_id="year", group_var="continent", var="gini_regional",
    error_bars=True, points=False,
).fig

prepare_by_group_violin_graph

This function returns a Plotly figure directly (no .fig).

Basic — the distribution of a variable across groups:

ex.prepare_by_group_violin_graph(df, "continent", "gini_regional")

Advanced — order groups by mean and orient the violins vertically:

ex.prepare_by_group_violin_graph(
    df, "continent", "gini_regional", order_by_mean=True, group_on_y=False
)

prepare_histogram

Basic — a 30-bin histogram:

ex.prepare_histogram(df, "gini_regional").fig

Advanced — finer bins on another variable:

ex.prepare_histogram(df, "gdp_pc", bins=50).fig

prepare_bar_chart

Basic — category counts:

ex.prepare_bar_chart(df, "continent").fig

Advanced — order bars by descending count and set a custom color:

ex.prepare_bar_chart(df, "continent", order_by_count=True, color="red").fig

prepare_missing_values_graph

This function returns a Plotly figure directly (no .fig).

Basic — the fraction of missing values by variable and year:

ex.prepare_missing_values_graph(df, ts_id="year")

Advanced — restrict to numeric variables and show only whether values are missing:

ex.prepare_missing_values_graph(df, ts_id="year", no_factors=True, binary=True)
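
The quantity plotted here, the share of missing values per variable and period, can be sketched directly in pandas. The example below uses synthetic data and hypothetical names. The boolean frame at the end is one plausible reading of what a binary display would show, not a statement about expdpy's internals.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the kuznets panel.
toy = pd.DataFrame(
    {
        "year": [2020, 2020, 2021, 2021],
        "aid": [1.0, np.nan, np.nan, np.nan],
        "polity2": [5.0, 3.0, np.nan, 7.0],
    }
)

values = toy.drop(columns="year")
# Fraction of missing observations per variable within each year.
missing_share = values.isna().groupby(toy["year"]).mean()
# Binary view: is anything missing for this variable in this year?
any_missing = missing_share > 0
print(missing_share)
```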

prepare_scatter_plot

This function returns a Plotly figure directly (no .fig).

Basic — a plain scatter of two variables:

ex.prepare_scatter_plot(df, x="log_gdp_pc", y="gini_regional")

Advanced — map color and size to other columns and add a size-weighted LOESS smoother (the N-shaped Kuznets curve):

ex.prepare_scatter_plot(
    df, x="log_gdp_pc", y="gini_regional",
    color="continent", size="population", loess=2, alpha=0.6,
)

Regression

prepare_regression_table

Basic — a pooled OLS regression of the cubic Kuznets curve:

ex.prepare_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
).etable
gini_regional
(1)
coef
log_gdp_pc 6.385***
(0.134)
log_gdp_pc_sq -0.711***
(0.015)
log_gdp_pc_cu 0.026***
(0.001)
Intercept -18.490***
(0.402)
stats
Observations 880
R2 0.744
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Advanced — absorb two-way (country + year) fixed effects with standard errors clustered by country:

ex.prepare_regression_table(
    df,
    dvs="gini_regional",
    idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
).etable
gini_regional
(1)
coef
log_gdp_pc 6.411***
(0.210)
log_gdp_pc_sq -0.715***
(0.023)
log_gdp_pc_cu 0.026***
(0.001)
fe
year x
country x
stats
Observations 880
R2 0.874
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
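
To make "absorbing" two-way fixed effects concrete, the sketch below demeans a synthetic balanced panel by unit and by year and checks that the resulting slope matches an explicit dummy-variable regression. It uses only NumPy and pandas, does not compute clustered standard errors, and is not a description of how expdpy estimates the model. All names and the data-generating process are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_units, n_years, beta = 30, 8, 0.7

# Balanced synthetic panel (hypothetical data, not the kuznets panel).
units = np.repeat(np.arange(n_units), n_years)
years = np.tile(np.arange(n_years), n_units)
unit_fe = rng.normal(size=n_units)[units]
year_fe = rng.normal(size=n_years)[years]
x = rng.normal(size=units.size) + unit_fe  # regressor correlated with unit effect
y = beta * x + unit_fe + year_fe + rng.normal(scale=0.1, size=units.size)
panel = pd.DataFrame({"unit": units, "year": years, "x": x, "y": y})


def demean_two_way(col):
    """Two-way within transformation; exact for a balanced panel."""
    s = panel[col]
    return (
        s
        - s.groupby(panel["unit"]).transform("mean")
        - s.groupby(panel["year"]).transform("mean")
        + s.mean()
    )


x_w, y_w = demean_two_way("x"), demean_two_way("y")
beta_within = float((x_w * y_w).sum() / (x_w ** 2).sum())

# Same slope from a regression on x plus unit and year dummies. The dummy
# sets are collinear, so lstsq returns a minimum-norm solution, but the
# slope on x is still uniquely determined.
X = np.column_stack([x, np.eye(n_units)[units], np.eye(n_years)[years]])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_lsdv = float(coef[0])
print(round(beta_within, 3), round(beta_lsdv, 3))
```

The simple demeaning formula is exact only for balanced panels; unbalanced panels generally need an iterative procedure.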

prepare_fwl_plot

Basic — the partial relationship of the outcome and a single regressor:

ex.prepare_fwl_plot(df, dv="gini_regional", var="log_gdp_pc").fig

Advanced — residualize on the other cubic terms and two-way fixed effects, with the reported standard error clustered by country:

ex.prepare_fwl_plot(
    df,
    dv="gini_regional",
    var="log_gdp_pc",
    controls=["log_gdp_pc_sq", "log_gdp_pc_cu"],
    feffects=["country", "year"],
    clusters=["country"],
).fig
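
The idea behind such a plot is the Frisch-Waugh-Lovell theorem: regressing the residualized outcome on the residualized regressor gives the same slope as the full multiple regression. The sketch below verifies this numerically with NumPy on synthetic data. It illustrates the theorem only; the variable names are hypothetical and nothing here is taken from expdpy's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Controls include a constant; x depends on them, y depends on x and them.
controls = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
x = controls @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
y = 2.0 * x + controls @ np.array([1.0, 0.3, 0.8]) + rng.normal(size=n)


def residualize(v, Z):
    """Return the residual of v after a least-squares projection on Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef


x_res, y_res = residualize(x, controls), residualize(y, controls)
slope_fwl = float(x_res @ y_res / (x_res @ x_res))

# Slope on x from the full regression of y on x and the controls.
full, *_ = np.linalg.lstsq(np.column_stack([x, controls]), y, rcond=None)
slope_full = float(full[0])
print(round(slope_fwl, 4), round(slope_full, 4))
```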

Data preparation

treat_outliers

Basic — winsorize a single variable at the 1st/99th percentile:

ex.treat_outliers(df["gdp_pc"], percentile=0.01).describe()
count       880.000000
mean      25524.001800
std       34212.744869
min         666.720522
25%        2512.157493
50%       11199.795097
75%       33444.168592
max      150000.000000
Name: gdp_pc, dtype: float64

Advanced — winsorize several columns at the 5th/95th percentile, with cut-offs computed within each continent:

ex.treat_outliers(
    df[["gini_regional", "gdp_pc"]], percentile=0.05, by=df["continent"]
).describe()
gini_regional gdp_pc
count 880.000000 880.000000
mean 0.273319 24703.010142
std 0.086432 32756.473008
min 0.080619 646.309044
25% 0.206811 2512.157493
50% 0.270045 11199.795097
75% 0.334933 32639.241341
max 0.503312 150000.000000
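
Winsorizing replaces values beyond the chosen percentiles with the percentile values themselves, which is why the count stays at 880 in both outputs above. A minimal pandas sketch, pooled and within groups, is given below. The `winsorize` helper is hypothetical, and pandas' default linear quantile interpolation is an assumption that may differ from what treat_outliers uses.

```python
import pandas as pd


def winsorize(s, percentile=0.01):
    """Clip a Series to its [percentile, 1 - percentile] quantile range."""
    lo, hi = s.quantile([percentile, 1 - percentile])
    return s.clip(lower=lo, upper=hi)


# Hypothetical data with one extreme value per group.
toy = pd.DataFrame(
    {
        "group": ["a"] * 5 + ["b"] * 5,
        "x": [1.0, 2.0, 3.0, 4.0, 100.0, 10.0, 20.0, 30.0, 40.0, 1000.0],
    }
)

# Cut-offs computed on the pooled sample.
pooled = winsorize(toy["x"], percentile=0.1)
# Cut-offs computed separately within each group.
by_group = toy.groupby("group")["x"].transform(winsorize, percentile=0.1)
print(pd.DataFrame({"pooled": pooled, "by_group": by_group}))
```

With pooled cut-offs the value 100 in group a is left alone, because it sits below the pooled upper cut-off; with group-wise cut-offs it is pulled in, because it is extreme relative to its own group.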