Examples gallery
This gallery gives a worked basic (minimal call) and advanced (full options) example for every analytical function in expdpy. Each example runs live on the bundled kuznets country–year panel, so you can see the actual output. The page mirrors the Function reference one entry at a time; for a narrative tour of a typical workflow, see the Quickstart instead.
Every example assumes the following setup:
import numpy as np
import expdpy as ex
from expdpy.data import load_kuznets
df = load_kuznets()
Tables
prepare_descriptive_table
N, mean, standard deviation and quantiles for the numeric variables.
Basic:
ex.prepare_descriptive_table(df).gt
Descriptive Statistics

| | N | Mean | Std. dev. | Min. | 25 % | Median | 75 % | Max. |
|---|---|---|---|---|---|---|---|---|
| year | 880 | 2,020.000 | 3.164 | 2,015.000 | 2,017.000 | 2,020.000 | 2,023.000 | 2,025.000 |
| gini_regional | 880 | 0.273 | 0.091 | 0.020 | 0.207 | 0.270 | 0.335 | 0.568 |
| gdp_pc | 880 | 25,523.391 | 34,213.190 | 531.027 | 2,512.157 | 11,199.795 | 33,444.169 | 150,000.000 |
| population | 880 | 19,699,168.035 | 47,939,729.166 | 209,936.000 | 1,759,924.250 | 4,936,045.500 | 19,361,656.000 | 368,704,165.000 |
| resource_rents | 880 | 14.962 | 8.634 | 0.341 | 9.022 | 13.075 | 18.813 | 44.025 |
| arable_land | 880 | 0.245 | 0.121 | 0.017 | 0.165 | 0.219 | 0.341 | 0.571 |
| trade_share | 880 | 0.611 | 0.236 | 0.198 | 0.443 | 0.555 | 0.725 | 1.421 |
| fdi_share | 834 | 0.023 | 0.057 | −0.151 | −0.019 | 0.021 | 0.062 | 0.184 |
| area | 880 | 450,980.746 | 1,018,019.939 | 1,224.257 | 30,587.671 | 125,400.969 | 494,298.458 | 8,189,082.989 |
| gasoline_price | 648 | 0.554 | 0.218 | 0.200 | 0.378 | 0.562 | 0.725 | 1.055 |
| aid | 815 | 351,203,123.507 | 505,348,429.052 | −8,053,828.003 | 56,235,278.824 | 131,148,458.214 | 435,529,190.612 | 2,200,000,000.000 |
| school_enrollment | 735 | 55.507 | 26.120 | 6.000 | 34.218 | 56.896 | 75.026 | 125.014 |
| gini_lights | 747 | 0.283 | 0.132 | 0.010 | 0.194 | 0.282 | 0.375 | 0.668 |
| polity2 | 788 | 0.547 | 4.766 | −10.000 | −3.000 | 1.000 | 4.000 | 10.000 |
| federal | 880 | 0.200 | 0.400 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 |
| log_gdp_pc | 880 | 9.205 | 1.490 | 6.275 | 7.829 | 9.324 | 10.418 | 11.918 |
| log_gdp_pc_sq | 880 | 86.958 | 27.476 | 39.373 | 61.292 | 86.930 | 108.527 | 142.048 |
| log_gdp_pc_cu | 880 | 841.264 | 388.431 | 247.060 | 479.845 | 810.509 | 1,130.595 | 1,692.984 |
Advanced — set the decimals per statistic (None drops that column) and a caption:
ex.prepare_descriptive_table(
df,
digits=(0, 2, 2, None, None, 2, None, None),
caption="Kuznets panel",
).gt
Kuznets panel

| | N | Mean | Std. dev. | Median |
|---|---|---|---|---|
| year | 880 | 2,020.00 | 3.16 | 2,020.00 |
| gini_regional | 880 | 0.27 | 0.09 | 0.27 |
| gdp_pc | 880 | 25,523.39 | 34,213.19 | 11,199.80 |
| population | 880 | 19,699,168.04 | 47,939,729.17 | 4,936,045.50 |
| resource_rents | 880 | 14.96 | 8.63 | 13.08 |
| arable_land | 880 | 0.25 | 0.12 | 0.22 |
| trade_share | 880 | 0.61 | 0.24 | 0.55 |
| fdi_share | 834 | 0.02 | 0.06 | 0.02 |
| area | 880 | 450,980.75 | 1,018,019.94 | 125,400.97 |
| gasoline_price | 648 | 0.55 | 0.22 | 0.56 |
| aid | 815 | 351,203,123.51 | 505,348,429.05 | 131,148,458.21 |
| school_enrollment | 735 | 55.51 | 26.12 | 56.90 |
| gini_lights | 747 | 0.28 | 0.13 | 0.28 |
| polity2 | 788 | 0.55 | 4.77 | 1.00 |
| federal | 880 | 0.20 | 0.40 | 0.00 |
| log_gdp_pc | 880 | 9.21 | 1.49 | 9.32 |
| log_gdp_pc_sq | 880 | 86.96 | 27.48 | 86.93 |
| log_gdp_pc_cu | 880 | 841.26 | 388.43 | 810.51 |
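
The columns of these tables can be reproduced by hand for a single variable. The sketch below is not expdpy's implementation: `describe_column` is a hypothetical helper, it assumes missing values are dropped per variable (which the varying N column suggests), and the sample standard deviation (`ddof=1`) is a guess.

```python
import numpy as np

def describe_column(values):
    """Return N, mean, std and quantiles for one numeric column, ignoring NaN."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # drop missing values so N counts valid observations only
    q25, q50, q75 = np.quantile(x, [0.25, 0.5, 0.75])
    return {
        "N": int(x.size),
        "Mean": float(x.mean()),
        "Std. dev.": float(x.std(ddof=1)),  # sample std; an assumption, not confirmed
        "Min.": float(x.min()),
        "25 %": float(q25),
        "Median": float(q50),
        "75 %": float(q75),
        "Max.": float(x.max()),
    }

# Toy input with one missing value: N is 4, not 5.
stats = describe_column([1.0, 2.0, np.nan, 4.0, 5.0])
print(stats)
```

Dropping a statistic, as the `None` entries in `digits` do above, would then amount to leaving the corresponding key out of the result.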
prepare_correlation_table
Pearson correlations above the diagonal, Spearman below; significant cells in bold.
Basic:
ex.prepare_correlation_table(df[["gini_regional", "gdp_pc", "log_gdp_pc"]]).gt

| | A | B | C |
|---|---|---|---|
| A: gini_regional | | 0.20 | -0.09 |
| B: gdp_pc | -0.19 | | 0.82 |
| C: log_gdp_pc | -0.19 | 1.00 | |

This table reports Pearson correlations above and Spearman correlations below the diagonal. Number of observations: 880. Correlations with significance levels below 5% appear in bold.
Advanced — more decimals, a stricter bold threshold, and a caption:
ex.prepare_correlation_table(
df[["gini_regional", "gdp_pc", "log_gdp_pc", "trade_share"]],
digits=3,
bold=0.01,
caption="Correlations (kuznets)",
).gt
Correlations (kuznets)

| | A | B | C | D |
|---|---|---|---|---|
| A: gini_regional | | 0.202 | -0.088 | -0.155 |
| B: gdp_pc | -0.190 | | 0.825 | -0.082 |
| C: log_gdp_pc | -0.190 | 1.000 | | 0.028 |
| D: trade_share | -0.148 | 0.058 | 0.058 | |

This table reports Pearson correlations above and Spearman correlations below the diagonal. Number of observations: 880. Correlations with significance levels below 1% appear in bold.
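
The Spearman entry of 1.000 between gdp_pc and log_gdp_pc, next to a Pearson entry of 0.825, follows from how the two measures are defined: Spearman is the Pearson correlation of the ranks, and a logarithm preserves ranks. The sketch below illustrates this on synthetic data (not the kuznets panel); the `pearson` and `spearman` helpers are hypothetical, and the rank computation assumes there are no ties.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson on the ranks (valid only without ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)

rng = np.random.default_rng(0)
gdp = rng.lognormal(mean=9.0, sigma=1.5, size=500)  # synthetic, right-skewed
log_gdp = np.log(gdp)                               # monotone transform of gdp

p = pearson(gdp, log_gdp)    # below 1: the relationship is not linear
s = spearman(gdp, log_gdp)   # 1 up to rounding: the ranks are identical
print(round(p, 3), round(s, 3))
```

The same logic explains why trade_share has identical Spearman correlations with gdp_pc and log_gdp_pc in the table above.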
prepare_ext_obs_table
The top and bottom n observations of a variable.
Basic:
ex.prepare_ext_obs_table(df, n=5).gt
| country | iso | year | continent | gini_regional | gdp_pc | population | resource_rents | arable_land | trade_share | fdi_share | area | gasoline_price | aid | school_enrollment | gini_lights | polity2 | federal | log_gdp_pc | log_gdp_pc_sq | log_gdp_pc_cu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| country 80 | C80 | 2,025.000 | Continent E | 0.494 | 150,000.000 | 5,878,456.000 | 5.328 | 0.439 | 0.550 | −0.009 | 262,807.424 | 0.822 | 66,326,822.935 | 109.834 | 0.270 | 9.000 | 0.000 | 11.918 | 142.048 | 1,692.984 |
| country 80 | C80 | 2,024.000 | Continent E | 0.539 | 150,000.000 | 5,733,636.000 | 4.602 | 0.444 | 0.509 | −0.031 | 262,807.424 | 0.881 | 59,108,090.042 | 105.354 | 0.379 | 9.000 | 0.000 | 11.918 | 142.048 | 1,692.984 |
| country 80 | C80 | 2,022.000 | Continent E | 0.528 | 150,000.000 | 5,454,610.000 | 4.504 | 0.449 | 0.516 | −0.081 | 262,807.424 | 0.927 | 57,205,742.488 | 113.707 | 0.619 | 10.000 | 0.000 | 11.918 | 142.048 | 1,692.984 |
| country 80 | C80 | 2,023.000 | Continent E | 0.568 | 150,000.000 | 5,592,383.000 | 7.703 | 0.439 | 0.467 | 0.009 | 262,807.424 | 0.984 | 61,028,630.174 | 115.897 | 0.517 | 10.000 | 0.000 | 11.918 | 142.048 | 1,692.984 |
| country 80 | C80 | 2,021.000 | Continent E | 0.491 | 150,000.000 | 5,320,231.000 | 8.007 | 0.437 | 0.490 | −0.063 | 262,807.424 | 0.821 | 59,797,131.352 | 111.021 | 0.529 | 10.000 | 0.000 | 11.918 | 142.048 | 1,692.984 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| country 4 | C04 | 2,022.000 | Continent A | 0.099 | 626.936 | 1,170,686.000 | 16.226 | 0.310 | 0.384 | 0.042 | 518,862.110 | 0.200 | 209,371,704.365 | 6.000 | ... | −10.000 | 0.000 | 6.441 | 41.484 | 267.195 |
| country 4 | C04 | 2,021.000 | Continent A | 0.033 | 611.293 | 1,153,163.000 | 13.580 | 0.312 | 0.479 | 0.056 | 518,862.110 | 0.302 | 180,576,084.351 | 6.000 | 0.182 | −9.000 | 0.000 | 6.416 | 41.160 | 264.063 |
| country 4 | C04 | 2,023.000 | Continent A | 0.020 | 580.034 | 1,188,475.000 | 13.138 | 0.273 | 0.472 | 0.047 | 518,862.110 | 0.200 | 204,319,197.473 | 8.851 | 0.145 | ... | 0.000 | 6.363 | 40.489 | 257.634 |
| country 4 | C04 | 2,024.000 | Continent A | 0.020 | 571.021 | 1,206,535.000 | 10.032 | 0.318 | 0.400 | 0.056 | 518,862.110 | 0.271 | 168,084,858.461 | 6.000 | 0.287 | −7.000 | 0.000 | 6.347 | 40.290 | 255.737 |
| country 4 | C04 | 2,025.000 | Continent A | 0.020 | 531.027 | 1,224,868.000 | 10.675 | 0.305 | 0.432 | 0.022 | 518,862.110 | 0.200 | 172,810,451.647 | 9.809 | 0.360 | −3.000 | 0.000 | 6.275 | 39.373 | 247.060 |
Advanced — the ten most extreme observations of a chosen variable, showing only the panel identifiers and that variable:
ex.prepare_ext_obs_table(
df, n=10, cs_id=["country"], ts_id="year", var="gini_regional"
).gt
| country | year | gini_regional |
|---|---|---|
| country 80 | 2,023.000 | 0.568 |
| country 80 | 2,024.000 | 0.539 |
| country 69 | 2,025.000 | 0.532 |
| country 80 | 2,022.000 | 0.528 |
| country 79 | 2,021.000 | 0.524 |
| country 28 | 2,018.000 | 0.516 |
| country 79 | 2,025.000 | 0.513 |
| country 69 | 2,024.000 | 0.511 |
| country 79 | 2,023.000 | 0.510 |
| country 28 | 2,025.000 | 0.509 |
| ... | ... | ... |
| country 1 | 2,015.000 | 0.086 |
| country 43 | 2,021.000 | 0.084 |
| country 1 | 2,016.000 | 0.080 |
| country 4 | 2,017.000 | 0.065 |
| country 16 | 2,025.000 | 0.057 |
| country 4 | 2,020.000 | 0.035 |
| country 4 | 2,021.000 | 0.033 |
| country 4 | 2,024.000 | 0.020 |
| country 4 | 2,025.000 | 0.020 |
| country 4 | 2,023.000 | 0.020 |
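
Conceptually, an extreme-observations table is a descending sort on the chosen variable that keeps the first and last n rows. The sketch below shows that idea in plain Python on a handful of rows copied from the table above; `extreme_observations` is a hypothetical helper, and skipping rows with a missing value is an assumption rather than documented expdpy behavior.

```python
def extreme_observations(rows, var, n):
    """Return the n largest and n smallest rows by `var`, in descending order."""
    valid = [r for r in rows if r[var] is not None]  # assumption: missing values are skipped
    ordered = sorted(valid, key=lambda r: r[var], reverse=True)
    return ordered[:n], ordered[-n:]

# A few rows taken from the gini_regional table above.
rows = [
    {"country": "country 4", "year": 2021, "gini_regional": 0.033},
    {"country": "country 80", "year": 2024, "gini_regional": 0.539},
    {"country": "country 16", "year": 2025, "gini_regional": 0.057},
    {"country": "country 80", "year": 2023, "gini_regional": 0.568},
    {"country": "country 4", "year": 2025, "gini_regional": 0.020},
    {"country": "country 69", "year": 2025, "gini_regional": 0.532},
]

top, bottom = extreme_observations(rows, "gini_regional", n=2)
for row in top + bottom:
    print(row["country"], row["year"], row["gini_regional"])
```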
Graphs
prepare_correlation_graph
Basic — a correlation heatmap:
ex.prepare_correlation_graph(df[["gini_regional", "gdp_pc", "log_gdp_pc"]]).fig
Advanced — the ellipse style (R corrplot look):
ex.prepare_correlation_graph(
df[["gini_regional", "gdp_pc", "log_gdp_pc", "trade_share"]],
style="ellipse",
).fig
prepare_trend_graph
Basic — the mean of one variable over time, with standard-error bars:
ex.prepare_trend_graph(df, ts_id="year", var=["gini_regional"]).fig
Advanced — several variables on one chart:
ex.prepare_trend_graph(df, ts_id="year", var=["gini_regional", "trade_share"]).fig
prepare_quantile_trend_graph
Basic — the default quantiles of a variable over time:
ex.prepare_quantile_trend_graph(df, ts_id="year", var="gini_regional").fig
Advanced — custom quantile levels and no per-observation points:
ex.prepare_quantile_trend_graph(
df, ts_id="year", var="gini_regional", quantiles=(0.1, 0.5, 0.9), points=False
).fig
prepare_by_group_bar_graph
Basic — the group mean of a variable:
ex.prepare_by_group_bar_graph(df, "continent", "gini_regional").fig
Advanced — a different statistic, bars ordered by it, and a custom color:
ex.prepare_by_group_bar_graph(
df, "continent", "gini_regional",
stat_fun=np.nanmedian, order_by_stat=True, color="#4682b4",
).fig
prepare_by_group_trend_graph
Basic — one line per group over time:
ex.prepare_by_group_trend_graph(
df, ts_id="year", group_var="continent", var="gini_regional"
).fig
Advanced — add standard-error bars and drop the markers:
ex.prepare_by_group_trend_graph(
df, ts_id="year", group_var="continent", var="gini_regional",
error_bars=True, points=False,
).fig
prepare_by_group_violin_graph
This function returns a Plotly figure directly (no .fig).
Basic — the distribution of a variable across groups:
ex.prepare_by_group_violin_graph(df, "continent", "gini_regional")
Advanced — order groups by mean and orient the violins vertically:
ex.prepare_by_group_violin_graph(
df, "continent", "gini_regional", order_by_mean=True, group_on_y=False
)
prepare_histogram
Basic — a 30-bin histogram:
ex.prepare_histogram(df, "gini_regional").fig
Advanced — finer bins on another variable:
ex.prepare_histogram(df, "gdp_pc", bins=50).fig
prepare_bar_chart
Basic — category counts:
ex.prepare_bar_chart(df, "continent").fig
Advanced — order bars by descending count and set a custom color:
ex.prepare_bar_chart(df, "continent", order_by_count=True, color="red").fig
prepare_missing_values_graph
This function returns a Plotly figure directly (no .fig).
Basic — the fraction of missing values by variable and year:
ex.prepare_missing_values_graph(df, ts_id="year")
Advanced — restrict to numeric variables and show only whether values are missing:
ex.prepare_missing_values_graph(df, ts_id="year", no_factors=True, binary=True)
prepare_scatter_plot
This function returns a Plotly figure directly (no .fig).
Basic — a plain scatter of two variables:
ex.prepare_scatter_plot(df, x="log_gdp_pc", y="gini_regional")
Advanced — map color and size to other columns and add a size-weighted LOESS smoother (the N-shaped Kuznets curve):
ex.prepare_scatter_plot(
df, x="log_gdp_pc", y="gini_regional",
color="continent", size="population", loess=2, alpha=0.6,
)
Regression
prepare_regression_table
Basic — a pooled OLS regression of the cubic Kuznets curve:
ex.prepare_regression_table(
df,
dvs="gini_regional",
idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
).etable

| | gini_regional |
|---|---|
| | (1) |
| coef | |
| log_gdp_pc | 6.385*** (0.134) |
| log_gdp_pc_sq | -0.711*** (0.015) |
| log_gdp_pc_cu | 0.026*** (0.001) |
| Intercept | -18.490*** (0.402) |
| stats | |
| Observations | 880 |
| R2 | 0.744 |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
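
For intuition, the coefficients of a pooled cubic regression are an ordinary least-squares fit on an intercept and the first three powers of the regressor. The sketch below is an illustration with NumPy on synthetic data, not a reproduction of expdpy's estimator or of the table above. It assumes that log_gdp_pc_sq and log_gdp_pc_cu are the square and cube of log_gdp_pc, which the descriptive statistics suggest, and the coefficient values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
log_gdp_pc = rng.uniform(6.0, 12.0, size=200)  # synthetic regressor, not the kuznets data

# Design matrix: intercept, x, x^2, x^3 (the cubic Kuznets specification).
X = np.column_stack([
    np.ones_like(log_gdp_pc),
    log_gdp_pc,
    log_gdp_pc**2,
    log_gdp_pc**3,
])

true_beta = np.array([-18.0, 6.0, -0.7, 0.025])  # illustrative values only
y = X @ true_beta                                 # noiseless outcome for clarity

# Least-squares solution of X @ beta = y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
print(beta_hat)
```

A positive linear, negative quadratic and positive cubic coefficient is the sign pattern that produces the N-shaped curve.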
Advanced — absorb two-way (country + year) fixed effects with standard errors clustered by country:
ex.prepare_regression_table(
df,
dvs="gini_regional",
idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
feffects=["country", "year"],
clusters=["country"],
).etable

| | gini_regional |
|---|---|
| | (1) |
| coef | |
| log_gdp_pc | 6.411*** (0.210) |
| log_gdp_pc_sq | -0.715*** (0.023) |
| log_gdp_pc_cu | 0.026*** (0.001) |
| fe | |
| year | x |
| country | x |
| stats | |
| Observations | 880 |
| R2 | 0.874 |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
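
"Absorbing" fixed effects means the slope is estimated without reporting a coefficient for every country and year. One way to see what that does, under the assumption of a balanced panel, is the two-way within transformation: subtract unit means and period means and add back the grand mean. The sketch below uses synthetic data and is not expdpy's implementation. It checks that the within estimate equals the slope from a regression with explicit dummies, and it does not cover clustered standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 20, 8
unit = np.repeat(np.arange(n_units), n_periods)    # unit index, rows grouped by unit
period = np.tile(np.arange(n_periods), n_units)    # period index within each unit

alpha = rng.normal(size=n_units)                   # unit fixed effects
gamma = rng.normal(size=n_periods)                 # period fixed effects
x = rng.normal(size=unit.size) + alpha[unit]       # regressor correlated with the unit effect
y = 0.5 * x + alpha[unit] + gamma[period] + rng.normal(scale=0.1, size=unit.size)

def demean_two_way(v):
    """Two-way within transformation; exact only for a balanced panel."""
    grid = v.reshape(n_units, n_periods)
    return (
        grid
        - grid.mean(axis=1, keepdims=True)   # remove unit means
        - grid.mean(axis=0, keepdims=True)   # remove period means
        + grid.mean()                        # add back the grand mean
    ).ravel()

x_w, y_w = demean_two_way(x), demean_two_way(y)
beta_within = float(x_w @ y_w / (x_w @ x_w))

# Same slope from explicit dummies (first period dropped to avoid collinearity).
unit_d = (unit[:, None] == np.arange(n_units)).astype(float)
period_d = (period[:, None] == np.arange(1, n_periods)).astype(float)
X = np.column_stack([x, unit_d, period_d])
beta_dummy = float(np.linalg.lstsq(X, y, rcond=None)[0][0])

print(beta_within, beta_dummy)
```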
prepare_fwl_plot
Basic — the partial relationship of the outcome and a single regressor:
ex.prepare_fwl_plot(df, dv="gini_regional", var="log_gdp_pc").fig
Advanced — residualize on the other cubic terms and two-way fixed effects, with the reported standard error clustered by country:
ex.prepare_fwl_plot(
df,
dv="gini_regional",
var="log_gdp_pc",
controls=["log_gdp_pc_sq", "log_gdp_pc_cu"],
feffects=["country", "year"],
clusters=["country"],
).fig
Data preparation
treat_outliers
Basic — winsorize a single variable at the 1st/99th percentile:
ex.treat_outliers(df["gdp_pc"], percentile=0.01).describe()
count 880.000000
mean 25524.001800
std 34212.744869
min 666.720522
25% 2512.157493
50% 11199.795097
75% 33444.168592
max 150000.000000
Name: gdp_pc, dtype: float64
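
Winsorizing replaces values beyond the chosen quantiles with the quantile values themselves, so the count stays the same and only the tails move, as in the output above. A minimal NumPy sketch of the idea follows; `winsorize` is a hypothetical helper, and the quantile interpolation used by expdpy may differ.

```python
import numpy as np

def winsorize(values, percentile=0.01):
    """Clip values to the [percentile, 1 - percentile] quantile range; NaN stays NaN."""
    x = np.asarray(values, dtype=float)
    lower, upper = np.nanquantile(x, [percentile, 1.0 - percentile])
    return np.clip(x, lower, upper)

x = np.arange(1.0, 101.0)           # 1, 2, ..., 100
w = winsorize(x, percentile=0.05)   # cut-offs are the 5th and 95th percentiles
print(w.size, w.min(), w.max())
```

Values between the two cut-offs are left untouched, which is why the quartiles in the output above match the unwinsorized descriptive table.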
Advanced — winsorize several columns at the 5th/95th percentile, with cut-offs computed within each continent:
ex.treat_outliers(
df[["gini_regional", "gdp_pc"]], percentile=0.05, by=df["continent"]
).describe()

| | gini_regional | gdp_pc |
|---|---|---|
| count | 880.000000 | 880.000000 |
| mean | 0.273319 | 24703.010142 |
| std | 0.086432 | 32756.473008 |
| min | 0.080619 | 646.309044 |
| 25% | 0.206811 | 2512.157493 |
| 50% | 0.270045 | 11199.795097 |
| 75% | 0.334933 | 32639.241341 |
| max | 0.503312 | 150000.000000 |
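
Computing the cut-offs within groups means each group is clipped to its own quantiles rather than to pooled ones, so a group with systematically higher values is not clipped simply for being different. The sketch below shows one way this could work with NumPy; `winsorize_by_group` is a hypothetical helper and not how `by=` is necessarily implemented in expdpy.

```python
import numpy as np

def winsorize_by_group(values, groups, percentile=0.05):
    """Clip each group's values to that group's own quantile range."""
    x = np.asarray(values, dtype=float)
    g = np.asarray(groups)
    out = x.copy()
    for label in np.unique(g):
        mask = g == label
        lower, upper = np.nanquantile(x[mask], [percentile, 1.0 - percentile])
        out[mask] = np.clip(x[mask], lower, upper)
    return out

# Two groups on very different scales.
values = np.concatenate([np.arange(1.0, 101.0), np.arange(1001.0, 1101.0)])
groups = np.array(["A"] * 100 + ["B"] * 100)

out = winsorize_by_group(values, groups, percentile=0.05)
print(out[:100].min(), out[:100].max(), out[100:].min(), out[100:].max())
```

With pooled cut-offs, only the bottom of group A and the top of group B would be clipped; with group-wise cut-offs both tails of both groups are.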