import expdpy as ex
from expdpy.data import load_kuznets, load_kuznets_data_def
df = ex.set_labels(load_kuznets(), load_kuznets_data_def(), set_panel=True)
Learn panel data
The Learn module is the teaching layer behind the rest of the library. The Explore and Analyze case studies leaned on a handful of ideas: within vs between variation, fixed effects, clustered standard errors, convergence, the Kuznets wave. This page explains why those moves work, in two complementary ways:
- A plain-language layer on every result. Each explore_*/analyze_* result can explain itself with .interpret() (an associational reading) and .explain() (a concept explainer).
- Simulated sandboxes. Each learn_* function generates its own data from known parameters, so you can see an estimator hit, or miss, a truth you control, and turn the knobs.
Read it top to bottom: we start from the real Kuznets model you fit in Analyze, browse the concept index, then isolate each idea in a sandbox — the mechanics of fixed effects, two inference classics, convergence, and the Kuznets wave itself.
The sandboxes simulate data to make a teaching point; the .interpret() text is always associational (never a causal claim). The correlation_vs_causation explainer spells out what a causal reading would additionally require.
Stage 1 — Read a real result in plain language
Fit a model anywhere in Analyze, then ask it to explain itself. Here is the two-way fixed-effects cubic Kuznets curve from the Analyze case study. .interpret() gives an associational reading of what was estimated:
res = ex.analyze_regression_table(
df,
dvs="gini_regional",
idvs=["log_gdp_pc", "log_gdp_pc_sq", "log_gdp_pc_cu"],
feffects=["country", "year"],
clusters=["country"],
)
print(res.interpret())
This OLS regression relates **gini_regional** to its regressors. Fixed effects for *country + year* absorb time-invariant differences, so coefficients reflect variation **within** each group. Standard errors are clustered by *country*.
- **log_gdp_pc**: each one-unit increase is associated with gini_regional that is 6.41 higher (statistically significant at the 1% level).
- **log_gdp_pc_sq**: each one-unit increase is associated with gini_regional that is 0.715 lower (statistically significant at the 1% level).
- **log_gdp_pc_cu**: each one-unit increase is associated with gini_regional that is 0.0261 higher (statistically significant at the 1% level).
Model fit: N = 880, R² = 0.874, within-R² = 0.521.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
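Because the three regressors are powers of the same variable, the per-term readings above are best combined: the slope of the fitted curve is the derivative of the cubic, and its turning points sit where that derivative crosses zero. A minimal sketch of that arithmetic, assuming log_gdp_pc_sq and log_gdp_pc_cu are the square and cube of log_gdp_pc, and using the rounded coefficients printed above (so the results are approximate; the variable names are illustrative, not part of the library):

```python
import math

# Rounded coefficients copied from the printed table (approximate).
b1, b2, b3 = 6.41, -0.715, 0.0261

# The slope of b1*x + b2*x**2 + b3*x**3 is b1 + 2*b2*x + 3*b3*x**2.
a, b, c = 3 * b3, 2 * b2, b1
disc = b**2 - 4 * a * c  # positive means two turning points

peak = (-b - math.sqrt(disc)) / (2 * a)    # local maximum, because a > 0
trough = (-b + math.sqrt(disc)) / (2 * a)  # local minimum

print(f"turning points near log_gdp_pc = {peak:.1f} and {trough:.1f}")
```

With a positive cubic coefficient the fitted curve rises, falls between the two points, then rises again, which is the N-shape. The discriminant is small relative to the rounding in the coefficients, so the exact locations should be read as indicative only.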
.explain() returns the concept explainer for the method — what it is, when to use it, and its caveats — the same content as ex.explain("fixed_effects"):
print(res.explain().to_markdown())
### Fixed effects
**What it is.** Fixed effects add a separate intercept for every level of a grouping variable (e.g. each country, each year). This absorbs all *time-invariant* differences across groups, so the remaining coefficients are identified from variation *within* a group over time rather than from differences *between* groups.
**When to use it.** In panel data, to control for stable, unobserved characteristics of units (country institutions, firm culture) and for common shocks in a period (year effects). Two-way (unit + time) fixed effects are the standard panel specification.
**Watch out for.**
- Fixed effects cannot estimate the effect of anything constant within the group (a country's region, a person's sex) — that variation is absorbed.
- They control only for *unobserved* confounders that are constant within the group; time-varying confounders remain a threat.
- Many groups with few observations each can leave little within-variation, inflating standard errors.
*See also:* ols, clustered_se, fwl
*References:* Wooldridge, Introductory Econometrics, ch. 13-14
Every explore_* and analyze_* result carries these two methods. The rest of this page is the ideas they describe, isolated one at a time.
Stage 2 — The browsable concept index
ex.list_topics() returns every registered concept explainer (currently 27); pass any of them (or an alias) to ex.explain(...):
ex.list_topics()
['beta_convergence',
'clustered_se',
'convergence_clubs',
'correlated_random_effects',
'correlation_vs_causation',
'descriptive_stats',
'dummy_variables',
'event_study',
'first_differences',
'fixed_effects',
'fwl',
'hausman',
'kuznets_waves',
'ols',
'omitted_variable_bias',
'panel_structure',
'parallel_trends',
'pearson',
'random_effects',
'sigma_convergence',
'spearman',
'time_trends',
'transition_matrix',
'truncate',
'winsorize',
'within_between_variation',
'within_transformation']
ex.explain("fixed_effects")
Fixed effects
What it is. Fixed effects add a separate intercept for every level of a grouping variable (e.g. each country, each year). This absorbs all time-invariant differences across groups, so the remaining coefficients are identified from variation within a group over time rather than from differences between groups.
When to use it. In panel data, to control for stable, unobserved characteristics of units (country institutions, firm culture) and for common shocks in a period (year effects). Two-way (unit + time) fixed effects are the standard panel specification.
Watch out for.
- Fixed effects cannot estimate the effect of anything constant within the group (a country’s region, a person’s sex) — that variation is absorbed.
- They control only for unobserved confounders that are constant within the group; time-varying confounders remain a threat.
- Many groups with few observations each can leave little within-variation, inflating standard errors.
See also: ols, clustered_se, fwl
References: Wooldridge, Introductory Econometrics, ch. 13-14
The full catalog, grouped by theme — every entry is a key you can pass to ex.explain(...):
| Theme | Explainer topics |
|---|---|
| OLS & regression | ols, fwl, omitted_variable_bias, descriptive_stats |
| Panel structure & variation | panel_structure, within_between_variation, time_trends, transition_matrix |
| Fixed effects & the within transform | fixed_effects, within_transformation, dummy_variables, first_differences |
| Random effects, CRE & Hausman | random_effects, correlated_random_effects, hausman |
| Standard errors & inference | clustered_se |
| Convergence | beta_convergence, sigma_convergence, convergence_clubs, kuznets_waves |
| Causal designs / DiD | event_study, parallel_trends, correlation_vs_causation |
| Correlation | pearson, spearman |
| Outlier treatment | winsorize, truncate |
Stage 3 — The core identity: first differences ≈ demeaning ≈ dummy variables
A unit fixed effect is anything constant within a unit over time. Three transformations all remove it and recover the same within-unit slope:
- First differences subtract each unit’s previous-period value (Δy on Δx).
- The within transformation (demeaning) subtracts each unit’s time-average.
- Least-squares dummy variables (LSDV) add one dummy per unit to an OLS regression.
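The three transformations listed above can be checked by hand on a tiny panel. The sketch below is illustrative only: the data are made up, the first-difference regression is run through the origin (which matches a model with unit effects but no period effects), and `solve` is a small hypothetical helper so the example needs nothing beyond the standard library. In practice a linear-algebra or regression package would do this step.

```python
from statistics import fmean

# Hand-made two-period panel: unit -> (x values, y values). Purely illustrative.
panel = {
    "A": ([1.0, 2.0], [3.1, 4.9]),
    "B": ([2.0, 4.0], [7.0, 11.2]),
    "C": ([3.0, 3.5], [10.0, 10.8]),
}

def slope_through_origin(x, y):
    """OLS slope of y on x with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# 1. First differences: change in y on change in x.
dx = [x[1] - x[0] for x, _ in panel.values()]
dy = [y[1] - y[0] for _, y in panel.values()]
fd_slope = slope_through_origin(dx, dy)

# 2. Within transformation: subtract each unit's own time-average.
x_dm = [v - fmean(x) for x, _ in panel.values() for v in x]
y_dm = [v - fmean(y) for _, y in panel.values() for v in y]
within_slope = slope_through_origin(x_dm, y_dm)

# 3. LSDV: regress y on x plus one dummy per unit (no common intercept).
units = list(panel)
rows = [[v] + [1.0 if u == w else 0.0 for w in units]
        for u in units for v in panel[u][0]]
ys = [v for u in units for v in panel[u][1]]
k = len(rows[0])
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
lsdv_slope = solve(xtx, xty)[0]  # first coefficient is the slope on x

print(fd_slope, within_slope, lsdv_slope)
```

All three numbers agree up to floating-point error. The agreement between the within and LSDV slopes holds for any number of periods; the agreement with first differences is specific to two periods.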
learn_first_differences simulates a panel where the regressor is correlated with the unit effect (so pooled OLS is biased) and recovers the slope by first differences and by demeaning. On a two-period panel the two coincide exactly, and both recover the truth:
fd = ex.learn_first_differences(n_periods=2)
print(fd.interpret())
fd.fig
Pooled OLS estimates 2.72, biased by the unit effects. First differencing gives 1.93 and the within (demeaning) estimator 1.93 — both recover the true 2, because differencing and demeaning each cancel the unit effect. On this two-period panel they coincide (gap 0).
learn_within_vs_lsdv shows that demeaning and unit dummies give the identical slope — the Frisch–Waugh–Lovell theorem at work — for any number of periods:
ex.learn_within_vs_lsdv(n_periods=6).fig
Stage 4 — Why fixed effects matter
learn_pooled_vs_fixed_effects makes the bias concrete: when the unit effect is correlated with the regressor, pooled OLS is biased, and using only within-unit variation (fixed effects) recovers the true slope. This is exactly the move the Analyze case study made on the Kuznets curve.
pf = ex.learn_pooled_vs_fixed_effects(unit_effect_corr=0.8)
print(pf.interpret())
pf.fig
Pooled OLS estimates 1.78 for the slope, biased by the correlation between the regressor and the unit effects. Adding unit fixed effects recovers 1.04, close to the true 1 — the within estimator removes the bias.
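The same mechanism can be reproduced without any randomness. The sketch below is a hand-built panel, not the library's data-generating process: the regressor in each unit is centred on that unit's effect, the outcome has no noise, and every name is illustrative.

```python
from statistics import fmean

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

true_slope = 1.0
unit_effects = [0.0, 1.0, 2.0, 3.0]  # one per unit
within_moves = [-1.0, 0.0, 1.0]      # same over-time movement in every unit

x_all, y_all, x_dm, y_dm = [], [], [], []
for alpha in unit_effects:
    x_i = [alpha + d for d in within_moves]      # regressor tracks the unit effect
    y_i = [true_slope * x + alpha for x in x_i]  # no noise, so the arithmetic is exact
    x_all += x_i
    y_all += y_i
    x_dm += [x - fmean(x_i) for x in x_i]        # demean within the unit
    y_dm += [y - fmean(y_i) for y in y_i]

pooled = ols_slope(x_all, y_all)
within = ols_slope(x_dm, y_dm)
print(f"pooled: {pooled:.3f}, within: {within:.3f}, truth: {true_slope}")
```

Pooled OLS returns 38/23, about 1.65, because it credits the regressor with the unit effect it travels with; the within slope is 1, since demeaning removes that effect before the regression is run.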
The correlated random effects (Mundlak) estimator bridges fixed and random effects — see its explainer, and analyze_cre_table in Analyze:
ex.explain("correlated_random_effects")
Stage 5 — Two inference classics
Omitted-variable bias — what happens to a coefficient when a correlated confounder is left out. The short regression is biased; controlling for the confounder recovers the truth:
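The size of the bias follows an exact in-sample identity: short slope = long slope + (coefficient on the confounder) × (slope of the confounder on the regressor). A minimal sketch under stated assumptions: the data-generating process below is made up for illustration (true slope 1, confounder coefficient 2, correlation of about 0.6), and the long regression is computed by partialling out, which is the Frisch–Waugh–Lovell route, so that only simple regressions are needed.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def residualize(target, control):
    """Residuals from regressing target on control (with intercept)."""
    b = ols_slope(control, target)
    a = fmean(target) - b * fmean(control)
    return [t - a - b * c for t, c in zip(target, control)]

random.seed(0)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.6 * xi + 0.8 * random.gauss(0, 1) for xi in x]  # corr(x, z) is about 0.6
y = [1.0 * xi + 2.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

short = ols_slope(x, y)                        # confounder z left out
beta_long = ols_slope(residualize(x, z), y)    # slope on x, controlling for z
gamma_long = ols_slope(residualize(z, x), y)   # slope on z, controlling for x
delta = ols_slope(x, z)                        # how z moves with x

print(f"short {short:.3f} = long {beta_long:.3f} + {gamma_long:.3f} * {delta:.3f}")
```

The short slope lands well above the true 1, the long slope lands near it, and the identity holds to floating-point precision. The sandbox call that follows shows the same comparison as a figure.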
ex.learn_omitted_variable_bias(corr_xz=0.6).fig
Clustered standard errors: clustering changes the standard error, not the point estimate. Ignoring within-cluster correlation understates uncertainty (the bars shrink too far):
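The simplest case is a regression on a constant, where the estimate is just the sample mean. The sketch below uses hand-made data in which observations resemble each other within a cluster, and compares the conventional standard error with a cluster-robust one. It assumes the plain sandwich form with no small-sample correction, which may differ from what any particular package reports.

```python
from math import sqrt

# Hand-made data: similar within a cluster, different across clusters.
clusters = {
    "a": [1.0, 1.2, 0.8, 1.1],
    "b": [3.0, 3.1, 2.9, 3.2],
    "c": [2.0, 1.9, 2.2, 2.1],
    "d": [4.0, 4.1, 3.8, 4.2],
}

values = [v for group in clusters.values() for v in group]
n = len(values)
estimate = sum(values) / n  # OLS on a constant is the sample mean

resid = {g: [v - estimate for v in group] for g, group in clusters.items()}

# Conventional SE: treats all n observations as independent.
s2 = sum(e ** 2 for group in resid.values() for e in group) / (n - 1)
naive_se = sqrt(s2 / n)

# Cluster-robust SE: residuals are summed within each cluster before squaring,
# so correlated errors inside a cluster no longer cancel out.
cluster_se = sqrt(sum(sum(group) ** 2 for group in resid.values())) / n

print(f"estimate {estimate:.3f}, naive SE {naive_se:.3f}, clustered SE {cluster_se:.3f}")
```

The estimate is identical under both approaches; only the standard error changes, and here the clustered one is roughly twice the naive one because the sixteen observations carry closer to four independent pieces of information. The sandbox call that follows plots the same contrast.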
ex.learn_clustering_se(icc=0.3).fig
Stage 6 — Convergence, simulated
The Analyze case study asked whether incomes converge. These sandboxes plant a known answer so you can watch each tool recover it.
learn_beta_convergence — absolute vs conditional convergence on a known-parameter AR(1) panel: omitting a steady-state determinant biases the unconditional slope; conditioning on it recovers the true convergence speed. (See analyze_beta_convergence on real data.)
bc = ex.learn_beta_convergence(rho=0.9, gamma=0.6, corr=0.7)
print(bc.interpret())
bc.fig
Omitting the steady-state determinant, the unconditional slope is 0.0359 — biased away from the true convergence slope -0.0439, so the units look like they barely converge. Conditioning on the determinant recovers -0.044, matching the truth — that is conditional β-convergence. The recovered speed is 0.106 per period (true 0.105), a half-life of 6.51 periods.
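The link between the AR(1) persistence, the convergence slope, the speed and the half-life is plain arithmetic. The sketch below uses the textbook growth-regression relation; the horizon of 20 periods is an assumption, chosen because it reproduces the true slope of -0.0439 printed above, and is not taken from the library's documentation.

```python
import math

rho = 0.9     # AR(1) persistence passed to the sandbox
horizon = 20  # ASSUMPTION: periods over which growth is averaged

# Slope of average growth on the initial level when gaps shrink by rho each period.
slope = (rho ** horizon - 1) / horizon

# Invert the slope to get the per-period speed; this equals -ln(rho).
speed = -math.log(1 + horizon * slope) / horizon

# Periods needed to close half of the gap to the steady state.
half_life = math.log(2) / speed

print(f"slope {slope:.4f}, speed {speed:.3f}, half-life {half_life:.2f}")
```

With the true speed the half-life comes out slightly longer than the 6.51 in the output, which is presumably computed from the estimated speed rather than the true one.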
learn_sigma_convergence — a panel whose cross-sectional dispersion narrows at a known rate: the standard deviation, the Gini index and the coefficient of variation all shrink at the log-rate ln(rho), and the function recovers it. (See analyze_sigma_convergence on real data.)
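Why the log-rate is ln(rho) can be seen on a noise-free panel. In the sketch below, which is a made-up construction rather than the sandbox's own simulation, each unit's gap to a common level shrinks by the factor rho every period, so the cross-sectional standard deviation shrinks by the same factor.

```python
import math
from statistics import fmean, pstdev

rho = 0.93
start = [10.0, 14.0, 18.0, 22.0, 26.0]  # illustrative initial levels
target = fmean(start)                    # common level all units approach

periods = list(range(11))
log_sd = []
for t in periods:
    # Every unit's gap to the target is scaled by rho**t.
    cross_section = [target + rho ** t * (y0 - target) for y0 in start]
    log_sd.append(math.log(pstdev(cross_section)))

# Slope of log dispersion on time.
t_bar, s_bar = fmean(periods), fmean(log_sd)
rate = sum((t - t_bar) * (s - s_bar) for t, s in zip(periods, log_sd)) / sum(
    (t - t_bar) ** 2 for t in periods
)

print(f"estimated log-rate {rate:.4f}, ln(rho) {math.log(rho):.4f}")
```

Because the cross-sectional mean stays fixed in this construction, the coefficient of variation shrinks at the same rate as the standard deviation. The sandbox call that follows adds noise and plots the result.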
ex.learn_sigma_convergence(rho=0.93).fig
learn_convergence_clubs simulates a panel with a planted club structure: each unit belongs to one of several clubs converging to distinct levels, so the panel does not converge globally, yet the Phillips–Sul clustering algorithm recovers the clubs without being told they exist. (See analyze_convergence_clubs on real data.)
ex.learn_convergence_clubs().fig
Stage 7 — The Kuznets wave, simulated
learn_kuznets_waves is the teaching counterpart to the flagship analyze_kuznets_waves. It plants a known polynomial wave into a panel and fits it under three estimators. The within and pooled estimators recover the true top-order coefficient; the between estimator differs — because the average of a nonlinear function is not the function of the average. That gap is exactly why the Analyze case study asked whether the N-shape was a within- or a between-country pattern.
kw = ex.learn_kuznets_waves()
print(kw.interpret())
kw.fig
The panel was built with a known degree-4 Kuznets wave whose top-order coefficient is 0.04. The within (two-way fixed-effects) estimator recovers 0.0399 and pooled OLS 0.0388 — both close to the truth, because the wave is a within-unit relationship. The between estimator gives 0.0296, which differs: it compares unit averages, and the average of a nonlinear curve is not the curve of the average.
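The last sentence of that output is easy to verify for a single unit. The sketch below keeps only a top-order term with coefficient 0.04; the sandbox's full polynomial is not shown on this page, so the curve and the x values here are illustrative assumptions.

```python
from statistics import fmean

def wave(x):
    """Hypothetical single-term wave: top-order coefficient 0.04, degree 4."""
    return 0.04 * x ** 4

x_unit = [-2.0, 0.0, 2.0, 4.0]  # one unit's regressor over four periods

# What the between estimator sees: the unit's average outcome...
mean_of_curve = fmean([wave(x) for x in x_unit])
# ...set against the unit's average regressor, where the curve itself predicts:
curve_of_mean = wave(fmean(x_unit))

print(f"average of the curve: {mean_of_curve:.2f}")
print(f"curve at the average: {curve_of_mean:.2f}")
```

The unit's average outcome is far above what the curve predicts at its average regressor value, so a regression on unit averages is fitting a different relationship from the one that holds within the unit over time.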
Where to go next
- Explore panel data — describe your data before you model it.
- Analyze panel data — the estimators these ideas underpin, on the real Kuznets panel.
- Browse every concept explainer in the API reference, or pass any list_topics() key to ex.explain(...).