import numpy as np
import pandas as pd
import expdpy as ex
analyze_sigma_convergence
This page is two things at once: an extended user guide for analyze_sigma_convergence — what it does, every argument, and everything it returns — and a testing environment that generates synthetic data with known parameters and checks that the function recovers them. If a cell’s assert ever fails, the function is broken.
What is σ-convergence?
σ-convergence asks whether the cross-sectional dispersion of a variable shrinks over time — whether units become more alike. At each period \(t\) we measure how spread out the variable is across units (the standard deviation \(\sigma_t\), the Gini index, the coefficient of variation) and then regress the log dispersion on time:
\[ \ln D_t \;=\; a + b\, t + \varepsilon_t . \]
The slope \(b\) is the average proportional change in dispersion per period, so a negative \(b\) is σ-convergence (the distribution is narrowing) and a positive \(b\) is σ-divergence. The variable is used as you supply it — the function never transforms it.
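The regression above can be sketched by hand with NumPy. This is an illustration under stated assumptions, not the library's implementation: the toy panel, the helper names `gini` and `log_trend`, the sorted-values Gini formula, and the sample standard deviation (`ddof=1`) are all choices made here for the example.

```python
import numpy as np

def gini(values):
    """Gini index via the sorted-values formula (assumes non-negative values)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)

def log_trend(dispersion):
    """OLS slope b of ln(D_t) on t = 0, 1, 2, ..."""
    t = np.arange(len(dispersion))
    slope, _intercept = np.polyfit(t, np.log(dispersion), 1)
    return float(slope)

# Hypothetical toy panel: rows are periods, columns are units.
# Deviations from the common mean of 10 halve each period.
base = np.array([4.0, 8.0, 12.0, 16.0])
panel = np.array([10.0 + (base - 10.0) * 0.5**t for t in range(5)])

std_t = panel.std(axis=1, ddof=1)            # cross-sectional std per period
gini_t = np.array([gini(row) for row in panel])
cv_t = std_t / panel.mean(axis=1)            # coefficient of variation

b_std, b_gini, b_cv = log_trend(std_t), log_trend(gini_t), log_trend(cv_t)
print(b_std, b_gini, b_cv)  # each is close to ln(0.5)
```

Because the number of units is the same in every period, the choice between `ddof=0` and `ddof=1` only rescales the standard deviation by a constant, which shifts the intercept \(a\) and leaves the slope \(b\) unchanged.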
σ-convergence is the distributional complement to β-convergence. β-convergence (laggards growing faster) is necessary but not sufficient for σ-convergence: fresh shocks can re-widen the distribution even while poorer units catch up (Quah’s critique). See analyze_beta_convergence for the growth-vs-initial-level view.
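A small simulation can make the "necessary but not sufficient" point concrete. The sketch below assumes a hypothetical mean-reverting process in levels (not anything taken from the library): each unit is pulled toward zero by a factor `rho`, and a fresh shock is sized so that the cross-sectional variance stays constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, rho = 5000, 0.8

# Hypothetical AR(1) step: units revert toward 0 but receive a fresh shock.
x0 = rng.normal(0.0, 1.0, size=n_units)
# Shock variance chosen so the cross-sectional variance stays at about 1.
shock_sd = np.sqrt(1.0 - rho**2)
x1 = rho * x0 + rng.normal(0.0, shock_sd, size=n_units)

# beta-convergence: the change regressed on the initial level
# has a negative slope (about rho - 1).
beta, _intercept = np.polyfit(x0, x1 - x0, 1)

# sigma-convergence would need dispersion to fall; here it stays roughly flat.
std_ratio = x1.std(ddof=1) / x0.std(ddof=1)
print(f"beta = {beta:.3f}, std ratio = {std_ratio:.3f}")
```

Units that start high tend to fall and units that start low tend to rise, yet the spread of the distribution is essentially unchanged, because the new shocks replace the dispersion that mean reversion removes.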
1. The method in one cell
analyze_sigma_convergence(df, var, ...) only needs the variable and the panel ids. Here is the cross-country dispersion of life expectancy in the bundled gapminder panel — a bounded, non-negative level, so the standard deviation and the Gini index are both meaningful:
from expdpy.data import load_gapminder
gap = load_gapminder()
res = ex.analyze_sigma_convergence(gap, "lifeExp", entity="country", time="year")
res.fig
The standard deviation is on the left axis and the Gini index on the right; the dashed lines are the fitted log-trends. The annotation reports each measure’s per-period trend and whether it is converging.
2. How the function works
Arguments
| argument | what it does | when to change it |
|---|---|---|
| var | the panel variable whose cross-sectional dispersion is tracked (used as-is) | pass it on whatever scale you want measured; the Gini also needs non-negative values |
| entity, time | the panel ids | omit if declared once via set_panel |
| start, end | restrict to a period window (which must still be balanced) | to focus on a sub-period |
| min_periods | minimum periods required to fit a trend (≥ 3) | rarely; the default is fine |
| vcov | "hetero" (HC1, default) or "iid" standard errors | "iid" for classical SEs; never changes the slope |
The panel must be balanced — every unit present in every period — so the dispersion is comparable across periods (more on that in §4).
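The note that vcov "never changes the slope" can be illustrated with a hand-rolled OLS. This is a sketch using the textbook classical and HC1 formulas; the function name `trend_with_se` and the input series are hypothetical, and the library's own code path may differ in detail.

```python
import numpy as np

def trend_with_se(log_disp, vcov="hetero"):
    """OLS of log dispersion on time; returns (slope, standard error of the slope)."""
    y = np.asarray(log_disp, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    k = X.shape[1]
    if vcov == "iid":
        # Classical covariance: one common residual variance.
        cov = xtx_inv * (resid @ resid) / (n - k)
    else:
        # HC1: White sandwich estimator with the n / (n - k) small-sample factor.
        meat = X.T @ (X * resid[:, None] ** 2)
        cov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    return float(coef[1]), float(np.sqrt(cov[1, 1]))

# Hypothetical log-dispersion series with a downward trend.
log_disp = np.array([0.00, -0.12, -0.18, -0.35, -0.37, -0.55, -0.58, -0.80])

slope_hc1, se_hc1 = trend_with_se(log_disp, vcov="hetero")
slope_iid, se_iid = trend_with_se(log_disp, vcov="iid")
print(slope_hc1 == slope_iid)  # True: only the standard error depends on vcov
```

The coefficient vector is computed before the covariance branch, so the two options can only disagree on the standard error and on the p-value derived from it.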
Everything it returns
print("scalars :", {k: round(getattr(res, k), 5) for k in
                    ["std_slope", "gini_slope", "cv_slope", "std_pvalue"]})
print("panel :", f"{res.n_units} units x {res.n_periods} periods")
print("per-period frame columns:", list(res.df.columns))
res.df.head()
scalars : {'std_slope': -0.00063, 'gini_slope': -0.00681, 'cv_slope': -0.00622, 'std_pvalue': 0.47518}
panel : 142 units x 12 periods
per-period frame columns: ['year', 'n_units', 'mean', 'std', 'gini', 'cv']
| | year | n_units | mean | std | gini | cv |
|---|---|---|---|---|---|---|
| 0 | 1952.0 | 142 | 49.057620 | 12.225956 | 0.141106 | 0.249216 |
| 1 | 1957.0 | 142 | 51.507401 | 12.231286 | 0.135209 | 0.237467 |
| 2 | 1962.0 | 142 | 53.609249 | 12.097245 | 0.128702 | 0.225656 |
| 3 | 1967.0 | 142 | 55.678290 | 11.718858 | 0.120378 | 0.210474 |
| 4 | 1972.0 | 142 | 57.647386 | 11.381953 | 0.112960 | 0.197441 |
summary is the per-measure trend table (slope, SE, p-value, R²) and gt renders it; glance() is the one-row headline:
res.summary.round(5)
| | measure | slope | se | pvalue | r2 | n_periods_used | converging |
|---|---|---|---|---|---|---|---|
| 0 | std | -0.00063 | 0.00085 | 0.47518 | 0.04884 | 12 | True |
| 1 | gini | -0.00681 | 0.00116 | 0.00016 | 0.78367 | 12 | True |
| 2 | cv | -0.00622 | 0.00125 | 0.00055 | 0.72279 | 12 | True |
res.gt
σ-convergence: lifeExp
trend of log dispersion over 12 periods, 142 units
| | Trend (per period) | Std. error | p-value | σ-convergence |
|---|---|---|---|---|
| Standard deviation | -0.0006313 | 0.0008509 | 0.475 | yes |
| Gini index | -0.006807 | 0.001159 | 0.000 | yes |
| Coefficient of variation | -0.006217 | 0.001247 | 0.001 | yes |
A negative trend in log dispersion is σ-convergence (the cross-sectional distribution is narrowing). Trend = OLS slope of ln(dispersion) on time.
.interpret() reads the result in plain language, and .explain() returns the concept explainer:
print(res.interpret())
Across 142 units over 12 periods, the cross-sectional standard deviation of **lifeExp** narrowed (from 12.2 to 12.1). The log-dispersion trend is -0.000631 per period — about 0.0631% less dispersion each period (not statistically significant at conventional levels) — the pattern of **σ-convergence**.
The Gini index also narrowed over the same span (trend -0.00681 per period).
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
3. Does it recover the truth?
The cleanest test uses a geometric-narrowing panel: every unit’s value contracts toward a common mean \(\mu\) at rate \(\rho\) per period, \(x_{i,t} = \mu + (x_{i,0}-\mu)\,\rho^{t}\). Because the deviations from \(\mu\) all shrink by the factor \(\rho^{t}\) while the mean stays fixed, every dispersion measure scales as \(\rho^{t}\) — so the log-dispersion trend equals \(\ln\rho\) exactly for the standard deviation, the Gini index and the coefficient of variation alike.
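Spelled out for the standard deviation, and assuming \(0 < \rho < 1\) with \(\mu\) equal to the cross-sectional mean of \(x_{i,0}\) (so the mean is \(\mu\) in every period), the argument is:

\[ x_{i,t} - \mu = \rho^{t}\,(x_{i,0} - \mu) \;\Longrightarrow\; \sigma_t = \rho^{t}\,\sigma_0 \;\Longrightarrow\; \ln \sigma_t = \ln \sigma_0 + t\,\ln\rho . \]

Matching this against \(\ln D_t = a + b\,t\) gives \(a = \ln\sigma_0\) and \(b = \ln\rho\) with no error term. The Gini index and the coefficient of variation are each a spread measure divided by the mean; the spread scales by \(\rho^{t}\) and the mean does not move, so the same slope follows for them.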
def sigma_panel(*, n_units=60, n_years=15, rho=0.9, noise=0.0, seed=0):
    """Geometric-narrowing panel x_{i,t} = mu + (x_{i,0}-mu)*rho**t (mu = mean x_0)."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(1.0, 20.0, size=n_units)
    mu = float(np.mean(x0))
    rows = []
    for i in range(n_units):
        for t in range(n_years):
            val = mu + (float(x0[i]) - mu) * rho**t
            if noise:
                val += float(rng.normal(0.0, noise))
            rows.append((f"C{i:03d}", t, val))
    return pd.DataFrame(rows, columns=["country", "year", "x"])
RHO = 0.9
panel = sigma_panel(rho=RHO, seed=1)
fit = ex.analyze_sigma_convergence(panel, "x", entity="country", time="year")
target = np.log(RHO)
check = pd.DataFrame(
    {
        "measure": ["std", "gini", "cv"],
        "true (ln rho)": [target, target, target],
        "recovered": [fit.std_slope, fit.gini_slope, fit.cv_slope],
    }
)
check["abs_error"] = (check["recovered"] - check["true (ln rho)"]).abs()
check
| | measure | true (ln rho) | recovered | abs_error |
|---|---|---|---|---|
| 0 | std | -0.105361 | -0.105361 | 1.526557e-16 |
| 1 | gini | -0.105361 | -0.105361 | 1.804112e-16 |
| 2 | cv | -0.105361 | -0.105361 | 9.714451e-17 |
# Each measure recovers ln(rho) to machine precision on the noiseless DGP.
for slope in (fit.std_slope, fit.gini_slope, fit.cv_slope):
    assert abs(slope - target) < 1e-9
assert fit.std_slope < 0  # convergence
print("✅ std, Gini and CV trends all recover ln(rho) exactly")
✅ std, Gini and CV trends all recover ln(rho) exactly
A faster contraction (smaller \(\rho\)) means a more negative trend, and that ordering survives noise:
fast = ex.analyze_sigma_convergence(
    sigma_panel(rho=0.80, noise=0.02, seed=2), "x", entity="country", time="year"
)
slow = ex.analyze_sigma_convergence(
    sigma_panel(rho=0.97, noise=0.02, seed=2), "x", entity="country", time="year"
)
assert fast.std_slope < slow.std_slope < 0
print(f"✅ faster contraction => steeper trend ({fast.std_slope:.3f} < {slow.std_slope:.3f} < 0)")
✅ faster contraction => steeper trend (-0.223 < -0.030 < 0)
4. The panel must be balanced
Dispersion is only comparable across periods when the same units are present each period, so the function refuses an unbalanced panel rather than mix a changing composition:
unbalanced = gap.drop(gap[(gap["country"] == "Afghanistan") & (gap["year"] < 1972)].index)
try:
    ex.analyze_sigma_convergence(unbalanced, "lifeExp", entity="country", time="year")
except ValueError as exc:
    print("ValueError:", exc)
ValueError: panel is not balanced: 1 of 142 units are missing in some period and 4 of 12 periods are missing some units. σ-convergence compares dispersion across a fixed set of units; restrict to a balanced window with start=/end= or drop the offending units.
Restrict to a balanced window with start=/end= instead:
res_window = ex.analyze_sigma_convergence(
    gap, "lifeExp", entity="country", time="year", start=1972
)
print(f"balanced window: {res_window.n_units} units x {res_window.n_periods} periods")
balanced window: 142 units x 8 periods
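To find out ahead of time whether a panel is balanced, and which units are responsible if it is not, a pandas check along these lines may help. It is a sketch: the helper names are hypothetical, and the definition used here (exactly one row per unit and period) is an assumption about what "balanced" means rather than the function's documented rule.

```python
import pandas as pd

def is_balanced(df, entity, time):
    """True when every unit has exactly one row in every period."""
    counts = df.groupby([entity, time]).size()
    n_cells = df[entity].nunique() * df[time].nunique()
    return bool(len(counts) == n_cells and (counts == 1).all())

def unbalanced_units(df, entity, time):
    """Units observed in fewer periods than the panel contains."""
    periods_per_unit = df.groupby(entity)[time].nunique()
    short = periods_per_unit[periods_per_unit < df[time].nunique()]
    return sorted(short.index)

# Hypothetical toy panel: unit "B" is missing in year 2.
toy = pd.DataFrame(
    {
        "country": ["A", "A", "A", "B", "B", "C", "C", "C"],
        "year": [1, 2, 3, 1, 3, 1, 2, 3],
        "x": [1.0, 1.1, 1.2, 2.0, 2.2, 3.0, 3.1, 3.2],
    }
)
print(is_balanced(toy, "country", "year"))       # False
print(unbalanced_units(toy, "country", "year"))  # ['B']
```

Dropping the listed units, or narrowing the window until the list is empty, are the two remedies the error message suggests.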
5. Convergence vs divergence on real data
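Cross-country life expectancy compressed over 1952–2007 as poorer countries caught up, so both the standard deviation and the Gini index drift down (σ-convergence), although only the Gini trend is statistically significant (p ≈ 0.0002, against p ≈ 0.48 for the standard deviation):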
print(res.interpret())
Across 142 units over 12 periods, the cross-sectional standard deviation of **lifeExp** narrowed (from 12.2 to 12.1). The log-dispersion trend is -0.000631 per period — about 0.0631% less dispersion each period (not statistically significant at conventional levels) — the pattern of **σ-convergence**.
The Gini index also narrowed over the same span (trend -0.00681 per period).
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
The flagship kuznets panel makes the opposite case: the cross-country dispersion of regional inequality has widened, a clean example of σ-divergence (a positive trend):
from expdpy.data import load_kuznets
kz = ex.analyze_sigma_convergence(load_kuznets(), "gini_regional", entity="country", time="year")
print(kz.interpret())
Across 80 units over 11 periods, the cross-sectional standard deviation of **gini_regional** widened (from 0.0875 to 0.106). The log-dispersion trend is 0.0231 per period — about 2.34% more dispersion each period (statistically significant at the 1% level) — the pattern of **σ-divergence** rather than convergence.
The Gini index also widened over the same span (trend 0.0155 per period).
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
Note on scale: with no transform, the standard deviation is in the variable’s own units and tracks its level, so the Gini and the coefficient of variation (both scale-free) often tell a cleaner story; compare all three in summary. Raw GDP-per-capita levels can diverge in dollars even when log GDP converges, which is why the income case is usually run on the log.
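Both halves of that note can be checked on made-up numbers. The sketch below uses hypothetical values throughout (the five-point sample, the two income paths and their growth rates are invented for illustration), and the `gini` helper is the standard sorted-values formula, not the library's code.

```python
import numpy as np

def gini(values):
    """Gini index via the sorted-values formula (assumes non-negative values)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1) / n)

# 1) Rescaling: the std scales with the units, the Gini and CV do not.
x = np.array([2.0, 5.0, 9.0, 14.0, 30.0])
scaled = 100.0 * x  # e.g. the same quantity expressed in smaller units
std_ratio = scaled.std(ddof=1) / x.std(ddof=1)
cv_x = x.std(ddof=1) / x.mean()
cv_scaled = scaled.std(ddof=1) / scaled.mean()
print(std_ratio, gini(x) - gini(scaled), cv_x - cv_scaled)

# 2) Levels vs logs: a poorer unit growing faster than a richer one.
t = np.arange(11)
poor = 1000.0 * 1.06**t   # hypothetical 6% growth per period
rich = 4000.0 * 1.04**t   # hypothetical 4% growth per period
raw_gap = rich - poor                   # gap in dollars
log_gap = np.log(rich) - np.log(poor)   # gap in log points
print(raw_gap[0], raw_gap[-1])  # the dollar gap widens
print(log_gap[0], log_gap[-1])  # the log gap narrows
```

The second part is the income case in miniature: the poorer unit is catching up proportionally, yet the absolute gap still grows because the richer unit starts from a much higher base.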
See also
- ex.learn_sigma_convergence(): a runnable Learn sandbox that recovers a known dispersion rate on a geometric-narrowing panel.
- analyze_beta_convergence: the growth-vs-initial-level view.
- ex.explain("sigma_convergence"): the concept explainer (also res.explain()).
ex.explain("sigma_convergence")
Sigma convergence
What it is. σ-convergence asks whether the cross-sectional dispersion of a variable shrinks over time — whether units become more alike. At each period the dispersion is measured across units (the standard deviation, the Gini index, the coefficient of variation), and the test regresses the log dispersion on time: a negative slope means dispersion falls by a roughly constant proportion each period, the hallmark of σ-convergence. It is the distributional complement to β-convergence: β-convergence (poorer units growing faster) is necessary but not sufficient for σ-convergence, because new shocks can re-spread the distribution even while laggards catch up (Quah’s critique).
When to use it. Use it to describe whether a cross-section is compressing or fanning out over time — income or productivity across regions, test scores across schools, health across countries. Pair it with β-convergence: β answers ‘do laggards grow faster?’ while σ answers ‘is the whole distribution narrowing?’. Report several dispersion measures, since the standard deviation is scale-dependent while the Gini and the coefficient of variation are scale-free.
Watch out for.
- σ-convergence is a descriptive statement about the distribution, not a causal mechanism; a narrowing spread does not say why units converged.
- The standard deviation is in the variable’s own units and grows with its level; the Gini and the coefficient of variation are scale-free and often tell a clearer story, so compare them.
- Dispersion is only comparable across periods when the set of units is fixed, so a balanced panel is required; a changing composition can masquerade as convergence or divergence.
- The Gini index is only defined for non-negative values, and the coefficient of variation is unstable when the mean is near zero.
See also: beta_convergence, fwl, correlation_vs_causation
References: Barro & Sala-i-Martin, Economic Growth (2nd ed.), ch. 11; Sala-i-Martin (1996), ‘The Classical Approach to Convergence Analysis’, EJ; Quah (1993), ‘Galton’s Fallacy and Tests of the Convergence Hypothesis’, SJE