analyze_sigma_convergence

This page is two things at once: an extended user guide for analyze_sigma_convergence — what it does, every argument, and everything it returns — and a testing environment that generates synthetic data with known parameters and checks that the function recovers them. If a cell’s assert ever fails, the function is broken.

What is σ-convergence?

σ-convergence asks whether the cross-sectional dispersion of a variable shrinks over time — whether units become more alike. At each period \(t\) we measure how spread out the variable is across units (the standard deviation \(\sigma_t\), the Gini index, the coefficient of variation) and then regress the log dispersion on time:

\[ \ln D_t \;=\; a + b\, t + \varepsilon_t . \]

The slope \(b\) is the average change in log dispersion per period, which is approximately the proportional change in dispersion (exactly \(e^{b}-1\)). A negative \(b\) is therefore σ-convergence (the distribution is narrowing) and a positive \(b\) is σ-divergence. The variable is used as you supply it: the function never transforms it.
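
The regression above can be sketched in plain Python. This is a minimal illustration on a hypothetical three-period toy panel, not the library's implementation; the helper `ols_slope` and the `panel` dictionary are names invented for the example, and only the standard deviation is shown.

```python
import math
import statistics

# Hypothetical toy panel: {period: values of the same four units}.
# Deviations from the mean (10) halve each period by construction.
panel = {
    0: [4.0, 8.0, 12.0, 16.0],
    1: [7.0, 9.0, 11.0, 13.0],
    2: [8.5, 9.5, 10.5, 11.5],
}


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


periods = sorted(panel)
# Step 1: cross-sectional dispersion per period (sample standard deviation).
dispersion = [statistics.stdev(panel[t]) for t in periods]
# Step 2: regress ln(D_t) on t; the slope is b.
b = ols_slope(periods, [math.log(d) for d in dispersion])
# Step 3: convert the log-slope into an exact proportional change per period.
pct_change = (math.exp(b) - 1.0) * 100.0

print(f"b = {b:.4f}, exact change per period = {pct_change:.1f}%")
```

Because the toy spread halves each period, \(b = \ln 0.5 \approx -0.693\) while the exact change is \(-50\%\). Reading \(b\) directly as a percentage is only a good approximation when \(|b|\) is small.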

σ-convergence is the distributional complement to β-convergence. β-convergence (laggards growing faster) is necessary but not sufficient for σ-convergence: fresh shocks can re-widen the distribution even while poorer units catch up (Quah’s critique). See analyze_beta_convergence for the growth-vs-initial-level view.
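
The gap between the two concepts can be seen in a small simulation. The sketch below assumes a simple AR(1) process, \(x_{i,t+1} = \rho\,x_{i,t} + \varepsilon_{i,t+1}\), started at its stationary spread; this data-generating process and all names in it are assumptions chosen for illustration, not something the library uses.

```python
import math
import random
import statistics

rng = random.Random(0)
RHO, SHOCK_SD = 0.9, 1.0
N_UNITS, N_PERIODS = 5000, 10

# Start at the stationary spread, where mean reversion and shocks offset.
stationary_sd = SHOCK_SD / math.sqrt(1.0 - RHO**2)
x0 = [rng.gauss(0.0, stationary_sd) for _ in range(N_UNITS)]

x = list(x0)
for _ in range(N_PERIODS):
    # Each unit reverts toward the common mean (0) and receives a fresh shock.
    x = [RHO * xi + rng.gauss(0.0, SHOCK_SD) for xi in x]

# beta-style regression: total change on the initial level.
change = [xt - xi for xt, xi in zip(x, x0)]
m0, mc = statistics.fmean(x0), statistics.fmean(change)
beta = sum((a - m0) * (c - mc) for a, c in zip(x0, change)) / sum(
    (a - m0) ** 2 for a in x0
)

sd_start, sd_end = statistics.stdev(x0), statistics.stdev(x)
print(f"beta = {beta:.3f} (negative: units far from the mean move back)")
print(f"sd start = {sd_start:.3f}, sd end = {sd_end:.3f} (roughly unchanged)")
```

The regression slope is clearly negative, yet the cross-sectional spread stays roughly where it started, because each period's shocks replace the dispersion that mean reversion removes.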

import numpy as np
import pandas as pd

import expdpy as ex

1. The method in one cell

analyze_sigma_convergence(df, var, ...) only needs the variable and the panel ids. Here is the cross-country dispersion of life expectancy in the bundled gapminder panel — a bounded, non-negative level, so the standard deviation and the Gini index are both meaningful:

from expdpy.data import load_gapminder

gap = load_gapminder()
res = ex.analyze_sigma_convergence(gap, "lifeExp", entity="country", time="year")
res.fig

The standard deviation is on the left axis and the Gini index on the right; the dashed lines are the fitted log-trends. The annotation reports each measure’s per-period trend and whether it is converging.

2. How the function works

Arguments

| argument | what it does | when to change it |
|---|---|---|
| var | the panel variable whose cross-sectional dispersion is tracked (used as-is) | pass it on whatever scale you want measured; the Gini also needs non-negative values |
| entity, time | the panel ids | omit if declared once via set_panel |
| start, end | restrict to a period window (which must still be balanced) | to focus on a sub-period |
| min_periods | minimum periods required to fit a trend (≥ 3) | rarely; the default is fine |
| vcov | "hetero" (HC1, default) or "iid" standard errors | "iid" for classical SEs; never changes the slope |
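
To see why vcov changes the standard error but not the slope, here is a minimal sketch of a one-regressor trend fit that reports both kinds of standard error. It uses the textbook classical and HC1 formulas on a hypothetical dispersion series; the function `trend_with_both_se` is invented for this example and is not how the library is known to compute its results.

```python
import math


def trend_with_both_se(t, y):
    """OLS slope of y on t with classical ("iid") and HC1 standard errors."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    dev = [ti - t_bar for ti in t]
    sxx = sum(d * d for d in dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / sxx
    intercept = y_bar - slope * t_bar
    resid = [yi - intercept - slope * ti for ti, yi in zip(t, y)]

    # Classical: one common error variance, s^2 = RSS / (n - 2).
    se_iid = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # HC1: White's sandwich estimator with the n / (n - 2) small-sample factor.
    meat = sum((d * e) ** 2 for d, e in zip(dev, resid))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sxx**2)
    return slope, se_iid, se_hc1


# Hypothetical dispersion series, not output from the library.
t = [0, 1, 2, 3, 4, 5]
log_d = [math.log(v) for v in (10.0, 9.1, 8.5, 7.4, 7.1, 6.2)]
slope, se_iid, se_hc1 = trend_with_both_se(t, log_d)
print(f"slope={slope:.4f}  se_iid={se_iid:.4f}  se_hc1={se_hc1:.4f}")
```

The slope is computed once, before either variance formula is applied, so the choice only affects the standard error and the p-value derived from it.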

The panel must be balanced — every unit present in every period — so the dispersion is comparable across periods (more on that in §4).

Everything it returns

print("scalars :", {k: round(getattr(res, k), 5) for k in
                    ["std_slope", "gini_slope", "cv_slope", "std_pvalue"]})
print("panel   :", f"{res.n_units} units x {res.n_periods} periods")
print("per-period frame columns:", list(res.df.columns))
res.df.head()
scalars : {'std_slope': -0.00063, 'gini_slope': -0.00681, 'cv_slope': -0.00622, 'std_pvalue': 0.47518}
panel   : 142 units x 12 periods
per-period frame columns: ['year', 'n_units', 'mean', 'std', 'gini', 'cv']
year n_units mean std gini cv
0 1952.0 142 49.057620 12.225956 0.141106 0.249216
1 1957.0 142 51.507401 12.231286 0.135209 0.237467
2 1962.0 142 53.609249 12.097245 0.128702 0.225656
3 1967.0 142 55.678290 11.718858 0.120378 0.210474
4 1972.0 142 57.647386 11.381953 0.112960 0.197441

summary is the per-measure trend table (slope, SE, p-value, R²) and gt renders it; glance() is the one-row headline:

res.summary.round(5)
measure slope se pvalue r2 n_periods_used converging
0 std -0.00063 0.00085 0.47518 0.04884 12 True
1 gini -0.00681 0.00116 0.00016 0.78367 12 True
2 cv -0.00622 0.00125 0.00055 0.72279 12 True
res.gt
σ-convergence: lifeExp
trend of log dispersion over 12 periods, 142 units
| | Trend (per period) | Std. error | p-value | σ-convergence |
|---|---|---|---|---|
| Standard deviation | -0.0006313 | 0.0008509 | 0.475 | yes |
| Gini index | -0.006807 | 0.001159 | 0.000 | yes |
| Coefficient of variation | -0.006217 | 0.001247 | 0.001 | yes |
A negative trend in log dispersion is σ-convergence (the cross-sectional distribution is narrowing). Trend = OLS slope of ln(dispersion) on time.

.interpret() reads the result in plain language, and .explain() returns the concept explainer:

print(res.interpret())
Across 142 units over 12 periods, the cross-sectional standard deviation of **lifeExp** narrowed (from 12.2 to 12.1). The log-dispersion trend is -0.000631 per period — about 0.0631% less dispersion each period (not statistically significant at conventional levels) — the pattern of **σ-convergence**.
The Gini index also narrowed over the same span (trend -0.00681 per period).

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

3. Does it recover the truth?

The cleanest test uses a geometric-narrowing panel: every unit’s value contracts toward a common mean \(\mu\) at rate \(\rho\) per period, \(x_{i,t} = \mu + (x_{i,0}-\mu)\,\rho^{t}\). Here \(\mu\) is set to the cross-sectional mean of the initial values, so the mean stays fixed while every deviation from it shrinks by the factor \(\rho^{t}\). Every dispersion measure therefore scales as \(\rho^{t}\), and the log-dispersion trend equals \(\ln\rho\) exactly for the standard deviation, the Gini index and the coefficient of variation alike.
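
The scaling argument can be written out in a few lines. The derivation below assumes \(0 < \rho < 1\), \(\mu\) equal to the cross-sectional mean of \(x_{i,0}\), and the mean-absolute-difference definition of the Gini index; an estimator that differs by a constant factor would give the same slope.

```latex
\[
\begin{aligned}
x_{i,t} - \mu &= \rho^{t}\,(x_{i,0} - \mu), \qquad \bar{x}_t = \mu \ \text{for all } t, \\
\sigma_t &= \sqrt{\tfrac{1}{N-1}\sum_i (x_{i,t} - \mu)^2} \;=\; \rho^{t}\,\sigma_0, \\
\mathrm{CV}_t &= \frac{\sigma_t}{\bar{x}_t} \;=\; \rho^{t}\,\mathrm{CV}_0, \\
G_t &= \frac{\sum_i \sum_j \lvert x_{i,t} - x_{j,t} \rvert}{2N^{2}\,\bar{x}_t} \;=\; \rho^{t}\,G_0, \\
\ln D_t &= \ln D_0 + t\,\ln\rho \quad\Longrightarrow\quad a = \ln D_0,\; b = \ln\rho .
\end{aligned}
\]
```

With no noise the points \((t, \ln D_t)\) lie exactly on a line, which is why the fitted slope should match \(\ln\rho\) to machine precision.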

def sigma_panel(*, n_units=60, n_years=15, rho=0.9, noise=0.0, seed=0):
    """Geometric-narrowing panel x_{i,t} = mu + (x_{i,0}-mu)*rho**t (mu = mean x_0)."""
    rng = np.random.default_rng(seed)
    x0 = rng.uniform(1.0, 20.0, size=n_units)
    mu = float(np.mean(x0))
    rows = []
    for i in range(n_units):
        for t in range(n_years):
            val = mu + (float(x0[i]) - mu) * rho**t
            if noise:
                val += float(rng.normal(0.0, noise))
            rows.append((f"C{i:03d}", t, val))
    return pd.DataFrame(rows, columns=["country", "year", "x"])


RHO = 0.9
panel = sigma_panel(rho=RHO, seed=1)
fit = ex.analyze_sigma_convergence(panel, "x", entity="country", time="year")

target = np.log(RHO)
check = pd.DataFrame(
    {
        "measure": ["std", "gini", "cv"],
        "true (ln rho)": [target, target, target],
        "recovered": [fit.std_slope, fit.gini_slope, fit.cv_slope],
    }
)
check["abs_error"] = (check["recovered"] - check["true (ln rho)"]).abs()
check
measure true (ln rho) recovered abs_error
0 std -0.105361 -0.105361 1.526557e-16
1 gini -0.105361 -0.105361 1.804112e-16
2 cv -0.105361 -0.105361 9.714451e-17
# Each measure recovers ln(rho) to machine precision on the noiseless DGP.
for slope in (fit.std_slope, fit.gini_slope, fit.cv_slope):
    assert abs(slope - target) < 1e-9
assert fit.std_slope < 0  # convergence
print("✅ std, Gini and CV trends all recover ln(rho) exactly")
✅ std, Gini and CV trends all recover ln(rho) exactly

A faster contraction (smaller \(\rho\)) means a more negative trend, and that ordering survives noise:

fast = ex.analyze_sigma_convergence(
    sigma_panel(rho=0.80, noise=0.02, seed=2), "x", entity="country", time="year"
)
slow = ex.analyze_sigma_convergence(
    sigma_panel(rho=0.97, noise=0.02, seed=2), "x", entity="country", time="year"
)
assert fast.std_slope < slow.std_slope < 0
print(f"✅ faster contraction => steeper trend  ({fast.std_slope:.3f} < {slow.std_slope:.3f} < 0)")
✅ faster contraction => steeper trend  (-0.223 < -0.030 < 0)

4. The panel must be balanced

Dispersion is only comparable across periods when the same units are present each period, so the function refuses an unbalanced panel rather than compare dispersion across a changing set of units:

unbalanced = gap.drop(gap[(gap["country"] == "Afghanistan") & (gap["year"] < 1972)].index)
try:
    ex.analyze_sigma_convergence(unbalanced, "lifeExp", entity="country", time="year")
except ValueError as exc:
    print("ValueError:", exc)
ValueError: panel is not balanced: 1 of 142 units are missing in some period and 4 of 12 periods are missing some units. σ-convergence compares dispersion across a fixed set of units; restrict to a balanced window with start=/end= or drop the offending units.

Restrict to a balanced window with start=/end= instead:

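# The same call on the unbalanced frame succeeds once the window starts in 1972,
# the first year in which Afghanistan is present again.
res_window = ex.analyze_sigma_convergence(
    unbalanced, "lifeExp", entity="country", time="year", start=1972
)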
print(f"balanced window: {res_window.n_units} units x {res_window.n_periods} periods")
balanced window: 142 units x 8 periods
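
Before calling the function it can help to find out which units break the balance and where a balanced window could start. The sketch below is a hypothetical pre-check in plain Python on (unit, period) pairs; `missing_periods` and `earliest_balanced_start` are names invented here, not part of the library.

```python
from collections import defaultdict


def missing_periods(rows):
    """Map each incomplete unit to the periods in which it is absent."""
    seen = defaultdict(set)
    for unit, period in rows:
        seen[unit].add(period)
    all_periods = set().union(*seen.values()) if seen else set()
    return {u: sorted(all_periods - p) for u, p in seen.items() if p != all_periods}


def earliest_balanced_start(rows):
    """First period from which the remaining window is balanced."""
    rows = list(rows)
    for start in sorted({p for _, p in rows}):
        window = [(u, p) for u, p in rows if p >= start]
        if not missing_periods(window):
            return start
    return None


# Hypothetical (unit, period) pairs: unit "C" enters the panel late.
years = [1952, 1957, 1962, 1967]
rows = [(u, y) for u in ("A", "B") for y in years]
rows += [("C", y) for y in years if y >= 1962]

print(missing_periods(rows))
print(earliest_balanced_start(rows))
```

The first result names the offending units, which is what you need if you prefer to drop them; the second gives a candidate value for start= if you prefer to keep every unit and shorten the window.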

5. Convergence vs divergence on real data

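Cross-country life expectancy became more equal in relative terms over 1952–2007, consistent with poorer countries catching up: the Gini index and the coefficient of variation both trend down significantly (p < 0.001 in the summary table of §2). The standard deviation barely moved (12.2 to 12.1) and its trend is not statistically significant (p ≈ 0.48), so the evidence for σ-convergence here rests on the scale-free measures: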

print(res.interpret())
Across 142 units over 12 periods, the cross-sectional standard deviation of **lifeExp** narrowed (from 12.2 to 12.1). The log-dispersion trend is -0.000631 per period — about 0.0631% less dispersion each period (not statistically significant at conventional levels) — the pattern of **σ-convergence**.
The Gini index also narrowed over the same span (trend -0.00681 per period).

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

The flagship kuznets panel makes the opposite case: the cross-country dispersion of regional inequality has widened, a clean example of σ-divergence (a positive trend):

from expdpy.data import load_kuznets

kz = ex.analyze_sigma_convergence(load_kuznets(), "gini_regional", entity="country", time="year")
print(kz.interpret())
Across 80 units over 11 periods, the cross-sectional standard deviation of **gini_regional** widened (from 0.0875 to 0.106). The log-dispersion trend is 0.0231 per period — about 2.34% more dispersion each period (statistically significant at the 1% level) — the pattern of **σ-divergence** rather than convergence.
The Gini index also widened over the same span (trend 0.0155 per period).

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

Note on scale: with no transform, the standard deviation is in the variable’s own units and tracks its level, so the Gini and the coefficient of variation (both scale-free) often tell a cleaner story; compare all three in summary. Raw GDP-per-capita levels can diverge in dollars even when log GDP converges, which is why the income case is usually run on logs.
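
A small numerical sketch makes the scale point concrete. It assumes a hypothetical four-unit income panel in which every unit grows by the same 3% per period, so relative positions never change; the `gini` and `log_trend` helpers are written for this example (the Gini uses the mean-absolute-difference definition) and are not the library's code.

```python
import math
import statistics


def gini(values):
    """Gini index via the mean absolute difference (non-negative values)."""
    n = len(values)
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2.0 * n * n * statistics.fmean(values))


def log_trend(series):
    """OLS slope of ln(series) on t = 0, 1, 2, ..."""
    y = [math.log(v) for v in series]
    n = len(y)
    t_bar = (n - 1) / 2.0
    y_bar = sum(y) / n
    sxy = sum((t - t_bar) * (yi - y_bar) for t, yi in enumerate(y))
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    return sxy / sxx


# Hypothetical incomes; every unit grows 3% per period.
income0 = [1_000.0, 4_000.0, 9_000.0, 20_000.0]
GROWTH = 1.03
levels = [[v * GROWTH**t for v in income0] for t in range(8)]

std_trend = log_trend([statistics.stdev(row) for row in levels])
gini_trend = log_trend([gini(row) for row in levels])
cv_trend = log_trend(
    [statistics.stdev(row) / statistics.fmean(row) for row in levels]
)
# Same panel on the log scale: standard deviation of ln(income).
logstd_trend = log_trend(
    [statistics.stdev([math.log(v) for v in row]) for row in levels]
)

print(f"std (levels): {std_trend:+.4f}")
print(f"gini        : {gini_trend:+.4f}")
print(f"cv          : {cv_trend:+.4f}")
print(f"std (logs)  : {logstd_trend:+.4f}")
```

The standard deviation of the raw levels trends up at \(\ln 1.03\) per period purely because the level rises, while the Gini, the coefficient of variation and the standard deviation of the logged variable show no trend.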

See also

  • ex.learn_sigma_convergence() — a runnable Learn sandbox that recovers a known dispersion rate on a geometric-narrowing panel.
  • analyze_beta_convergence — the growth-vs-initial-level view.
  • ex.explain("sigma_convergence") — the concept explainer (also res.explain()).
ex.explain("sigma_convergence")

Sigma convergence

What it is. σ-convergence asks whether the cross-sectional dispersion of a variable shrinks over time — whether units become more alike. At each period the dispersion is measured across units (the standard deviation, the Gini index, the coefficient of variation), and the test regresses the log dispersion on time: a negative slope means dispersion falls by a roughly constant proportion each period, the hallmark of σ-convergence. It is the distributional complement to β-convergence: β-convergence (poorer units growing faster) is necessary but not sufficient for σ-convergence, because new shocks can re-spread the distribution even while laggards catch up (Quah’s critique).

When to use it. Use it to describe whether a cross-section is compressing or fanning out over time — income or productivity across regions, test scores across schools, health across countries. Pair it with β-convergence: β answers ‘do laggards grow faster?’ while σ answers ‘is the whole distribution narrowing?’. Report several dispersion measures, since the standard deviation is scale-dependent while the Gini and the coefficient of variation are scale-free.

Watch out for.
- σ-convergence is a descriptive statement about the distribution, not a causal mechanism; a narrowing spread does not say why units converged.
- The standard deviation is in the variable’s own units and grows with its level; the Gini and the coefficient of variation are scale-free and often tell a clearer story, so compare them.
- Dispersion is only comparable across periods when the set of units is fixed, so a balanced panel is required; a changing composition can masquerade as convergence or divergence.
- The Gini index is only defined for non-negative values, and the coefficient of variation is unstable when the mean is near zero.

See also: beta_convergence, fwl, correlation_vs_causation

References: Barro & Sala-i-Martin, Economic Growth (2nd ed.), ch. 11; Sala-i-Martin (1996), ‘The Classical Approach to Convergence Analysis’, EJ; Quah (1993), ‘Galton’s Fallacy and Tests of the Convergence Hypothesis’, SJE