analyze_convergence_clubs

This page is two things at once: an extended user guide for analyze_convergence_clubs — what it does, every argument, and everything it returns — and a testing environment that generates synthetic data with a known club structure and checks that the function recovers it. If a cell’s assert ever fails, the function is broken.

What is club convergence?

β- and σ-convergence assume the panel is one group heading toward one path. Often it is not: some economies catch up to a high path while others settle on a lower one. Club convergence (Phillips & Sul, 2007/2009) tests for exactly that — multiple convergence equilibria — without grouping units in advance.

Each unit is modelled as a common trend \(\mu_t\) scaled by a time-varying, unit-specific loading, \(X_{it} = \delta_{it}\,\mu_t\). Removing the common trend gives the relative transition path

\[ h_{it} = \frac{X_{it}}{\frac1N\sum_i X_{it}}, \]

whose cross-sectional mean is 1 by construction. If the units converge, the cross-sectional variance \(H_t = \frac1N\sum_i (h_{it}-1)^2 \to 0\), and the log(t) regression

\[ \log\!\left(\frac{H_1}{H_t}\right) - 2\log(\log t) = a + b\,\log t + \varepsilon_t, \qquad t = [rT], \dots, T, \]

has a non-negative slope \(b = 2\alpha\). A one-sided \(t_b > -1.65\) fails to reject convergence. When the whole panel rejects, a data-driven clustering algorithm sorts units by their final level, forms a core group by maximising \(t_b\), sieves in the units that keep the group converging, recurses on the residual, and finally merges adjacent clubs that jointly converge. The series is first smoothed with the Hodrick–Prescott filter (\(\lambda=400\) for annual data) so the test runs on the long-run trend rather than the business cycle.
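The pieces above (relative transition path, cross-sectional variance, log(t) regression) can be sketched in a few lines of plain Python. This is an illustration only: the helper names `relative_paths`, `cross_section_variance` and `logt_ols` are hypothetical, the t-statistic uses ordinary OLS standard errors rather than the HAC correction the function applies, and starting the regression at `int(r*T)` is one reading of \([rT]\).

```python
import math


def relative_paths(panel):
    """h_it = X_it / mean_i(X_it) for a dict {unit: [X_i1, ..., X_iT]}."""
    units = list(panel)
    n_periods = len(panel[units[0]])
    means = [sum(panel[u][t] for u in units) / len(units) for t in range(n_periods)]
    return {u: [panel[u][t] / means[t] for t in range(n_periods)] for u in units}


def cross_section_variance(h):
    """H_t = mean_i (h_it - 1)^2 for every period t."""
    units = list(h)
    n_periods = len(h[units[0]])
    return [sum((h[u][t] - 1.0) ** 2 for u in units) / len(units)
            for t in range(n_periods)]


def logt_ols(H, r=0.3):
    """Slope b and its t-statistic from the log(t) regression on t = [rT], ..., T.

    Assumption: plain OLS standard errors, NOT the HAC ones used by the library.
    """
    T = len(H)
    start = max(2, int(r * T))  # log(log t) is only defined for t >= 2
    ts = list(range(start, T + 1))
    x = [math.log(t) for t in ts]
    y = [math.log(H[0] / H[t - 1]) - 2.0 * math.log(math.log(t)) for t in ts]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b, b / math.sqrt(s2 / sxx)


T = 30
# Two units whose gap shrinks geometrically: they converge.
converging = {"a": [10 + 0.5 * 0.9 ** t for t in range(T)],
              "b": [10 - 0.5 * 0.9 ** t for t in range(T)]}
# Two units whose gap widens linearly: they diverge.
diverging = {"a": [10 + 0.1 * (t + 1) for t in range(T)],
             "b": [10 - 0.1 * (t + 1) for t in range(T)]}

h_conv = relative_paths(converging)
b_conv, t_conv = logt_ols(cross_section_variance(h_conv))
b_div, t_div = logt_ols(cross_section_variance(relative_paths(diverging)))

for label, t_b in (("converging", t_conv), ("diverging", t_div)):
    verdict = "fails to reject" if t_b > -1.65 else "rejects"
    print(f"{label}: log(t) test {verdict} convergence")
```

The converging pair yields a positive slope and a t-statistic above -1.65, while the diverging pair yields a negative slope and a t-statistic far below it, which is the one-sided decision rule described above.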

This is a faithful port of the Stata psecta package (Du, 2017); the log(t) statistic uses the Phillips–Sul scalar long-run-variance HAC (Andrews 1991 quadratic-spectral kernel, AR(1) automatic bandwidth), reproduced in NumPy. The function does not transform the variable, so pass it already in logs, for example log GDP per capita or log labor productivity.
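To make the HP smoothing step concrete, here is a minimal sketch of the textbook Hodrick–Prescott trend, which solves \((I + \lambda D'D)\,\tau = y\) with \(D\) the second-difference operator. The helper names `hp_trend` and `roughness` are hypothetical, and the dense Gaussian elimination is chosen for readability; it is not a claim about how the function computes the filter internally.

```python
def hp_trend(y, lam=400.0):
    """Hodrick-Prescott trend: solve (I + lam * D'D) tau = y."""
    T = len(y)
    if T < 3:
        return [float(v) for v in y]  # no second differences to penalise
    # Build A = I + lam * D'D, where each row of D is (1, -2, 1).
    A = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for k in range(T - 2):
        idx, coef = (k, k + 1, k + 2), (1.0, -2.0, 1.0)
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                A[i][j] += lam * ci * cj
    # Gaussian elimination with partial pivoting on the augmented matrix [A | y].
    M = [row + [float(v)] for row, v in zip(A, y)]
    for col in range(T):
        piv = max(range(col, T), key=lambda row: abs(M[row][col]))
        M[col], M[piv] = M[piv], M[col]
        for row in range(col + 1, T):
            f = M[row][col] / M[col][col]
            if f:
                for c in range(col, T + 1):
                    M[row][c] -= f * M[col][c]
    # Back substitution.
    tau = [0.0] * T
    for i in range(T - 1, -1, -1):
        tau[i] = (M[i][T] - sum(M[i][j] * tau[j] for j in range(i + 1, T))) / M[i][i]
    return tau


def roughness(series):
    """Sum of squared second differences (the quantity the filter penalises)."""
    return sum((series[t + 1] - 2 * series[t] + series[t - 1]) ** 2
               for t in range(1, len(series) - 1))


line = [2.0 + 0.5 * t for t in range(12)]      # already a pure trend
zigzag = [t + (-1) ** t for t in range(12)]    # trend plus a sawtooth cycle
trend_line = hp_trend(line)
trend_zigzag = hp_trend(zigzag)
print(round(roughness(zigzag), 3), round(roughness(trend_zigzag), 3))
```

A straight line has zero second differences, so the filter returns it unchanged; the sawtooth is smoothed towards its underlying linear trend, which is what lets the log(t) test run on long-run movements.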

import numpy as np
import pandas as pd

import expdpy as ex

1. The method in one cell

analyze_convergence_clubs(df, var, ...) needs only the variable and the panel ids. Here is the clustering of (log) GDP per capita across countries in the bundled productivity panel — a balanced 108-country, 25-year Penn World Table extract:

from expdpy.data import load_productivity

prod = load_productivity()
res = ex.analyze_convergence_clubs(prod, "log_gdppc", entity="country", time="year")
res.fig

Each line is a club’s average relative transition path; the dashed line at 1 is the cross-sectional mean. res.interpret() reads the result in plain language:

print(res.interpret())

Across 108 units over 25 periods, the Phillips-Sul log(t) test for **log_gdppc** rejects global convergence (t = -17.7 <= -1.65). The clustering algorithm splits the panel into **4 convergence clubs** — groups that each converge internally but not with one another: Club 1 (80), Club 2 (16), Club 3 (6), Club 4 (5).
1 unit do not join any club (the divergent group).
Club 1 collects the highest-ranked units; within each club the log(t) slope b = 2*alpha is positive enough that its t-statistic clears -1.65.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

2. How the function works

Arguments

argument	what it does	when to change it
`var`	the panel variable analysed (used as-is; no log is applied)	pass the series already in logs, e.g. log GDP per capita or log labor productivity
`entity`, `time`	the panel ids	omit if declared once via `set_panel`
`filter`	`"hp"` (default) HP-filters each unit and analyses the trend; `None` uses the variable as given	`None` if you pass an already-detrended series
`hp_lambda`	HP smoothing parameter	`400` for annual data (the literature default)
`r`	log(t) trimming fraction — the first `round(r*T)` periods are dropped	`0.3` for moderate `T`, `0.2` for large `T`
`method`	within-club sieve: `"adjust"` (Schnurbus 2016) or `"ps"` (original Phillips–Sul)	`"ps"` to reproduce the original 2007 rule
`merge`	adjacent-club merging: `"iterative"` / `"single"` / `"none"`	`"none"` to inspect the raw clusters
`fr`	sort units by the last period (`0`) or by the mean over the last `(1-fr)` share of periods	`fr>0` when endpoints are noisy
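The effect of `fr` on the initial ordering can be illustrated with a small sketch. The helper `order_by_final_level` is hypothetical, and dropping the first `int(fr*T)` periods before averaging is an assumed reading of the rule in the table, not the function's confirmed rounding.

```python
def order_by_final_level(panel, fr=0.0):
    """Rank units from highest to lowest final level.

    panel: dict {unit: [x_1, ..., x_T]}.
    fr=0 ranks on the last period; 0 < fr < 1 ranks on the mean of the
    periods after the first int(fr * T) (assumed rounding).
    """
    if not 0.0 <= fr < 1.0:
        raise ValueError("fr must lie in [0, 1)")

    def final_level(series):
        if fr == 0:
            return series[-1]
        tail = series[int(fr * len(series)):]
        return sum(tail) / len(tail)

    return sorted(panel, key=lambda unit: final_level(panel[unit]), reverse=True)


panel = {
    "a": [1, 1, 1, 6],   # low throughout, with a jump in the last period
    "b": [5, 5, 5, 5],
    "c": [9, 9, 9, 4],   # high throughout, with a dip in the last period
}
order_last = order_by_final_level(panel)          # ranks on the endpoint
order_mean = order_by_final_level(panel, fr=0.5)  # ranks on the last half
print(order_last, order_mean)
```

With a noisy endpoint the two orderings disagree: ranking on the last period puts "a" first, while averaging over the last half restores "c" to the top. That is the situation in which `fr>0` is worth trying.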

Everything it returns

print("clubs    :", res.n_clubs, "| divergent:", res.n_divergent,
      "| global t:", round(res.global_tstat, 2), "| converged:", res.converged)
print("figures  : fig, fig_paths, fig_clubs")
print("frames   : df", list(res.df.columns), "| summary", list(res.summary.columns))
res.glance()

clubs    : 4 | divergent: 1 | global t: -17.7 | converged: False
figures  : fig, fig_paths, fig_clubs
frames   : df ['country', 'year', 'value', 'relative', 'club'] | summary ['club', 'n_members', 'beta', 'tstat', 'converging', 'members']

	var	n_units	n_periods	n_clubs	n_divergent	global_beta	global_tstat	converged
0	log_gdppc	108	25	4	1	-0.555607	-17.704654	False

The classification table gt (and the summary frame behind it) lists each club’s size, its log(t) slope b and t-statistic, and its members:

res.gt

Convergence clubs: log_gdppc
Phillips-Sul log(t) clustering over 25 periods, 108 units
	N	log(t) b	t-stat	Converges	Members
Club 1	80	-0.040	-0.685	yes	Albania, Algeria, Argentina, Armenia, Australia, Austria, Belgium, Bolivia, ... (+72)
Club 2	16	-0.007	-0.148	yes	Bangladesh, Cambodia, Cameroon, Cote d'Ivoire, Ghana, Honduras, Kenya, Kyrgyz Republic, ... (+8)
Club 3	6	0.225	5.136	yes	Benin, Haiti, Mali, Mozambique, Senegal, Sierra Leone
Club 4	5	0.034	0.146	yes	Burundi, Central African Republic, Democratic Republic of Congo, Malawi, Niger
Divergent	1	—	—	—	Togo
Each club's log(t) t-stat exceeds -1.65 (the convergence threshold); b = 2*alpha is the within-club convergence speed. The Divergent group does not form a convergence club.
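The clubs in a table like this are what remains after the adjacent-club merging step controlled by `merge`. As a rough sketch of what an iterative merge could look like, the hypothetical helper below walks down the ordered clubs and absorbs the next club whenever the combined group still passes a joint test. The predicate `converges` stands in for the joint log(t) test, and the exact rule the function applies may differ.

```python
def merge_adjacent(clubs, converges):
    """Iteratively merge neighbouring clubs that jointly converge.

    clubs: list of lists of unit ids, ordered from highest to lowest club.
    converges: callable taking a list of unit ids and returning True when the
               joint test fails to reject convergence (a stand-in here).
    """
    if not clubs:
        return []
    merged = []
    current = list(clubs[0])
    for nxt in clubs[1:]:
        candidate = current + list(nxt)
        if converges(candidate):
            current = candidate       # keep growing the current club
        else:
            merged.append(current)    # close it and start from the next club
            current = list(nxt)
    merged.append(current)
    return merged


# Toy stand-in for the joint test: units "converge" if their levels are close.
levels = {"a": 10.0, "b": 9.9, "c": 9.8, "d": 8.0}


def close_enough(units):
    values = [levels[u] for u in units]
    return max(values) - min(values) <= 0.25


merged = merge_adjacent([["a"], ["b"], ["c"], ["d"]], close_enough)
print(merged)
```

In this toy case the first three clubs collapse into one and the fourth stays separate. Running the real function with `merge="none"` and comparing against the merged result shows how much of the final partition comes from this step.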

res.membership is the tidy entity-to-club list (suited to an appendix); fig_paths shows every unit’s path coloured by club, and fig_clubs shows a per-club small-multiples panel:

res.fig_clubs

3. Does it recover the truth?

The cleanest test plants a known club structure: every unit in club \(k\) sits at a distinct long-run level \(\mu_k\) plus an idiosyncratic deviation that decays geometrically, so units within a club converge to a common path while the distinct levels keep the panel from converging globally. The algorithm should reject global convergence and recover the planted partition.

def club_panel(*, n_per_club=12, levels=(10.0, 9.3, 8.6), n_years=35, rho=0.9,
               spread=0.4, noise=0.002, seed=0):
    """Panel with planted clubs; returns (df, {unit: true_club})."""
    rng = np.random.default_rng(seed)
    rows, truth = [], {}
    for k, mu in enumerate(levels, start=1):
        for j in range(n_per_club):
            uid = f"c{k}u{j:02d}"
            truth[uid] = k
            dev = float(rng.uniform(-spread, spread))
            for t in range(1, n_years + 1):
                rows.append((uid, t, mu + dev * rho ** (t - 1)
                             + float(rng.normal(0, noise))))
    return pd.DataFrame(rows, columns=["unit", "year", "x"]), truth


panel, truth = club_panel(seed=1)
fit = ex.analyze_convergence_clubs(panel, "x", entity="unit", time="year")
print(f"global log(t) t = {fit.global_tstat:.2f}  ->  converged = {fit.converged}")
print(f"detected clubs  = {fit.n_clubs}  (planted 3)")
fit.summary[["club", "n_members", "beta", "tstat", "converging"]]

global log(t) t = -86.23  ->  converged = False
detected clubs  = 3  (planted 3)

	club	n_members	beta	tstat	converging
0	Club 1	12	4.625544	26.708370	True
1	Club 2	12	4.562037	29.063103	True
2	Club 3	12	4.660040	26.539161	True

# Best-match accuracy: each detected club scored by its modal planted club.
detected = dict(zip(fit.membership["entity"], fit.membership["club"], strict=True))
by_det = {}
for uid, det in detected.items():
    by_det.setdefault(int(det), []).append(truth[uid])
correct = sum(
    sum(1 for tc in trues if tc == max(set(trues), key=trues.count))
    for det, trues in by_det.items() if det != 0
)
accuracy = correct / len(truth)

assert fit.converged is False          # distinct levels => no global convergence
assert fit.global_tstat <= -1.65
assert fit.n_clubs == 3                 # the three planted clubs
assert accuracy == 1.0                  # every unit in its true club
print(f"recovered {fit.n_clubs} clubs, {accuracy:.0%} of units correctly placed")

recovered 3 clubs, 100% of units correctly placed

When the panel really is one group, with all units converging to a single level, the whole-panel test should fail to reject convergence and the function should return a single club:

one, _ = club_panel(levels=(10.0,), n_per_club=40, seed=2)
solo = ex.analyze_convergence_clubs(one, "x", entity="unit", time="year")
assert solo.converged is True and solo.n_clubs == 1
print(f"single converging group: converged={solo.converged}, clubs={solo.n_clubs}")

single converging group: converged=True, clubs=1

4. Convergence clubs across countries

Back to real data. Across the 108 countries, whole-panel convergence is firmly rejected and the algorithm splits the world into several catch-up clubs plus a divergent tail — the textbook “multiple equilibria” picture:

print(res.interpret())

Across 108 units over 25 periods, the Phillips-Sul log(t) test for **log_gdppc** rejects global convergence (t = -17.7 <= -1.65). The clustering algorithm splits the panel into **4 convergence clubs** — groups that each converge internally but not with one another: Club 1 (80), Club 2 (16), Club 3 (6), Club 4 (5).
1 unit do not join any club (the divergent group).
Club 1 collects the highest-ranked units; within each club the log(t) slope b = 2*alpha is positive enough that its t-statistic clears -1.65.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

# Each reported club clears the convergence threshold.
clubs = res.summary[res.summary["club"] != "Divergent"]
assert bool((clubs["tstat"] > -1.65).all())
res.summary[["club", "n_members", "tstat", "converging"]]

	club	n_members	tstat	converging
0	Club 1	80	-0.684671	True
1	Club 2	16	-0.147519	True
2	Club 3	6	5.135556	True
3	Club 4	5	0.145590	True
4	Divergent	1	NaN	False

The same call works on log labor productivity (log_lp) — the other variable in the panel — and on any balanced panel of your own.
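On a panel of your own, it is worth checking how the club count moves with the trimming fraction and the merge rule before reporting it. The sketch below loops over a small grid; `club_sensitivity` is a hypothetical helper, and `fake_run` is a stand-in that returns made-up counts so the example runs without data. Only the `r` and `merge` arguments and the `n_clubs` / `n_divergent` attributes come from this page.

```python
from itertools import product
from types import SimpleNamespace


def club_sensitivity(run, rs=(0.2, 0.3), merges=("iterative", "none")):
    """Call run(r=..., merge=...) over a grid and tabulate the club counts."""
    rows = []
    for r, merge in product(rs, merges):
        res = run(r=r, merge=merge)
        rows.append({"r": r, "merge": merge,
                     "n_clubs": res.n_clubs, "n_divergent": res.n_divergent})
    return rows


def fake_run(r, merge):
    """Stand-in for the real analysis call; the counts are made up."""
    return SimpleNamespace(n_clubs=3 if merge == "iterative" else 4, n_divergent=1)


grid = club_sensitivity(fake_run)
for row in grid:
    print(row)
```

To use it for real, replace `fake_run` with a wrapper around the documented call, for instance `lambda **kw: ex.analyze_convergence_clubs(prod, "log_gdppc", entity="country", time="year", **kw)`. A club count that is stable across the grid is more convincing than one that appears under a single setting.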

ex.learn_convergence_clubs() — a runnable Learn sandbox that plants a known club structure and shows the algorithm recover it.
ex.explain("convergence_clubs") — the concept explainer (also res.explain()).
analyze_beta_convergence / analyze_sigma_convergence — the single-group convergence views.

ex.explain("convergence_clubs")

Convergence clubs (Phillips-Sul log t)

What it is. Club convergence asks whether a panel forms one converging group, several catch-up clubs, or none. Phillips & Sul (2007) model each unit as X_it = delta_it * mu_t — a common trend mu_t scaled by a time-varying, unit-specific loading delta_it — and remove the common trend with the relative transition path h_it = X_it / mean_i(X_it) (its cross-sectional mean is 1 by construction). If the units converge, the cross-sectional variance H_t = mean_i (h_it - 1)^2 tends to zero, and the log(t) regression log(H_1/H_t) - 2 log(log t) = a + b log t has a non-negative slope b = 2*alpha; a one-sided t_b > -1.65 fails to reject convergence. When the whole panel rejects, a data-driven clustering algorithm sorts units by their final level, forms a core group by maximising t_b, sieves in the remaining units that keep the group converging, recurses on the residual, and finally merges adjacent clubs that jointly converge. The series is usually smoothed first with the Hodrick-Prescott filter (lambda = 400 for annual data) so the test runs on the long-run trend rather than the business cycle.

When to use it. Use it when β- and σ-convergence give a muddy verdict — when the panel is plausibly not one homogeneous group but several. It is the standard tool for ‘multiple equilibria’ / poverty-trap questions: convergence clubs in income, labor productivity, carbon intensity, house prices or health, where a subset of units catches up to a high path while others settle on a lower one. It is data-driven (no ex-ante grouping by region or income) and robust to whether the series is trend- or difference-stationary.

Watch out for.
- Club membership is a descriptive clustering of transition paths, not a causal account of why a unit lands in a given club.
- Results depend on the trimming fraction r (use 0.3 for moderate T, 0.2 for large T), the HP smoothing parameter, and the sorting/sieve options: report them, and check that nearby clubs are not an artefact of the merge rule.
- The log(t) t-statistic is asymptotic; with few periods (small T) the test has low power and clubs can be unstable, so prefer longer panels.
- Rejecting whole-panel convergence does not by itself prove distinct clubs exist; the algorithm can also return a single divergent group.

See also: beta_convergence, sigma_convergence, correlation_vs_causation

References: Phillips & Sul (2007), ‘Transition Modeling and Econometric Convergence Tests’, Econometrica 75(6): 1771-1855; Phillips & Sul (2009), ‘Economic Transition and Growth’, JAE 24(7): 1153-1185; Schnurbus, Haupt & Meier (2016), ‘Economic Transition and Growth: A Replication’, JAE; Du (2017), ‘Econometric Convergence Test and Club Clustering Using Stata’, Stata Journal 17(4)