import numpy as np
import pandas as pd
import expdpy as exanalyze_convergence_clubs
Run this page interactively in Google Colab — no install required:
This page is two things at once: an extended user guide for analyze_convergence_clubs — what it does, every argument, and everything it returns — and a testing environment that generates synthetic data with a known club structure and checks that the function recovers it. If a cell’s assert ever fails, the function is broken.
What is club convergence?
β- and σ-convergence assume the panel is one group heading toward one path. Often it is not: some economies catch up to a high path while others settle on a lower one. Club convergence (Phillips & Sul, 2007/2009) tests for exactly that — multiple convergence equilibria — without grouping units in advance.
Each unit is modelled as a common trend \(\mu_t\) scaled by a time-varying, unit-specific loading, \(X_{it} = \delta_{it}\,\mu_t\). Removing the common trend gives the relative transition path
\[ h_{it} = \frac{X_{it}}{\frac1N\sum_i X_{it}}, \]
whose cross-sectional mean is 1 by construction. If the units converge, the cross-sectional variance \(H_t = \frac1N\sum_i (h_{it}-1)^2 \to 0\), and the log(t) regression
\[ \log\!\left(\frac{H_1}{H_t}\right) - 2\log(\log t) = a + b\,\log t + \varepsilon_t, \qquad t = [rT], \dots, T, \]
has a non-negative slope \(b = 2\alpha\). A one-sided \(t_b > -1.65\) fails to reject convergence. When the whole panel rejects, a data-driven clustering algorithm sorts units by their final level, forms a core group by maximising \(t_b\), sieves in the units that keep the group converging, recurses on the residual, and finally merges adjacent clubs that jointly converge. The series is first smoothed with the Hodrick–Prescott filter (\(\lambda=400\) for annual data) so the test runs on the long-run trend rather than the business cycle.
This is a faithful port of the Stata psecta package (Du, 2017); the log(t) statistic uses the Phillips–Sul scalar long-run-variance HAC (Andrews 1991 quadratic-spectral kernel, AR(1) automatic bandwidth), reproduced in NumPy. The variable is used as you supply it — pass log GDP per capita / log labor productivity.
1. The method in one cell
analyze_convergence_clubs(df, var, ...) needs only the variable and the panel ids. Here is the clustering of (log) GDP per capita across countries in the bundled productivity panel — a balanced 108-country, 25-year Penn World Table extract:
from expdpy.data import load_productivity
prod = load_productivity()
res = ex.analyze_convergence_clubs(prod, "log_gdppc", entity="country", time="year")
res.figEach line is a club’s average relative transition path; the dashed line at 1 is the cross-sectional mean. res.interpret() reads the result in plain language:
print(res.interpret())Across 108 units over 25 periods, the Phillips-Sul log(t) test for **log_gdppc** rejects global convergence (t = -17.7 <= -1.65). The clustering algorithm splits the panel into **4 convergence clubs** — groups that each converge internally but not with one another: Club 1 (80), Club 2 (16), Club 3 (6), Club 4 (5).
1 unit do not join any club (the divergent group).
Club 1 collects the highest-ranked units; within each club the log(t) slope b = 2*alpha is positive enough that its t-statistic clears -1.65.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
2. How the function works
Arguments
| argument | what it does | when to change it |
|---|---|---|
var |
the panel variable analysed (used as-is, no log) | pass log GDP per capita / log labor productivity |
entity, time |
the panel ids | omit if declared once via set_panel |
filter |
"hp" (default) HP-filters each unit and analyses the trend; None uses the variable as given |
None if you pass an already-detrended series |
hp_lambda |
HP smoothing parameter | 400 for annual data (the literature default) |
r |
log(t) trimming fraction — the first round(r*T) periods are dropped |
0.3 for moderate T, 0.2 for large T |
method |
within-club sieve: "adjust" (Schnurbus 2016) or "ps" (original Phillips–Sul) |
"ps" to reproduce the original 2007 rule |
merge |
adjacent-club merging: "iterative" / "single" / "none" |
"none" to inspect the raw clusters |
fr |
sort by the last period (0) or the mean of the last (1-fr) periods |
fr>0 when endpoints are noisy |
Everything it returns
print("clubs :", res.n_clubs, "| divergent:", res.n_divergent,
"| global t:", round(res.global_tstat, 2), "| converged:", res.converged)
print("figures : fig, fig_paths, fig_clubs")
print("frames : df", list(res.df.columns), "| summary", list(res.summary.columns))
res.glance()clubs : 4 | divergent: 1 | global t: -17.7 | converged: False
figures : fig, fig_paths, fig_clubs
frames : df ['country', 'year', 'value', 'relative', 'club'] | summary ['club', 'n_members', 'beta', 'tstat', 'converging', 'members']
| var | n_units | n_periods | n_clubs | n_divergent | global_beta | global_tstat | converged | |
|---|---|---|---|---|---|---|---|---|
| 0 | log_gdppc | 108 | 25 | 4 | 1 | -0.555607 | -17.704654 | False |
The classification table gt (and the summary frame behind it) lists each club’s size, its log(t) slope b and t-statistic, and its members:
res.gt| Convergence clubs: log_gdppc | |||||
| Phillips-Sul log(t) clustering over 25 periods, 108 units | |||||
| N | log(t) b | t-stat | Converges | Members | |
|---|---|---|---|---|---|
| Club 1 | 80 | -0.040 | -0.685 | yes | Albania, Algeria, Argentina, Armenia, Australia, Austria, Belgium, Bolivia, ... (+72) |
| Club 2 | 16 | -0.007 | -0.148 | yes | Bangladesh, Cambodia, Cameroon, Cote d'Ivoire, Ghana, Honduras, Kenya, Kyrgyz Republic, ... (+8) |
| Club 3 | 6 | 0.225 | 5.136 | yes | Benin, Haiti, Mali, Mozambique, Senegal, Sierra Leone |
| Club 4 | 5 | 0.034 | 0.146 | yes | Burundi, Central African Republic, Democratic Republic of Congo, Malawi, Niger |
| Divergent | 1 | — | — | — | Togo |
| Each club's log(t) t-stat exceeds -1.65 (the convergence threshold); b = 2*alpha is the within-club convergence speed. The Divergent group does not form a convergence club. | |||||
membership is the tidy entity -> club appendix list, and fig_paths / fig_clubs show every unit’s path coloured by club and a per-club small-multiples panel:
res.fig_clubs3. Does it recover the truth?
The cleanest test plants a known club structure: every unit in club \(k\) sits at a distinct long-run level \(\mu_k\) plus an idiosyncratic deviation that decays geometrically, so units within a club converge to a common path while the distinct levels keep the panel from converging globally. The algorithm should reject global convergence and recover the planted partition.
def club_panel(*, n_per_club=12, levels=(10.0, 9.3, 8.6), n_years=35, rho=0.9,
spread=0.4, noise=0.002, seed=0):
"""Panel with planted clubs; returns (df, {unit: true_club})."""
rng = np.random.default_rng(seed)
rows, truth = [], {}
for k, mu in enumerate(levels, start=1):
for j in range(n_per_club):
uid = f"c{k}u{j:02d}"
truth[uid] = k
dev = float(rng.uniform(-spread, spread))
for t in range(1, n_years + 1):
rows.append((uid, t, mu + dev * rho ** (t - 1)
+ float(rng.normal(0, noise))))
return pd.DataFrame(rows, columns=["unit", "year", "x"]), truth
panel, truth = club_panel(seed=1)
fit = ex.analyze_convergence_clubs(panel, "x", entity="unit", time="year")
print(f"global log(t) t = {fit.global_tstat:.2f} -> converged = {fit.converged}")
print(f"detected clubs = {fit.n_clubs} (planted 3)")
fit.summary[["club", "n_members", "beta", "tstat", "converging"]]global log(t) t = -86.23 -> converged = False
detected clubs = 3 (planted 3)
| club | n_members | beta | tstat | converging | |
|---|---|---|---|---|---|
| 0 | Club 1 | 12 | 4.625544 | 26.708370 | True |
| 1 | Club 2 | 12 | 4.562037 | 29.063103 | True |
| 2 | Club 3 | 12 | 4.660040 | 26.539161 | True |
# Best-match accuracy: each detected club scored by its modal planted club.
detected = dict(zip(fit.membership["entity"], fit.membership["club"], strict=True))
by_det = {}
for uid, det in detected.items():
by_det.setdefault(int(det), []).append(truth[uid])
correct = sum(
sum(1 for tc in trues if tc == max(set(trues), key=trues.count))
for det, trues in by_det.items() if det != 0
)
accuracy = correct / len(truth)
assert fit.converged is False # distinct levels => no global convergence
assert fit.global_tstat <= -1.65
assert fit.n_clubs == 3 # the three planted clubs
assert accuracy == 1.0 # every unit in its true club
print(f"recovered {fit.n_clubs} clubs, {accuracy:.0%} of units correctly placed")recovered 3 clubs, 100% of units correctly placed
When the panel really is one group — all units converging to a single level — the whole-panel test should fail to reject and return a single club:
one, _ = club_panel(levels=(10.0,), n_per_club=40, seed=2)
solo = ex.analyze_convergence_clubs(one, "x", entity="unit", time="year")
assert solo.converged is True and solo.n_clubs == 1
print(f"single converging group: converged={solo.converged}, clubs={solo.n_clubs}")single converging group: converged=True, clubs=1
4. Convergence clubs across countries
Back to real data. Across the 108 countries, whole-panel convergence is firmly rejected and the algorithm splits the world into several catch-up clubs plus a divergent tail — the textbook “multiple equilibria” picture:
print(res.interpret())Across 108 units over 25 periods, the Phillips-Sul log(t) test for **log_gdppc** rejects global convergence (t = -17.7 <= -1.65). The clustering algorithm splits the panel into **4 convergence clubs** — groups that each converge internally but not with one another: Club 1 (80), Club 2 (16), Club 3 (6), Club 4 (5).
1 unit do not join any club (the divergent group).
Club 1 collects the highest-ranked units; within each club the log(t) slope b = 2*alpha is positive enough that its t-statistic clears -1.65.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
# Each reported club clears the convergence threshold.
clubs = res.summary[res.summary["club"] != "Divergent"]
assert bool((clubs["tstat"] > -1.65).all())
res.summary[["club", "n_members", "tstat", "converging"]]| club | n_members | tstat | converging | |
|---|---|---|---|---|
| 0 | Club 1 | 80 | -0.684671 | True |
| 1 | Club 2 | 16 | -0.147519 | True |
| 2 | Club 3 | 6 | 5.135556 | True |
| 3 | Club 4 | 5 | 0.145590 | True |
| 4 | Divergent | 1 | NaN | False |
The same call works on log labor productivity (log_lp) — the other variable in the panel — and on any balanced panel of your own.
See also
ex.learn_convergence_clubs()— a runnable Learn sandbox that plants a known club structure and shows the algorithm recover it.ex.explain("convergence_clubs")— the concept explainer (alsores.explain()).analyze_beta_convergence/analyze_sigma_convergence— the single-group convergence views.
ex.explain("convergence_clubs")Convergence clubs (Phillips-Sul log t)
What it is. Club convergence asks whether a panel forms one converging group, several catch-up clubs, or none. Phillips & Sul (2007) model each unit as X_it = delta_it * mu_t — a common trend mu_t scaled by a time-varying, unit-specific loading delta_it — and remove the common trend with the relative transition path h_it = X_it / mean_i(X_it) (its cross-sectional mean is 1 by construction). If the units converge, the cross-sectional variance H_t = mean_i (h_it - 1)^2 tends to zero, and the log(t) regression log(H_1/H_t) - 2 log(log t) = a + b log t has a non-negative slope b = 2*alpha; a one-sided t_b > -1.65 fails to reject convergence. When the whole panel rejects, a data-driven clustering algorithm sorts units by their final level, forms a core group by maximising t_b, sieves in the remaining units that keep the group converging, recurses on the residual, and finally merges adjacent clubs that jointly converge. The series is usually smoothed first with the Hodrick-Prescott filter (lambda = 400 for annual data) so the test runs on the long-run trend rather than the business cycle.
When to use it. Use it when β- and σ-convergence give a muddy verdict — when the panel is plausibly not one homogeneous group but several. It is the standard tool for ‘multiple equilibria’ / poverty-trap questions: convergence clubs in income, labor productivity, carbon intensity, house prices or health, where a subset of units catches up to a high path while others settle on a lower one. It is data-driven (no ex-ante grouping by region or income) and robust to whether the series is trend- or difference-stationary.
Watch out for. - Club membership is a descriptive clustering of transition paths, not a causal account of why a unit lands in a given club. - Results depend on the trimming fraction r (use 0.3 for moderate T, 0.2 for large T), the HP smoothing parameter, and the sorting/sieve options — report them, and check that nearby clubs are not an artefact of the merge rule. - The log(t) t-statistic is asymptotic; with few periods (small T) the test has low power and clubs can be unstable, so prefer longer panels. - Rejecting whole-panel convergence does not by itself prove distinct clubs exist; the algorithm can also return a single divergent group.
See also: beta_convergence, sigma_convergence, correlation_vs_causation
References: Phillips & Sul (2007), ‘Transition Modeling and Econometric Convergence Tests’, Econometrica 75(6): 1771-1855; Phillips & Sul (2009), ‘Economic Transition and Growth’, JAE 24(7): 1153-1185; Schnurbus, Haupt & Meier (2016), ‘Economic Transition and Growth: A Replication’, JAE; Du (2017), ‘Econometric Convergence Test and Club Clustering Using Stata’, Stata Journal 17(4)