nstat.extras.validation.statsmodels_bridge — statsmodels GLM cross-validation
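Cross-validate `nstat.fit_poisson_glm` against statsmodels' `GLM` (`statsmodels.genmod.generalized_linear_model.GLM`, exposed as `statsmodels.api.GLM`). This bridge is the third independent Poisson-GLM oracle for checking nstat's own IRLS, alongside NeMoS and the MATLAB gold fixtures.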
Because nstat and statsmodels both fit the model by IRLS, they should agree to near machine precision (~1e-9 on well-conditioned synthetic fixtures). This makes statsmodels the tightest cross-validation oracle in the extras namespace, much tighter than NeMoS (~5e-3, different optimizer) or pykalman (~1e-2, different paradigm).
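The gap between same-algorithm and cross-optimizer agreement can be reproduced without either library. The sketch below assumes a Poisson GLM with the canonical log link. `irls_poisson` and `gradient_ascent_poisson` are hypothetical helpers, not nstat, statsmodels, or NeMoS code, and plain gradient ascent with a gradient-norm stopping rule stands in for an optax optimizer. The fixture mirrors the Recipe (1000×3 design plus an intercept column).

```python
import numpy as np


def irls_poisson(X, y, solver="normal", tol=1e-12, max_iter=100):
    """Poisson GLM, canonical log link, fitted by IRLS (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response for the log link; weights W = diag(mu)
        if solver == "normal":
            XtW = X.T * mu  # X^T W without forming the n-by-n diagonal matrix
            beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        else:
            sw = np.sqrt(mu)  # same weighted least-squares step, solved via lstsq
            beta_new = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


def gradient_ascent_poisson(X, y, lr=0.1, grad_tol=1e-4, max_steps=100_000):
    """First-order fit of the same log-likelihood; stops at a gradient tolerance."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_steps):
        grad = X.T @ (y - np.exp(X @ beta)) / len(y)  # mean score
        if np.max(np.abs(grad)) < grad_tol:
            break
        beta = beta + lr * grad
    return beta


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(1000), rng.standard_normal((1000, 3))])  # intercept first
y = rng.poisson(np.exp(X @ np.array([0.5, 0.2, -0.4, 0.1])))

b_normal = irls_poisson(X, y, solver="normal")
b_lstsq = irls_poisson(X, y, solver="lstsq")
b_first = gradient_ascent_poisson(X, y)

diff_irls = np.max(np.abs(b_normal - b_lstsq))
diff_first = np.max(np.abs(b_normal - b_first))
print(f"IRLS vs IRLS:        {diff_irls:.1e}")
print(f"IRLS vs first-order: {diff_first:.1e}")
```

The two IRLS variants differ only in how they solve the weighted least-squares step, so they reach the same fixed point up to floating-point rounding. The first-order fit stops once the gradient is small, which leaves it near the optimum but not on it.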
Install
```bash
pip install "nstat-toolbox[test-parity]"  # bundles statsmodels + nemos + pykalman + nitime
```
statsmodels is often already installed in scientific Python environments, so the extra install footprint is usually small.
API
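| Symbol | Notes |
|---|---|
| `cross_validate_poisson_glm(X, y, include_intercept=True)` | Fits both nstat and statsmodels on the same design; returns a comparison result exposing `coef_inf_norm` and `assert_agree` |
| `assert_agree(atol, rtol)` | Regression-guard hook for parity tests; default `atol=1e-3` |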
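The comparison result can be pictured with a small stand-in class. This is a hypothetical sketch, not the bridge's actual return type: `CoefComparison` and its field names are invented for illustration, the `atol=1e-3` default follows the Gotchas section, and the `rtol` default is an assumption. Agreement is checked elementwise with `numpy.testing.assert_allclose`, i.e. |a - b| <= atol + rtol·|b|.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class CoefComparison:
    """Hypothetical stand-in for the comparison result; not the bridge's real class."""

    coef_nstat: np.ndarray
    coef_statsmodels: np.ndarray

    @property
    def coef_inf_norm(self) -> float:
        # |Δβ|_∞ = max_j |β_nstat[j] - β_statsmodels[j]|
        return float(np.max(np.abs(self.coef_nstat - self.coef_statsmodels)))

    def assert_agree(self, atol: float = 1e-3, rtol: float = 0.0) -> None:
        # Elementwise |a - b| <= atol + rtol * |b|; raises AssertionError otherwise.
        # atol=1e-3 follows the documented default; rtol=0.0 is an assumption.
        np.testing.assert_allclose(
            self.coef_nstat, self.coef_statsmodels, rtol=rtol, atol=atol
        )


cmp = CoefComparison(np.array([0.5, 0.2, -0.4]), np.array([0.5, 0.2, -0.3998]))
print(f"|Δβ|_∞: {cmp.coef_inf_norm:.1e}")
cmp.assert_agree()  # passes: 2e-4 is within the loose default atol

try:
    cmp.assert_agree(atol=1e-6, rtol=1e-6)  # tight, synthetic-fixture tolerance
    tight_failed = False
except AssertionError:
    tight_failed = True
```

The same 2e-4 discrepancy passes the loose default and fails the tight guard, which is why the tolerance should be chosen per fixture.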
Recipe
```python
import numpy as np
from nstat.extras.validation.statsmodels_bridge import cross_validate_poisson_glm

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
beta_true = np.array([0.2, -0.4, 0.1])
y = rng.poisson(np.exp(0.5 + X @ beta_true))  # true intercept is 0.5

cmp = cross_validate_poisson_glm(X, y)  # include_intercept=True by default
print(f"|Δβ|_∞: {cmp.coef_inf_norm:.3e}")  # expect ~1e-9

# Tight regression guard: both use IRLS, so near-machine-precision agreement.
cmp.assert_agree(atol=1e-6, rtol=1e-6)
```
Triangulation pattern
The three GLM oracles together form the strongest cross-validation
matrix for nstat.fit_poisson_glm:
| Oracle | Algorithm | Expected agreement |
|---|---|---|
| statsmodels | IRLS (same as nstat) | ~1e-9 (machine precision) |
| NeMoS | optax first-order (independent) | ~5e-3 |
| MATLAB gold fixtures | MATLAB's own GLM fit | exact (by design) |
A regression that loosens statsmodels agreement beyond ~1e-6 most likely indicates a real bug in nstat's IRLS path; this is a much more sensitive signal than the NeMoS bridge provides.
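The sensitivity argument can be made concrete by injecting a small error into a coefficient vector and checking it against each oracle's expected-agreement band from the table. Everything here is hypothetical and illustrative: `beta_ref` stands for a correct fit, and the per-oracle offsets are made-up values sized to the table's bands, not measurements.

```python
import numpy as np

# Expected-agreement bands from the triangulation table, used as atol on |Δβ|_∞.
BANDS = {"statsmodels": 1e-6, "NeMoS": 5e-3}

beta_ref = np.array([0.5, 0.2, -0.4, 0.1])  # hypothetical correct fit
# Illustrative (not measured) residual disagreement of each oracle with beta_ref.
oracles = {
    "statsmodels": beta_ref + np.array([1e-10, -2e-10, 0.0, 1e-10]),
    "NeMoS": beta_ref + np.array([1e-3, -2e-3, 5e-4, 0.0]),
}

# Hypothetical regression: a 3e-5 error on one coefficient in the IRLS path.
beta_buggy = beta_ref + np.array([0.0, 3e-5, 0.0, 0.0])

detected = {
    name: bool(np.max(np.abs(beta_buggy - oracle)) > BANDS[name])
    for name, oracle in oracles.items()
}
print(detected)  # {'statsmodels': True, 'NeMoS': False}
```

A 3e-5 error is 30× the statsmodels band but disappears inside the NeMoS band, so only the IRLS-to-IRLS comparison flags it.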
Gotchas
- **Intercept layout.** `include_intercept=True` (the default) prepends an intercept column for statsmodels via `sm.add_constant(X)` and returns intercept-first coefficient vectors from both libraries.
- **Tolerance philosophy.** The default `atol=1e-3` is intentionally loose enough to absorb conditioning issues on real data. For well-conditioned synthetic fixtures, tighten it to `1e-6` to surface meaningful deviations.
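The intercept-first convention can be sketched in NumPy. `add_intercept` is a hypothetical helper that mimics `sm.add_constant(X)` only for the common case where `X` has no constant column yet; with its default `has_constant='skip'`, statsmodels would leave an `X` that already contains a constant column unchanged.

```python
import numpy as np


def add_intercept(X):
    """Hypothetical helper: prepend a column of ones, like sm.add_constant(X)
    with its default prepend=True, for an X that has no constant column yet."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


X = np.arange(6.0).reshape(3, 2)
Xc = add_intercept(X)               # columns: [1, x1, x2]
beta = np.array([0.5, 0.2, -0.4])   # intercept-first: [b0, b1, b2]
intercept, slopes = beta[0], beta[1:]
eta = Xc @ beta                     # b0 + b1*x1 + b2*x2 for each row
print(eta)
```

When comparing against a coefficient vector that stores the intercept separately, split it with `beta[0]` and `beta[1:]` rather than reordering columns.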
End-to-end demo
`examples/extras/validation_statsmodels_demo.py` runs the full fit-and-compare on a 1000×3 Poisson fixture.
Upstream references
- statsmodels: https://www.statsmodels.org
- License: BSD-3-Clause (GPL-2 compatible)
- Algorithm: IRLS via `statsmodels.genmod.families.Poisson()` with the canonical log link
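For reference, the textbook IRLS update for a Poisson GLM with the canonical log link is shown here; it is the standard form, not a transcription of either library's source. With $\eta^{(t)} = X\beta^{(t)}$:

$$
\mu_i = \exp\!\big(\eta_i^{(t)}\big), \qquad
W^{(t)} = \operatorname{diag}(\mu_1, \dots, \mu_n), \qquad
z_i^{(t)} = \eta_i^{(t)} + \frac{y_i - \mu_i}{\mu_i}
$$

$$
\beta^{(t+1)} = \left(X^\top W^{(t)} X\right)^{-1} X^\top W^{(t)} z^{(t)}
$$

Because the log link is canonical for the Poisson family, this update coincides with Newton-Raphson on the log-likelihood, so any two correct IRLS implementations converge to the same fixed point.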