`nstat.extras.validation.statsmodels_bridge` — statsmodels GLM cross-validation

Cross-validate nstat.fit_poisson_glm against statsmodels.genmod.GLM — the third independent Poisson-GLM oracle in nstat.extras.validation (alongside NeMoS and nstat’s own IRLS).

Because both nstat and statsmodels use IRLS, they should agree to near machine precision (~1e-9 on well-conditioned synthetic fixtures). This is the tightest cross-validation oracle available in the extras namespace — much tighter than NeMoS (~5e-3, different optimizer) or pykalman (~1e-2, different paradigm).

Install

pip install nstat-toolbox[test-parity]   # bundles statsmodels + nemos + pykalman + nitime

statsmodels is already in most SciPy installations — install footprint is trivial.

API

Symbol	Notes
`cross_validate_poisson_glm(X, y, *, include_intercept=True)`	Fits both, returns `StatsmodelsGLMComparison`
`StatsmodelsGLMComparison` (dataclass)	`nstat_coef`, `statsmodels_coef`, `coef_inf_norm`, `coef_rel_inf_norm`
`StatsmodelsGLMComparison.assert_agree(atol=1e-3, rtol=1e-3)`	Regression-guard hook for parity tests

Recipe

import numpy as np
from nstat.extras.validation.statsmodels_bridge import cross_validate_poisson_glm

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
beta_true = np.array([0.2, -0.4, 0.1])
y = rng.poisson(np.exp(0.5 + X @ beta_true))

cmp = cross_validate_poisson_glm(X, y)
print(f"|Δβ|_∞: {cmp.coef_inf_norm:.3e}")   # expect ~1e-9

# Tight regression guard — both use IRLS, so machine-precision agreement.
cmp.assert_agree(atol=1e-6, rtol=1e-6)

Triangulation pattern

The three GLM oracles together form the strongest cross-validation matrix for nstat.fit_poisson_glm:

Oracle	Algorithm	Expected agreement
statsmodels	IRLS (same as nstat)	~1e-9 (machine precision)
NeMoS	optax first-order (independent)	~5e-3
MATLAB gold fixtures	MATLAB’s `glmfit` (IRLS)	exact (by design)

A regression that loosens the statsmodels agreement beyond ~1e-6 likely indicates a real bug in nstat’s IRLS path — much more sensitive signal than the NeMoS bridge.

Gotchas

Intercept layout. include_intercept=True (default) prepends an intercept column for statsmodels via sm.add_constant(X), and returns intercept-first coefficient vectors from both libraries.
Tolerance philosophy. The default atol=1e-3 is intentionally loose enough to absorb real-data conditioning issues. For synthetic well-conditioned fixtures, tighten to 1e-6 to surface meaningful deviations.

End-to-end demo

examples/extras/validation_statsmodels_demo.py runs the full fit-and-compare on a 1000×3 Poisson fixture.

Upstream references

statsmodels: https://www.statsmodels.org
License: BSD-3-Clause (GPL-2 compatible)
Algorithm: IRLS via statsmodels.genmod.families.Poisson() with the canonical log link

nstat.extras.validation.statsmodels_bridge — statsmodels GLM cross-validation