# `nstat.extras.validation.statsmodels_bridge` — statsmodels GLM cross-validation Cross-validate [`nstat.fit_poisson_glm`](../api.html) against [`statsmodels.genmod.GLM`](https://www.statsmodels.org/) — the third independent Poisson-GLM oracle in `nstat.extras.validation` (alongside NeMoS and nstat's own IRLS). Because **both nstat and statsmodels use IRLS**, they should agree to **near machine precision** (~1e-9 on well-conditioned synthetic fixtures). This is the **tightest** cross-validation oracle available in the extras namespace — much tighter than NeMoS (~5e-3, different optimizer) or pykalman (~1e-2, different paradigm). ## Install ```bash pip install nstat-toolbox[test-parity] # bundles statsmodels + nemos + pykalman + nitime ``` statsmodels is already in most SciPy installations — install footprint is trivial. ## API | Symbol | Notes | |---|---| | `cross_validate_poisson_glm(X, y, *, include_intercept=True)` | Fits both, returns `StatsmodelsGLMComparison` | | `StatsmodelsGLMComparison` (dataclass) | `nstat_coef`, `statsmodels_coef`, `coef_inf_norm`, `coef_rel_inf_norm` | | `StatsmodelsGLMComparison.assert_agree(atol=1e-3, rtol=1e-3)` | Regression-guard hook for parity tests | ## Recipe ```python import numpy as np from nstat.extras.validation.statsmodels_bridge import cross_validate_poisson_glm rng = np.random.default_rng(0) X = rng.standard_normal((1000, 3)) beta_true = np.array([0.2, -0.4, 0.1]) y = rng.poisson(np.exp(0.5 + X @ beta_true)) cmp = cross_validate_poisson_glm(X, y) print(f"|Δβ|_∞: {cmp.coef_inf_norm:.3e}") # expect ~1e-9 # Tight regression guard — both use IRLS, so machine-precision agreement. cmp.assert_agree(atol=1e-6, rtol=1e-6) ``` ## Triangulation pattern The three GLM oracles together form the strongest cross-validation matrix for `nstat.fit_poisson_glm`: | Oracle | Algorithm | Expected agreement | |---|---|---| | **statsmodels** | IRLS (same as nstat) | ~1e-9 (machine precision) | | **NeMoS** | optax first-order (independent) | ~5e-3 | | **MATLAB gold fixtures** | MATLAB's `glmfit` (IRLS) | exact (by design) | A regression that loosens the statsmodels agreement beyond ~1e-6 likely indicates a real bug in nstat's IRLS path — much more sensitive signal than the NeMoS bridge. ## Gotchas - **Intercept layout.** `include_intercept=True` (default) prepends an intercept column for statsmodels via `sm.add_constant(X)`, and returns intercept-first coefficient vectors from both libraries. - **Tolerance philosophy.** The default `atol=1e-3` is intentionally loose enough to absorb real-data conditioning issues. For synthetic well-conditioned fixtures, **tighten to `1e-6`** to surface meaningful deviations. ## End-to-end demo [`examples/extras/validation_statsmodels_demo.py`](https://github.com/cajigaslab/nSTAT-python/blob/main/examples/extras/validation_statsmodels_demo.py) runs the full fit-and-compare on a 1000×3 Poisson fixture. ## Upstream references - statsmodels: https://www.statsmodels.org - License: BSD-3-Clause (GPL-2 compatible) - Algorithm: IRLS via `statsmodels.genmod.families.Poisson()` with the canonical log link