Tier 0.2 — Held-out predictive log-likelihood diagnostic

2026-05-28 · nstat.extras.em.dynamax_bridge · branch feat/extras-em-predictive-loglik

Why

The EM trainers reported a surrogate log-likelihood: the Gaussian-smoother likelihood of the IRLS pseudo-observations, which are re-linearized every iteration. Because the pseudo-observations are rebuilt at each step, this surrogate changes basis every iteration, is not monotonic, and is not comparable across fits. Users had no trustworthy way to check convergence, compare models, or score held-out data.

What shipped

point_process_predictive_ll(y, A, C, Q, x0, P0) — true one-step-ahead predictive log-likelihood of a Poisson-LGSSM.
hybrid_predictive_ll(yp, yg, A, C_p, C_g, Q, R, x0, P0) — the hybrid counterpart; total = poisson + gaussian.
PredictiveLogLik result: total, per_timestep, poisson, gaussian.

A causal forward filter records the one-step-ahead predictive state. The Poisson channel is scored by Gauss-Hermite quadrature of the true Poisson likelihood (integrating over latent uncertainty rather than plugging in the mean rate); the Gaussian channel is scored by its exact multivariate-normal predictive density. The implementation is pure NumPy with no dynamax or JAX dependency, so it runs in the base test suite and lets anyone score held-out data without the 200 MB JAX stack.

from nstat.extras.em.dynamax_bridge import (
    fit_point_process_em, point_process_predictive_ll,
)
# Fit on the training counts only; y_train and y_test are supplied by the caller.
fit = fit_point_process_em(y_train, state_dim=3, n_iter=30, seed=0)
# Score the held-out counts with the fitted parameters, in (y, A, C, Q, x0, P0) order.
score = point_process_predictive_ll(
    y_test, fit.transition_matrix, fit.observation_matrix,
    fit.transition_covariance, fit.initial_state_mean,
    fit.initial_state_covariance,
)
score.total          # higher = better; compare seeds / state_dims
score.per_timestep   # locate where a fit predicts poorly
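
The Gauss-Hermite scoring of the Poisson channel can be illustrated for a single count. The sketch below is not the code in dynamax_bridge.py: the function name `poisson_predictive_logpmf` is hypothetical, and it assumes an exponential link (rate = exp(eta)) with a scalar Gaussian predictive distribution eta ~ N(mean, var) on the log-rate. It only shows the difference between integrating over latent uncertainty and plugging in the mean rate.

```python
import math
import numpy as np


def poisson_predictive_logpmf(y, mean, var, n_nodes=20):
    """log of the integral of Poisson(y | exp(eta)) * N(eta; mean, var) d(eta).

    Hypothetical helper: assumes an exponential link and a scalar log-rate.
    """
    # Nodes and weights for the weight function exp(-x^2) (physicists' Hermite).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables eta = mean + sqrt(2 * var) * x maps N(mean, var) onto exp(-x^2).
    eta = mean + np.sqrt(2.0 * var) * x
    # Poisson log-pmf at each node: y * eta - exp(eta) - log(y!).
    log_pmf = y * eta - np.exp(eta) - math.lgamma(y + 1.0)
    # Weighted log-sum-exp; the 1/sqrt(pi) factor comes from the change of variables.
    a = log_pmf + np.log(w) - 0.5 * math.log(math.pi)
    a_max = a.max()
    return float(a_max + np.log(np.exp(a - a_max).sum()))


# Plug-in score (ignores latent uncertainty) versus integrated score.
plug_in = poisson_predictive_logpmf(3, math.log(2.0), 0.0)
integrated = poisson_predictive_logpmf(3, math.log(2.0), 0.5)
# Node-count sensitivity, in the spirit of the 5 vs 30 node check.
coarse = poisson_predictive_logpmf(3, math.log(2.0), 0.5, n_nodes=5)
fine = poisson_predictive_logpmf(3, math.log(2.0), 0.5, n_nodes=30)
print(plug_in, integrated, abs(coarse - fine))
```

With var = 0 every node collapses onto the mean and the result reduces to the ordinary Poisson log-pmf, which is a convenient sanity check for any implementation of this kind.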

Validation

Check	Result
Gaussian channel vs independent exact-Kalman predictive LL	match to 1e-6
Model selection: true params vs flat / homogeneous / collapsed	true ranks highest
Gauss-Hermite convergence (5 vs 30 nodes)	difference < 1 nat
Additivity: `total = poisson + gaussian = Σ per_timestep`	exact
Runs without dynamax	yes (base CI)
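
One way to reproduce a check like the first row is to compare a Kalman one-step-ahead decomposition against a brute-force joint Gaussian density. The sketch below is an illustration under stated assumptions, not the test in tests/extras/test_dynamax_bridge.py: both function names are hypothetical, the model is a plain linear-Gaussian state-space model, and (x0, P0) is assumed to be the prior on the first state.

```python
import numpy as np


def kalman_predictive_ll(y, A, C, Q, R, x0, P0):
    """Per-timestep one-step-ahead log-density log p(y_t | y_1..y_{t-1})."""
    m, P = x0.copy(), P0.copy()          # predictive mean / covariance of x_1
    k = y.shape[1]
    per_t = np.empty(len(y))
    for t, yt in enumerate(y):
        S = C @ P @ C.T + R              # predictive covariance of y_t
        r = yt - C @ m                   # innovation
        _, logdet = np.linalg.slogdet(S)
        per_t[t] = -0.5 * (k * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(S, r))
        K = np.linalg.solve(S, C @ P).T  # Kalman gain P C^T S^{-1}
        m_f, P_f = m + K @ r, P - K @ S @ K.T
        m, P = A @ m_f, A @ P_f @ A.T + Q  # predict the next state
    return per_t


def joint_gaussian_ll(y, A, C, Q, R, x0, P0):
    """Log-density of all observations under the stacked multivariate normal."""
    T, k = y.shape
    means, covs = [x0], [P0]
    for _ in range(T - 1):
        means.append(A @ means[-1])
        covs.append(A @ covs[-1] @ A.T + Q)
    mu = np.concatenate([C @ m for m in means])
    Sigma = np.zeros((T * k, T * k))
    for t in range(T):
        cross = covs[t]                  # Cov(x_s, x_t), starting at s = t
        for s in range(t, T):
            block = C @ cross @ C.T + (R if s == t else 0.0)
            Sigma[s * k:(s + 1) * k, t * k:(t + 1) * k] = block
            Sigma[t * k:(t + 1) * k, s * k:(s + 1) * k] = block.T
            cross = A @ cross            # Cov(x_{s+1}, x_t) = A Cov(x_s, x_t)
    r = y.ravel() - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (T * k * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(Sigma, r))


# Small made-up model; the identity holds for any observation array y.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, -0.3]])
Q, R = 0.1 * np.eye(2), 0.5 * np.eye(3)
x0, P0 = np.zeros(2), np.eye(2)
y = np.random.default_rng(0).normal(size=(6, 3))

per_t = kalman_predictive_ll(y, A, C, Q, R, x0, P0)
joint = joint_gaussian_ll(y, A, C, Q, R, x0, P0)
print(per_t.sum(), joint)
```

The two numbers should agree to numerical precision, because the sum of one-step-ahead log-densities is the chain-rule factorization of the joint density. The same sum is what makes the additivity row (total equals the sum of per-timestep terms) hold by construction.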

Finding the diagnostic exposed: under weak observability (few neurons or small loadings), PP_EM converges to degenerate dynamics (A → 0, inflated C/Q) whose held-out predictive LL is worse than that of a constant-rate model. With strong observability, A is recovered and the held-out LL improves over initialization. Always check *_predictive_ll on held-out data; hardening PP_EM against this failure mode is now Tier 0.3 on the roadmap.
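
A constant-rate baseline for that comparison can be computed in a few lines. This is a minimal sketch, not part of nstat: the name `constant_rate_ll` is hypothetical, and it assumes count arrays of shape (timesteps, neurons) with one fixed Poisson rate per neuron estimated from the training data.

```python
import math
import numpy as np

# Elementwise log-gamma without a SciPy dependency.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def constant_rate_ll(y_train, y_test, eps=1e-8):
    """Held-out Poisson log-likelihood of a per-neuron constant-rate model."""
    # One rate per neuron (column); eps guards against log(0) for silent neurons.
    rate = np.maximum(np.asarray(y_train, dtype=float).mean(axis=0), eps)
    y_test = np.asarray(y_test, dtype=float)
    # Poisson log-pmf: y * log(rate) - rate - log(y!), summed over time and neurons.
    log_pmf = y_test * np.log(rate) - rate - _lgamma(y_test + 1.0)
    return float(log_pmf.sum())


# Tiny made-up example: both neurons average 2 counts per bin in training.
baseline = constant_rate_ll([[1, 2], [3, 2]], [[2, 0]])
print(baseline)
```

If a fitted model's held-out `score.total` falls below this baseline on the same `y_test`, the fit is likely in the degenerate regime described above.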

Files changed

nstat/extras/em/dynamax_bridge.py — diagnostic + helpers (Gauss-Hermite, causal forward filter, MVN predictive density); refreshed scope.
tests/extras/test_dynamax_bridge.py — six tests (five dynamax-free).
examples/extras/em_dynamax_demo.py — added a predictive-LL demo.
docs/extras/em_dynamax.md, parity/methods_roadmap.md — usage + observability caveat + status.