Tier 0.2 — Held-out predictive log-likelihood diagnostic

2026-05-28 · nstat.extras.em.dynamax_bridge · branch feat/extras-em-predictive-loglik

Why

The EM trainers reported a surrogate log-likelihood — the Gaussian-smoother likelihood of the IRLS pseudo-observations, which are re-linearized every iteration. It changes basis each step, is not monotonic, and is not comparable across fits. Users had no trustworthy way to check convergence, compare models, or score held-out data.

What shipped

A causal forward filter records the one-step-ahead predictive state at each timestep. The Poisson channel is scored by Gauss-Hermite quadrature of the true Poisson likelihood (integrating over latent uncertainty rather than plugging in the mean rate), and the Gaussian channel by its exact multivariate-normal predictive density. The diagnostic is pure NumPy with no dynamax / JAX dependency, so it runs in the base test suite and lets anyone score held-out data without the 200 MB JAX stack.

from nstat.extras.em.dynamax_bridge import (
    fit_point_process_em, point_process_predictive_ll,
)
fit = fit_point_process_em(y_train, state_dim=3, n_iter=30, seed=0)
score = point_process_predictive_ll(
    y_test, fit.transition_matrix, fit.observation_matrix,
    fit.transition_covariance, fit.initial_state_mean,
    fit.initial_state_covariance,
)
score.total          # higher = better; compare seeds / state_dims
score.per_timestep   # locate where a fit predicts poorly
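
To make the difference between quadrature scoring and plug-in scoring concrete, here is a minimal sketch for a single neuron at a single timestep. It is not the code in dynamax_bridge: the log link (rate = exp(eta)), the scalar predictive mean m and variance v of the linear predictor, and the function names are all assumptions made for illustration.

```python
import numpy as np
from math import lgamma


def poisson_predictive_loglik(y, m, v, n_nodes=20):
    """log E[Poisson(y | exp(eta))] with eta ~ N(m, v), by Gauss-Hermite quadrature.

    Assumes a log link and a scalar Gaussian predictive for eta (illustrative only).
    """
    # Nodes/weights for the weight function exp(-x^2); the weights sum to sqrt(pi).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    eta = m + np.sqrt(2.0 * v) * x                      # change of variables
    log_pmf = y * eta - np.exp(eta) - lgamma(y + 1)     # Poisson log-pmf per node
    log_terms = np.log(w) - 0.5 * np.log(np.pi) + log_pmf
    top = log_terms.max()                               # log-sum-exp for stability
    return top + np.log(np.exp(log_terms - top).sum())


def plugin_loglik(y, m):
    """Plug-in score: Poisson log-pmf at the mean linear predictor, ignoring v."""
    return y * m - np.exp(m) - lgamma(y + 1)


y, m, v = 3, 0.5, 0.2
ll_5 = poisson_predictive_loglik(y, m, v, n_nodes=5)
ll_30 = poisson_predictive_loglik(y, m, v, n_nodes=30)
ll_plug = plugin_loglik(y, m)
print(ll_5, ll_30, ll_plug)
```

With v = 0 the quadrature collapses to the plug-in score; as v grows the two diverge, which is why a plug-in score can misrank fits whose latent uncertainty differs. Comparing a small and a large node count, as above, is a cheap way to check that the quadrature has converged.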

Validation

Check                                                           Result
Gaussian channel vs independent exact-Kalman predictive LL      match to 1e-6
Model selection: true params vs flat / homogeneous / collapsed  true ranks highest
Gauss-Hermite convergence (5 vs 30 nodes)                       < 1 nat
Additivity: total = poisson + gaussian = Σ per_timestep         exact
Runs without dynamax                                            yes (base CI)
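
The Gaussian-channel and additivity checks can be reproduced in spirit with a few lines of NumPy. The sketch below is not the project's test code: it assumes a standard linear-Gaussian state-space model in which m0 and P0 describe the state at the first timestep, and it uses hypothetical function names. It compares the sum of per-timestep Kalman predictive terms with the joint multivariate-normal density of the whole observation sequence, computed independently.

```python
import numpy as np


def kalman_predictive_ll(y, A, C, Q, R, m0, P0):
    """Per-timestep log p(y_t | y_<t) from a causal Kalman filter."""
    m, P = m0.copy(), P0.copy()
    per_t = np.empty(len(y))
    for t, y_t in enumerate(y):
        # One-step-ahead predictive density of the observation: N(C m, S).
        S = C @ P @ C.T + R
        r = y_t - C @ m
        L = np.linalg.cholesky(S)
        z = np.linalg.solve(L, r)
        per_t[t] = (-0.5 * (z @ z) - np.log(np.diag(L)).sum()
                    - 0.5 * len(r) * np.log(2.0 * np.pi))
        # Measurement update, then time update.
        K = np.linalg.solve(S, C @ P).T          # K = P C^T S^{-1}
        m, P = m + K @ r, P - K @ S @ K.T
        m, P = A @ m, A @ P @ A.T + Q
    return per_t


def joint_gaussian_ll(y, A, C, Q, R, m0, P0):
    """log p(y_1..y_T) from the explicit joint mean and covariance (no filtering)."""
    T, k = y.shape
    means, covs, m, P = [], [], m0, P0
    for _ in range(T):                            # marginal state moments
        means.append(m)
        covs.append(P)
        m, P = A @ m, A @ P @ A.T + Q
    mu, Sigma = np.zeros(T * k), np.zeros((T * k, T * k))
    for t in range(T):
        mu[t * k:(t + 1) * k] = C @ means[t]
        for s in range(t + 1):
            # Cov(y_t, y_s) = C A^(t-s) Var(x_s) C^T, plus R on the diagonal.
            block = C @ np.linalg.matrix_power(A, t - s) @ covs[s] @ C.T
            if s == t:
                block = block + R
            Sigma[t * k:(t + 1) * k, s * k:(s + 1) * k] = block
            Sigma[s * k:(s + 1) * k, t * k:(t + 1) * k] = block.T
    r = y.reshape(-1) - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (r @ np.linalg.solve(Sigma, r) + logdet + T * k * np.log(2.0 * np.pi))


rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.5]])
Q, R = 0.1 * np.eye(2), np.array([[0.2]])
m0, P0 = np.zeros(2), np.eye(2)
y = rng.normal(size=(5, 1))                       # any observations work for this identity

per_t = kalman_predictive_ll(y, A, C, Q, R, m0, P0)
joint = joint_gaussian_ll(y, A, C, Q, R, m0, P0)
print(per_t.sum(), joint)
```

The two numbers agree up to floating-point error because the joint density factorizes into the one-step-ahead predictive densities, which is the property the additivity row relies on.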

Finding the diagnostic exposed. Under weak observability (few neurons / small loadings), PP_EM converges to degenerate dynamics (A → 0, inflated C/Q) whose held-out predictive LL is worse than that of a constant-rate model. With strong observability, A is recovered and the held-out LL improves over initialization. Always check *_predictive_ll on held-out data; hardening PP_EM in this regime is now Tier 0.3 on the roadmap.
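
A constant-rate baseline of the kind mentioned above is easy to compute by hand. The sketch below is one possible version, not the one used in the project's validation: it assumes spike counts are stored as arrays of shape (timesteps, neurons), fits one rate per neuron on the training counts, and scores the test counts under independent Poisson draws. The function name and the eps guard are hypothetical.

```python
import numpy as np
from math import lgamma


def constant_rate_ll(y_train, y_test, eps=1e-12):
    """Per-timestep held-out Poisson log-likelihood under a per-neuron constant rate."""
    rate = y_train.mean(axis=0) + eps                 # one rate per neuron; eps avoids log(0)
    log_fact = np.vectorize(lgamma)(y_test + 1.0)     # log(y!) elementwise
    return (y_test * np.log(rate) - rate - log_fact).sum(axis=1)


# Tiny worked case: one neuron, training mean rate 2.
y_train = np.array([[1], [3]])
y_test = np.array([[0], [2]])
baseline = constant_rate_ll(y_train, y_test)
print(baseline.sum())
```

If score.total from point_process_predictive_ll on the same y_test falls below baseline.sum(), the fitted dynamics are predicting worse than no dynamics at all, which is the degenerate case described above.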

Files changed