Tier 0.2's predictive-LL diagnostic exposed a real PP_EM limitation:
under weak observability (few neurons / small loadings) a
single-seed fit can collapse to a degenerate solution
(A → 0, inflated C/Q) whose held-out
predictive log-likelihood is worse than a constant-rate model.
With strong observability the fit recovers, but the user can't tell which
regime they're in without checking the diagnostic — and even then a single
fit may land in a poor local optimum.
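The "worse than a constant-rate model" comparison can be made concrete with a small baseline. The following is a minimal sketch, not the library's diagnostic: it assumes spike counts are stored as a `(time, neurons)` array and that the baseline predicts each neuron's mean training count under a Poisson model. The function names `poisson_loglik` and `constant_rate_baseline_ll` are hypothetical.

```python
import math

import numpy as np


def poisson_loglik(counts, rates):
    """Sum of Poisson log-probabilities log p(counts | rates) over all entries."""
    counts = np.asarray(counts, dtype=float)
    rates = np.broadcast_to(np.asarray(rates, dtype=float), counts.shape)
    # log(k!) computed as lgamma(k + 1), applied elementwise.
    log_factorial = np.vectorize(math.lgamma)(counts + 1.0)
    return float(np.sum(counts * np.log(rates) - rates - log_factorial))


def constant_rate_baseline_ll(train_counts, test_counts, eps=1e-8):
    """Held-out LL of a model that predicts each neuron's mean training count.

    Assumption: arrays are shaped (time, neurons); eps keeps silent neurons
    from producing log(0).
    """
    rates = np.asarray(train_counts, dtype=float).mean(axis=0) + eps
    return poisson_loglik(test_counts, rates)
```

A fit whose held-out predictive log-likelihood falls below `constant_rate_baseline_ll(train, test)` is behaving like the degenerate solution described above.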

- `fit_point_process_em_best_of(observations, state_dim, *, n_restarts=8, holdout_fraction=0.2, ...) → MultiRestartResult`
- `fit_hybrid_em_best_of(poisson_observations, gaussian_observations, state_dim, *, ...) → MultiRestartResult`

Both compose the Tier 0.1 (canonical gauge) + Tier 0.2 (true held-out predictive log-likelihood) building blocks:

1. Split the observations into train (leading `1 - holdout_fraction`) + test (trailing `holdout_fraction`) by time.
2. Fit with `n_restarts` different seeds on the train segment.
3. Score each fit on the test segment with `*_predictive_ll` and keep the best-scoring one.

```python
from nstat.extras.em.dynamax_bridge import fit_point_process_em_best_of

result = fit_point_process_em_best_of(spike_counts, state_dim=3, n_restarts=8)
result.best_result         # PointProcessEMResult: gauge-pinned + locally best
result.best_predictive_ll  # held-out LL of the chosen fit
result.all_predictive_lls  # full diagnostic trace across seeds
```
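The split, fit, and select steps can be sketched generically. This is an illustration under stated assumptions, not the shipped implementation: `RestartSummary` is a hypothetical stand-in for `MultiRestartResult`, `fit_fn` and `score_fn` are caller-supplied placeholders for the EM fit and the held-out predictive log-likelihood, and using `range(n_restarts)` as the seeds is a guess.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

import numpy as np


@dataclass
class RestartSummary:
    """Hypothetical stand-in for MultiRestartResult."""
    best_result: Any
    best_seed: int
    best_predictive_ll: float
    all_predictive_lls: List[float]


def best_of_restarts(
    observations,
    fit_fn: Callable[[np.ndarray, int], Any],
    score_fn: Callable[[Any, np.ndarray, np.ndarray], float],
    n_restarts: int = 8,
    holdout_fraction: float = 0.2,
) -> RestartSummary:
    obs = np.asarray(observations)
    # Split by time: the trailing holdout_fraction of bins is the test segment.
    n_test = int(round(obs.shape[0] * holdout_fraction))
    n_train = obs.shape[0] - n_test
    train, test = obs[:n_train], obs[n_train:]

    # One fit per seed, all on the train segment only.
    fits = [fit_fn(train, seed) for seed in range(n_restarts)]
    # score_fn also receives train so it can carry the state forward into test.
    lls = [float(score_fn(fit, train, test)) for fit in fits]

    best = int(np.argmax(lls))
    return RestartSummary(fits[best], best, lls[best], lls)
```

Because the winner is the argmax of the same list that is returned as the trace, the selected score can never fall below the median of `all_predictive_lls`.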
Three factors made multi-restart the highest-value-per-line option:

| Check | Result |
|---|---|
| End-to-end smoke (decoder + classifier branches) | PASS |
| `best_seed` / `best_predictive_ll` consistent with `argmax(all_predictive_lls)` | PASS |
| `best_predictive_ll` ≥ `median(all_predictive_lls)` | PASS (by construction) |
| Input validation (`n_restarts < 1`, `holdout_fraction` out of range, mismatched hybrid lengths, train segment too short) | PASS |
| Hybrid smoke + shape contract | PASS |

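The input-validation row covers four failure modes. The sketch below shows one way such checks could look; it is not the code in `dynamax_bridge.py`. The names `split_by_time` and `check_restart_args` are hypothetical, and the `min_train_bins=2` threshold is an assumption, since the actual minimum train length is not stated here.

```python
import numpy as np


def split_by_time(observations, holdout_fraction=0.2, min_train_bins=2):
    """Return (train, test), where test is the trailing holdout_fraction of bins."""
    obs = np.asarray(observations)
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must lie strictly between 0 and 1")
    # Always hold out at least one bin so the test segment is non-empty.
    n_test = max(1, int(round(obs.shape[0] * holdout_fraction)))
    n_train = obs.shape[0] - n_test
    if n_train < min_train_bins:  # assumed threshold
        raise ValueError("train segment too short")
    return obs[:n_train], obs[n_train:]


def check_restart_args(n_restarts, *series):
    """Reject n_restarts < 1 and observation streams of unequal length."""
    if n_restarts < 1:
        raise ValueError("n_restarts must be at least 1")
    if len({len(s) for s in series}) > 1:
        raise ValueError("all observation streams must share the same length")
```

For the hybrid case, the Poisson and Gaussian streams would both be passed to `check_restart_args` so that a length mismatch is caught before either stream is split.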
`tests/extras/test_dynamax_bridge.py`: 28 passed
in the dynamax venv (was 24; +4 new multi-restart tests).
The deeper M-step regularization options surfaced in the original
Tier 0.3 plan — data-driven init from log-empirical-rate, and a ridge
on the A/Q M-step — are not shipped
here. Multi-restart selection on the diagnostic was the
highest-value-per-line change and is what the 0.2 finding most directly
called for. Both regularization options can be added incrementally if
specific fixtures still need them.
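For reference, a data-driven init from the log empirical rate could look roughly like the following. This is only an illustrative sketch of the unshipped idea, under assumptions the text does not confirm: a Poisson observation model with a log link and a per-neuron offset that is initialized while the latent state contributes nothing. The function name and the `eps` floor are hypothetical.

```python
import numpy as np


def log_empirical_rate_init(spike_counts, eps=1e-3):
    """Per-neuron log of the mean count per bin, for use as an initial offset.

    Assumption: spike_counts is shaped (time, neurons). eps is a small floor
    so that neurons with no spikes get a finite (very negative) offset.
    """
    counts = np.asarray(spike_counts, dtype=float)
    mean_counts = counts.mean(axis=0)  # empirical rate per bin, one per neuron
    return np.log(mean_counts + eps)
```

Starting from these offsets, a model with zero loadings already reproduces the constant-rate baseline, so EM would begin from a point that is no worse than that baseline on the training data.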

- `nstat/extras/em/dynamax_bridge.py`: `MultiRestartResult`, `fit_point_process_em_best_of`, `fit_hybrid_em_best_of`, `_train_test_split_by_time` helper; updated module docstring; extended `__all__`.
- `tests/extras/test_dynamax_bridge.py`: 4 new tests.
- `docs/extras/em_dynamax.md`: API table + result-dataclass table + scope row updated; observability caveat now points to the `*_best_of` workflow as the recommended path.
- `parity/methods_roadmap.md`: Tier 0.3 marked SHIPPED.