Tier 0.1 — Full PLDS identifiability (canonical gauge)

2026-05-28 · nstat.extras.em.dynamax_bridge · PR #113

The problem

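The PP_EM (fit_point_process_em) and mPPCO_EM (fit_hybrid_em) trainers fit a Poisson linear dynamical system, which is identifiable only up to the full GL(d) gauge group. For any invertible T, the reparameterization (A, C, x) → (TAT⁻¹, CT⁻¹, Tx), with the state-noise covariance and initial state transformed to match, leaves the observable log-rate Cx, and hence the likelihood, invariant. Only the d scale degrees of freedom were pinned, which left the remaining d² − d rotation and shear directions free, so the fitted A and C drifted across random seeds.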
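The gauge freedom can be checked numerically. The snippet below is a minimal NumPy sketch, assuming a toy model with random parameters and a fixed stand-in latent path (not simulated from A; the state-noise and initial-state terms are omitted). It applies the reparameterization with a generic invertible T and confirms that the Poisson log-likelihood of the counts is unchanged even though C itself moves.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_units, n_steps = 3, 5, 100

# Toy PLDS quantities (random, for illustration only).
A = 0.9 * np.eye(d) + 0.05 * rng.standard_normal((d, d))
C = 0.5 * rng.standard_normal((n_units, d))
x = rng.standard_normal((d, n_steps))   # stand-in latent path, one column per step
y = rng.poisson(np.exp(C @ x))          # spike counts drawn from the log-rate C x

def poisson_loglik(C, x, y):
    """Poisson log-likelihood of counts y under log-rate C x (log y! terms dropped)."""
    eta = C @ x
    return np.sum(y * eta - np.exp(eta))

# A generic invertible T: a random rotation times unequal axis scales.
R, _ = np.linalg.qr(rng.standard_normal((d, d)))
T = R @ np.diag([2.0, 0.5, 3.0])
T_inv = np.linalg.inv(T)

# (A, C, x) -> (T A T^-1, C T^-1, T x)
A2, C2, x2 = T @ A @ T_inv, C @ T_inv, T @ x

print(np.allclose(poisson_loglik(C, x, y), poisson_loglik(C2, x2, y)))  # True: likelihood unchanged
print(np.allclose(C, C2))                                                # False: parameters moved
```

Because the data only ever see the product Cx, nothing in the likelihood prefers (A, C) over (A2, C2); any preference has to come from an explicit convention.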

A previous attempt to canonicalize at every iteration destabilized EM: re-canonicalizing reshapes the optimization landscape at each step and fights the Newton trust-region updates, producing NaNs and an across-seed |ΔC| ≈ 460 on identical data.

The fix

In-loop: apply only a cheap diagonal scale pin, which is enough to keep |C| finite without disturbing the rotation the optimizer is settling into.
Once, after convergence: recompute the posteriors under the final parameters, then put the model in the standard LDS canonical form by whitening the latent with M⁻½ (M being the latent moment matrix from those posteriors), rotating by the right singular vectors of the emission matrix so its columns are orthogonal with norms in descending order, and fixing each column's sign. For the hybrid, the rotation is computed from the stacked [C_p; C_g] so that both channels share one frame.
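Both phases can be sketched in NumPy as below. The helpers pin_scale and canonicalize are hypothetical illustrations, not the actual _canonicalize_gauge in dynamax_bridge. The sketch assumes that the pin normalizes each column of C to unit norm, that M is the average posterior second moment of the latent (symmetric positive definite), that the stacked emission matrix has at least d rows, and that the sign convention makes the largest-magnitude entry of each column positive. Initial-state parameters, which transform like Q, are left out for brevity.

```python
import numpy as np

def pin_scale(A, C, Q):
    """In-loop step (hypothetical): diagonal gauge T = diag(s) giving C unit-norm columns."""
    s = np.maximum(np.linalg.norm(C, axis=0), 1e-12)  # column norms, guarded against zero
    # For diagonal T: T A T^-1 = s_i A_ij / s_j, C T^-1 = C_ij / s_j, T Q T^T = s_i Q_ij s_j.
    return A * s[:, None] / s[None, :], C / s[None, :], Q * np.outer(s, s)

def canonicalize(A, Cs, Q, M):
    """Post-convergence step (hypothetical): whiten, SVD-rotate, sign-fix.

    Cs: list of emission matrices sharing one latent, e.g. [C_p] or [C_p, C_g].
    M:  (d, d) latent second-moment matrix from fresh posteriors (assumed SPD).
    Returns the new (A, Cs, Q) and the gauge matrix T, with x_new = T x.
    """
    w, V = np.linalg.eigh(M)
    M_half = (V * np.sqrt(w)) @ V.T          # M^{1/2}
    M_inv_half = (V / np.sqrt(w)) @ V.T      # M^{-1/2}
    C_stack = np.vstack(Cs)
    C_white = C_stack @ M_half               # emission seen by the whitened latent M^{-1/2} x
    # Rotate by the right singular vectors: C_white Vt^T = U S has orthogonal
    # columns with norms S in descending order (NumPy returns S sorted).
    _, _, Vt = np.linalg.svd(C_white, full_matrices=False)
    C_rot = C_white @ Vt.T
    # Sign-fix (assumed convention): largest-magnitude entry of each column > 0.
    idx = np.argmax(np.abs(C_rot), axis=0)
    signs = np.where(C_rot[idx, np.arange(C_rot.shape[1])] < 0, -1.0, 1.0)
    T = (signs[:, None] * Vt) @ M_inv_half   # x_new = diag(signs) Vt M^{-1/2} x
    T_inv = (M_half @ Vt.T) * signs[None, :] # exact inverse, since Vt is orthogonal
    C_new = C_stack @ T_inv
    splits = np.cumsum([c.shape[0] for c in Cs])[:-1]  # row offsets of each channel
    return T @ A @ T_inv, np.split(C_new, splits), T @ Q @ T.T, T

# Usage on random stand-ins for a fitted hybrid model (d = 3).
rng = np.random.default_rng(0)
d = 3
A = 0.9 * np.eye(d) + 0.05 * rng.standard_normal((d, d))
Q = 0.1 * np.eye(d)
C_p, C_g = rng.standard_normal((8, d)), rng.standard_normal((2, d))

A_pin, C_pin, Q_pin = pin_scale(A, C_p, Q)   # in-loop: unit-norm columns of C_p

X = rng.standard_normal((d, 200))            # stand-in posterior means
M = X @ X.T / X.shape[1] + 0.1 * np.eye(d)   # stand-in second-moment matrix
A_c, (C_p_c, C_g_c), Q_c, T = canonicalize(A, [C_p, C_g], Q, M)
C_c = np.vstack([C_p_c, C_g_c])
print(np.round(C_c.T @ C_c, 6))              # diagonal, entries descending
```

After whitening, T M Tᵀ = I, and the remaining rotation is orthogonal, so the returned emission satisfies CᵀC = diag(S²) while C_new T reproduces the original C (the log-rate is untouched). The in-loop pin, by contrast, only rescales axes: it bounds |C| but never chooses a rotation, which is why it does not fight the optimizer.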

Result

Quantity	Before	After
Across-seed |ΔC| (PP_EM)	~460 (+ NaN)	~0.75
Across-seed |ΔC| (hybrid)	large	~0.15
Returned CᵀC	arbitrary	diag(S²) to machine precision
Hybrid Gaussian R	~0.09	~0.09 (unchanged; true value 0.09)

The remaining across-seed spread comes from multiple local optima (genuinely different likelihoods across seeds), not from gauge freedom; it is addressed by multi-restart selection (Tier 0.3).
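Multi-restart selection itself belongs to Tier 0.3, but the idea is simple enough to sketch: run EM from several seeds and keep the fit with the highest log-likelihood. The fit_fn(seed) -> (params, loglik) interface and the two-basin toy_fit below are hypothetical stand-ins, not the nstat trainers.

```python
import numpy as np

def select_best_restart(fit_fn, seeds):
    """Run fit_fn once per seed; keep the fit with the highest log-likelihood.

    fit_fn(seed) -> (params, loglik) is a hypothetical interface.
    """
    results = [(seed, *fit_fn(seed)) for seed in seeds]
    seed, params, loglik = max(results, key=lambda r: r[2])  # first best on ties
    return params, loglik, seed

def toy_fit(seed):
    """Toy stand-in for an EM run: each seed lands in one of two local optima."""
    basin = int(np.random.default_rng(seed).integers(2))
    return {"basin": basin}, [-120.0, -95.0][basin]

seeds = list(range(6))
params, loglik, seed = select_best_restart(toy_fit, seeds)
print(seed, loglik, params)
```

Because the likelihood is gauge-invariant, ranking restarts this way needs no canonicalization; the canonical form matters only when the selected parameters are compared across runs.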

Files changed

nstat/extras/em/dynamax_bridge.py — _canonicalize_gauge; in-loop scale pin + single post-loop canonicalization in both trainers.
tests/extras/test_dynamax_bridge.py — two canonical-invariant tests (orthogonality, descending singular values, sign convention, bounded |C|).
docs/extras/em_dynamax.md, parity/methods_roadmap.md — caveats + status.
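The invariants named for the two new tests can be sketched as a standalone checker. The helper assert_canonical, the sign convention (largest-magnitude entry of each column positive) and the c_max bound are assumptions for illustration, not the contents of tests/extras/test_dynamax_bridge.py.

```python
import numpy as np

def assert_canonical(C, atol=1e-8, c_max=1e3):
    """Hypothetical checker for a canonical emission matrix C of shape (n, d).

    c_max is an assumed bound on |C|; the real tests may use a different value.
    """
    G = C.T @ C
    # Orthogonality: C^T C is diagonal.
    assert np.allclose(G, np.diag(np.diag(G)), atol=atol)
    # Descending singular values: column norms are non-increasing.
    norms = np.sqrt(np.diag(G))
    assert np.all(np.diff(norms) <= atol)
    # Sign convention (assumed): largest-magnitude entry of each column is positive.
    idx = np.argmax(np.abs(C), axis=0)
    assert np.all(C[idx, np.arange(C.shape[1])] > 0)
    # Bounded |C|: finite and below the assumed ceiling.
    assert np.all(np.isfinite(C)) and np.abs(C).max() < c_max

# Build a canonical C directly: U S with per-column signs fixed.
rng = np.random.default_rng(1)
U, S, _ = np.linalg.svd(rng.standard_normal((10, 3)), full_matrices=False)
C = U * S                                    # scales column j by S[j]
idx = np.argmax(np.abs(C), axis=0)
C = C * np.sign(C[idx, np.arange(3)])        # flip columns whose largest entry is negative
assert_canonical(C)                          # passes silently
```

A rotated copy such as C @ R for a random orthogonal R has a non-diagonal Gram matrix and fails the orthogonality check, which is exactly the drift the canonical form removes.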