Common pitfalls & FAQ
Goal of this page. The practical wisdom that ties the other Concepts pages together — the mistakes that quietly invalidate an analysis, and how to avoid them. Skim it before your first real analysis, and return when a result looks wrong.
Glossary jumps: KS test / plot · time-rescaling theorem · population time-rescaling · AIC / BIC · periodogram · multitaper · time–bandwidth product NW · history / refractory · SSGLM · clusterless decoding · spike sorting
Modeling and binning
How do I choose the bin width?
Two pressures compete. For the time-rescaling KS test
(goodness-of-fit page) the bins should be
fine enough that each holds at most one spike; otherwise the rescaled
intervals are quantized and the test rejects good models. At typical cortical
rates, 1 ms bins are safe. For GLM fitting alone, coarser bins (5–10 ms)
are fine and faster, as long as you keep the log(bin_width) offset so the
intercept stays in rate units (spikes/s) rather than counts per bin.
When in doubt, bin at 1 ms.
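Both points can be checked on synthetic data. The sketch below uses only NumPy; the 20 spikes/s rate, the 100 s duration, and the helper name bin_spikes are illustrative assumptions, not nSTAT API. It counts the bins that hold more than one spike at each width, and shows that the intercept of a constant-rate Poisson model with a log(bin_width) offset comes out in spikes/s whatever the bin width.

```python
import numpy as np

rng = np.random.default_rng(0)
duration = 100.0                      # seconds (assumed)
rate_hz = 20.0                        # firing rate of the synthetic train (assumed)
n_spikes = rng.poisson(rate_hz * duration)
spike_times = np.sort(rng.uniform(0.0, duration, size=n_spikes))

def bin_spikes(spike_times, duration, bin_width):
    """Hypothetical helper: spike counts on a regular grid of bins."""
    n_bins = int(round(duration / bin_width))
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, duration))
    return counts

summary = {}
for bin_width in (0.001, 0.010):
    counts = bin_spikes(spike_times, duration, bin_width)
    multi = int(np.sum(counts > 1))   # bins that would quantize the rescaled intervals
    # Intercept-only Poisson GLM with offset log(bin_width): the MLE is
    # log(mean count) - log(bin_width), which is a log rate in spikes/s.
    beta0 = np.log(counts.mean()) - np.log(bin_width)
    summary[bin_width] = (multi, float(np.exp(beta0)))
    print(f"bin = {bin_width * 1e3:.0f} ms: {multi} bins with >1 spike, "
          f"rate = {np.exp(beta0):.2f} spikes/s")
```

The synthetic train has no refractory period, so a handful of 1 ms bins can still hold two spikes; real neurons rarely do.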
My KS plot fails right near 0 — why?
That region reflects the shortest inter-spike intervals, which are governed
by the refractory period and short-timescale history. A failure there almost
always means the model has no (or too little) spike-history. Add history
terms (history_window_times in TrialConfig); see the
GLM page.
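To see why a missing refractory term shows up at the left edge of the KS plot, here is a minimal NumPy sketch on synthetic intervals. The 2 ms refractory period and the 18 ms mean of the exponential part are assumptions chosen for illustration; the model being tested is a constant-rate Poisson process with no history.

```python
import numpy as np

rng = np.random.default_rng(1)
refractory = 0.002                                    # 2 ms absolute refractory period (assumed)
isi = refractory + rng.exponential(0.018, size=5000)  # synthetic inter-spike intervals, s

# History-free model: a single constant rate matched to the mean ISI.
rate_hat = 1.0 / isi.mean()

# Time-rescaling: tau = rate * ISI should be Exp(1), so
# u = 1 - exp(-tau) should be Uniform(0, 1) if the model is right.
u = np.sort(1.0 - np.exp(-rate_hat * isi))

# KS plot coordinates: model quantiles on x, sorted u on y.
model_q = (np.arange(1, u.size + 1) - 0.5) / u.size
near_zero_gap = np.max(np.abs(u - model_q)[model_q < 0.1])
ci95 = 1.36 / np.sqrt(u.size)                         # approximate 95% KS band

print(f"smallest rescaled value: {u[0]:.3f}")
print(f"max gap in the lowest 10%: {near_zero_gap:.3f} (95% band: {ci95:.3f})")
```

Because no interval is shorter than the refractory period, no rescaled value lands near 0, and the curve leaves the confidence band right at the origin.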
My KS curve bows smoothly away from the diagonal.
A smooth bow (rather than a sharp departure near 0) usually means the overall
rate is mis-scaled, often because of a units bug. Check that sampleRate and
bin widths are consistent (the original MATLAB toolbox had a
sampleRate-vs-1/sampleRate bug here; nSTAT-python fixes it).
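The effect of a mis-scaled rate is easy to reproduce. The sketch below is a generic NumPy/SciPy illustration, not nSTAT code: it rescales the same synthetic Poisson intervals with the correct rate and with a rate that is off by a factor of 2 (a stand-in for a units error; real unit bugs are often much larger) and compares the KS distances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_rate = 20.0                                   # spikes/s (assumed)
isi = rng.exponential(1.0 / true_rate, size=2000)  # ISIs of a homogeneous Poisson process, s

def ks_statistic(model_rate, isi):
    """KS distance between rescaled ISIs and Uniform(0, 1) for a constant-rate model."""
    u = 1.0 - np.exp(-model_rate * isi)            # time-rescaling transform
    return stats.kstest(u, "uniform").statistic

ks_correct = ks_statistic(true_rate, isi)
ks_scaled = ks_statistic(2.0 * true_rate, isi)     # rate wrong by a factor of 2
ci95 = 1.36 / np.sqrt(isi.size)                    # approximate 95% KS band

print(f"correct rate: KS = {ks_correct:.3f}")
print(f"rate x 2:     KS = {ks_scaled:.3f}")
print(f"95% band:     {ci95:.3f}")
```

A scale error shifts every rescaled interval in the same direction, which is why the whole curve bows instead of failing in one region.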
Goodness-of-fit and model comparison
A model has the lowest AIC — am I done?
No. AIC/BIC only rank models relative to each other; the winner can still fit
poorly in absolute terms. Always confirm with goodness-of-fit
(FitResult.computeKSStats). Lowest AIC and passes KS → trust it.
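For reference, the standard definitions are AIC = 2k − 2 log L and BIC = k ln(n) − 2 log L, with k parameters, n observations (time bins, for a binned point-process model), and maximized log-likelihood log L. The sketch below applies them to two models; the log-likelihoods, parameter counts, and n are made-up numbers for illustration only.

```python
import numpy as np

def aic(log_lik, n_params):
    """Akaike information criterion (lower is better)."""
    return 2.0 * n_params - 2.0 * log_lik

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * np.log(n_obs) - 2.0 * log_lik

# Hypothetical fits: name -> (maximized log-likelihood, number of parameters)
models = {"stimulus only": (-5230.0, 2), "stimulus + history": (-5105.0, 12)}
n_obs = 100_000                                   # number of time bins (assumed)

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in models.items()}
best = min(scores, key=lambda name: scores[name][0])
for name, (a, b) in scores.items():
    print(f"{name:20s} AIC = {a:.1f}  BIC = {b:.1f}")
print("lowest AIC:", best)
```

Both scores depend only on differences between candidates, so nothing in them says whether even the best candidate describes the data; that is what the KS check is for.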
Every neuron passes its KS test, but the population model seems off.
Per-neuron KS checks each neuron’s marginal intensity in isolation; it is
blind to coupling. A pair of synchronous neurons modeled as independent
passes per-neuron but fails jointly. Use
population_time_rescale (the Tao et al. 2018
marked test) for population models.
I tested 200 neurons and ~10 “significantly” fail at p<0.05.
That is the number of false positives expected under the null
(200 × 0.05 = 10). With many neurons, correct for multiple comparisons
(e.g. Benjamini–Hochberg FDR) before declaring misfit.
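A minimal NumPy sketch of the Benjamini–Hochberg step-up procedure, applied to 200 synthetic p-values drawn under the null. The function name benjamini_hochberg is illustrative and not part of nSTAT; the p-values would in practice come from your per-neuron KS tests.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests rejected at false-discovery rate q (BH step-up)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m       # rank-dependent cutoffs
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])          # largest rank meeting its cutoff
        reject[order[: k + 1]] = True              # reject everything up to that rank
    return reject

rng = np.random.default_rng(3)
p_null = rng.uniform(size=200)                     # 200 well-fit neurons: uniform p-values
raw_fail = int(np.sum(p_null < 0.05))
fdr_fail = int(np.sum(benjamini_hochberg(p_null)))
print(f"uncorrected failures: {raw_fail}, after BH at q = 0.05: {fdr_fail}")
```

The corrected count can never exceed the uncorrected one, because every BH cutoff is at or below q.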
My tuning estimate looks unstable across the session.
The plain GLM assumes fixed tuning. If tuning genuinely drifts (learning,
adaptation), the instability is signal rather than noise, and it has to be
modeled explicitly: move to the state-space
GLM (nstColl.ssglm/ssglmFB); see the
state-space page.
Spectral analysis (LFP/EEG)
My spectrum is noisy and spiky.
You are likely looking at a raw periodogram, whose variance does not shrink
with more data. Use the multitaper estimate (SignalObj.MTMspectrum); see
the LFP page.
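The variance difference can be seen without nSTAT. The sketch below is a bare-bones multitaper estimate built from scipy.signal.windows.dpss, run on white noise (whose true spectrum is flat); it is an illustration of the idea under assumed settings (NW = 4, 7 tapers, 1 kHz sampling), not a reimplementation of SignalObj.MTMspectrum.

```python
import numpy as np
from scipy.signal.windows import dpss

rng = np.random.default_rng(4)
fs = 1000.0                                   # sampling rate, Hz (assumed)
x = rng.standard_normal(4096)                 # white noise: the true spectrum is flat
n = x.size

# Raw periodogram: one estimate per frequency, no averaging.
pgram = np.abs(np.fft.rfft(x)) ** 2 / (n * fs)

# Multitaper: average the spectra obtained with K orthogonal Slepian (DPSS) tapers.
nw = 4.0
k = int(2 * nw - 1)
tapers = dpss(n, nw, Kmax=k)                  # shape (k, n), unit-energy tapers
mtm = np.mean(np.abs(np.fft.rfft(tapers * x, axis=-1)) ** 2, axis=0) / fs

def rel_spread(psd):
    """Std/mean across frequencies (DC and Nyquist bins excluded)."""
    core = psd[1:-1]
    return core.std() / core.mean()

print(f"periodogram spread: {rel_spread(pgram):.2f}")
print(f"multitaper spread:  {rel_spread(mtm):.2f}")
```

Collecting more samples does not shrink the periodogram's spread; averaging over tapers does, at the cost of frequency resolution.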
Two nearby peaks blur into one (or a real peak splits).
That is the time–bandwidth (NW) trade-off. Large NW over-smooths and
merges close peaks; too-small NW is noisy. Pick NW for the question —
small to separate close rhythms, larger for broadband power.
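The trade-off follows from the definition NW = T · W, where T is the window length in seconds and W the half-bandwidth in Hz: each estimate is smoothed over roughly ±W, so peaks closer than about 2W tend to merge. A short sketch of the arithmetic, assuming a 2 s window and the common choice of K = 2·NW − 1 tapers (the helper name is illustrative):

```python
def half_bandwidth_hz(nw, duration_s):
    """Spectral half-bandwidth W = NW / T for a window of T seconds."""
    return nw / duration_s

duration_s = 2.0                       # analysis window length (assumed)
for nw in (2.0, 4.0, 8.0):
    w = half_bandwidth_hz(nw, duration_s)
    n_tapers = int(2 * nw - 1)         # common choice K = 2*NW - 1
    print(f"NW = {nw:.0f}: smoothing over +/-{w:.1f} Hz, "
          f"peaks closer than ~{2 * w:.1f} Hz merge, {n_tapers} tapers")
```

Lengthening the window is the only way to get both narrow bandwidth and many tapers at once.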
I see a sharp line at 60 Hz (or 50 Hz).
That is mains/line noise, not brain activity. Notch-filter or exclude that
band before interpreting gamma.
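One way to notch-filter is with SciPy, sketched here on a synthetic signal. The 1 kHz sampling rate, the quality factor Q = 30, and the test signal itself are assumptions; nSTAT may offer its own filtering route, which this page does not describe.

```python
import numpy as np
from scipy import signal

fs = 1000.0                                   # sampling rate, Hz (assumed)
t = np.arange(10_000) / fs                    # 10 s of samples
# Synthetic "LFP": a 10 Hz rhythm plus 60 Hz line noise.
lfp = np.sin(2 * np.pi * 10.0 * t) + np.sin(2 * np.pi * 60.0 * t)

# Second-order IIR notch at the mains frequency; the notch is about f0 / Q Hz wide.
b, a = signal.iirnotch(w0=60.0, Q=30.0, fs=fs)
clean = signal.filtfilt(b, a, lfp)            # forward-backward pass: zero phase shift

def power_at(x, freq_hz):
    """Power in the FFT bin nearest freq_hz."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]

line_ratio = power_at(clean, 60.0) / power_at(lfp, 60.0)
rhythm_ratio = power_at(clean, 10.0) / power_at(lfp, 10.0)
print(f"60 Hz power kept: {line_ratio:.4f}, 10 Hz power kept: {rhythm_ratio:.4f}")
```

Use 50 Hz for w0 where that is the mains frequency, and remember that harmonics (120 Hz, 180 Hz, ...) may need their own notches.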
Decoding and state-space EM
My decode from one neuron is terrible.
Expected: a single neuron is ambiguous. Decoding is a population operation;
RMSE drops sharply as you add cells with diverse tuning (the decoding
tutorial shows this directly).
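The same effect appears in a toy simulation. The sketch below is not the tutorial's code: it assumes Gaussian tuning curves over a 1-D position, Poisson spike counts, and a grid-search maximum-likelihood decoder, all chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(5)
grid = np.linspace(0.0, 1.0, 201)              # candidate positions for the decoder

def tuning(x, centers, peak=10.0, width=0.1):
    """Expected spike counts per decoding window: Gaussian fields (assumed shape)."""
    x = np.atleast_1d(x)[:, None]
    return 0.1 + peak * np.exp(-0.5 * ((x - centers[None, :]) / width) ** 2)

def decode_rmse(n_cells, n_trials=500):
    centers = np.linspace(0.0, 1.0, n_cells + 2)[1:-1]   # evenly spaced field centers
    x_true = rng.uniform(0.0, 1.0, size=n_trials)
    counts = rng.poisson(tuning(x_true, centers))        # (n_trials, n_cells)
    lam = tuning(grid, centers)                          # (n_grid, n_cells)
    # Poisson log-likelihood of each trial's counts at every grid position
    log_lik = counts @ np.log(lam).T - lam.sum(axis=1)[None, :]
    x_hat = grid[np.argmax(log_lik, axis=1)]
    return float(np.sqrt(np.mean((x_hat - x_true) ** 2)))

rmse = {n: decode_rmse(n) for n in (1, 5, 20)}
for n, err in rmse.items():
    print(f"{n:2d} cells: RMSE = {err:.3f}")
```

With one symmetric tuning curve, two positions on either side of the peak produce the same expected count, so the decoder cannot tell them apart; overlapping fields from other cells break that tie.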
My EM fit collapsed (transition matrix → 0, or wildly different per run).
Point-process state-space EM has a weak-observability failure mode and only
finds a local optimum. Use the multi-restart workflow with held-out
predictive log-likelihood (fit_point_process_em_best_of), and the
init="log_empirical_rate" / ridge_lambda options; see the
state-space page and the
EM extras guide.
Reproducibility and random seeds
My results change every time I re-run a simulation. Is that a bug?
No — every example that calls simulate_cif_from_stimulus,
simulate_point_process, or any other simulator draws fresh spikes from a
stochastic point process. Without an explicit seed each run is genuinely
different. To pin a run, pass a seeded generator:
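import numpy as np
# t (time grid) and stim (stimulus array) are assumed to be defined earlier;
# simulate_cif_from_stimulus comes from your nSTAT installation.
rng = np.random.default_rng(seed=42)  # seeded generator makes the draw repeatable
spikes, _, _ = simulate_cif_from_stimulus(time=t, stimulus=stim,
                                          beta0=-2.0, beta1=1.0, rng=rng)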
Use np.random.default_rng(seed) (not legacy np.random.rand,
np.random.seed, or RandomState) — it is the supported pattern across
nSTAT, gives reproducible streams, and is the only style that interacts
correctly with multi-process workers. Every paper-example script and every
examples/tutorials/ script accepts (or hard-codes) a seed so figures
regenerate bit-exactly.
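For multi-process work or multiple restarts, NumPy's documented way to derive independent streams from one seed is SeedSequence.spawn. The sketch below is plain NumPy; the helper name worker_rngs is illustrative, and how nSTAT seeds its own workers internally is not shown on this page.

```python
import numpy as np

def worker_rngs(seed, n_workers):
    """One independent, reproducible Generator per worker or restart."""
    children = np.random.SeedSequence(seed).spawn(n_workers)
    return [np.random.default_rng(child) for child in children]

# Same top-level seed -> same per-worker streams on every run.
first = [rng.standard_normal(3) for rng in worker_rngs(42, 4)]
again = [rng.standard_normal(3) for rng in worker_rngs(42, 4)]

print("reproducible:", all(np.array_equal(a, b) for a, b in zip(first, again)))
print("workers differ:", not np.array_equal(first[0], first[1]))
```

Handing each worker its own child generator avoids the classic mistake of forking one global seed into several processes that then all draw identical numbers.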
Should I seed the fit itself?
GLM fitting is deterministic given fixed data (the log-likelihood is concave;
Paninski 2004). EM-trained
state-space models are not: each restart can converge to a different local
optimum, so fit_point_process_em_best_of keeps the best of n_restarts seeded
runs; see the
state-space page.
Data and provenance
Where do spike times come from? Does nSTAT sort spikes?
No. nSTAT consumes already-sorted spike trains. Detection and sorting are a
separate pipeline (e.g. SpikeInterface); bring the results in via the interop
bridges. See the microelectrode page.
Does spike-sorting error affect my results?
Yes — misassigned spikes bias encoding and decoding
(Harris et al. 2000). If sorting
is unreliable, consider clusterless decoding, which skips sorting entirely
(nstat.extras.decoding.clusterless_bridge).
See also
Back to the Concepts overview.