# Common pitfalls & FAQ

> **Goal of this page.** The practical wisdom that ties the other Concepts
> pages together: the mistakes that quietly invalidate an analysis, and how
> to avoid them. Skim it before your first real analysis, and return when a
> result looks wrong.
>
> **Glossary jumps:** [KS test / plot](glossary.md#kolmogorov-smirnov-test-ks-plot) ·
> [time-rescaling theorem](glossary.md#time-rescaling-theorem) ·
> [population time-rescaling](glossary.md#population-time-rescaling) ·
> [AIC / BIC](glossary.md#aic-bic) ·
> [periodogram](glossary.md#periodogram) ·
> [multitaper](glossary.md#multitaper-method) ·
> [time–bandwidth product `NW`](glossary.md#time-bandwidth-product) ·
> [history / refractory](glossary.md#history-term-refractory-period) ·
> [SSGLM](glossary.md#state-space-glm) ·
> [clusterless decoding](glossary.md#clusterless-decoding) ·
> [spike sorting](glossary.md#spike-sorting)

## Modeling and binning

**How do I choose the bin width?** There are two competing pressures. For the
**time-rescaling KS test** ([goodness-of-fit page](goodness_of_fit_and_decoding.md))
the bins should be fine enough that each holds **at most one spike**; otherwise
the rescaled intervals are quantized and the test rejects good models. At
typical cortical rates, 1 ms bins are safe. For **GLM fitting** alone, coarser
bins (5–10 ms) are fine and faster, as long as you keep the `log(bin_width)`
offset so the coefficients stay interpretable as rates. When in doubt, bin at
1 ms.

**My KS plot fails right near 0. Why?** That region reflects the **shortest**
inter-spike intervals, which are governed by the refractory period and
short-timescale history. A failure there almost always means the model has
**no (or too little) spike-history**. Add history terms
(`history_window_times` in `TrialConfig`); see the
[GLM page](spike_trains_and_glms.md).

**My KS curve bows smoothly away from the diagonal.** A smooth bow (as opposed
to a sharp deviation near 0) usually means the overall **rate is mis-scaled**,
often because of a units bug.
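
One quick way to spot a mis-scaled rate is to compare the spike count the model predicts (the rate summed over bins, times the bin width) with the count actually observed. The sketch below uses plain NumPy on synthetic data; `rate_scale_ratio` is a hypothetical helper written for illustration, not part of nSTAT, and the 20 Hz rate and 1 ms bins are assumed values.

```python
import numpy as np


def rate_scale_ratio(rate_hz, spike_counts, bin_width_s):
    """Observed spike count divided by the count the rate predicts.

    Hypothetical helper for illustration only (not an nSTAT function).
    A ratio near 1 means the rate scale is plausible; a ratio off by a
    round factor such as 1000 suggests a seconds-vs-milliseconds mix-up.
    """
    expected = float(np.sum(rate_hz) * bin_width_s)
    observed = float(np.sum(spike_counts))
    return observed / expected


rng = np.random.default_rng(0)
bin_width_s = 0.001                      # 1 ms bins (assumed)
true_rate_hz = np.full(200_000, 20.0)    # 200 s at a constant 20 Hz (assumed)
counts = rng.poisson(true_rate_hz * bin_width_s)

# Rate expressed in Hz, as intended: ratio should be close to 1.
ok_ratio = rate_scale_ratio(true_rate_hz, counts, bin_width_s)

# Simulated units bug: rate stored in spikes/ms but treated as Hz,
# so the predicted count is about 1000 times too small.
bad_ratio = rate_scale_ratio(true_rate_hz / 1000.0, counts, bin_width_s)

print(f"consistent units: {ok_ratio:.2f}, mismatched units: {bad_ratio:.0f}")
```

A ratio that is off by a factor such as 1000 or `sampleRate` is a strong hint that the problem lies in units rather than in the model structure.
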
Check that `sampleRate` and bin widths are consistent (the original MATLAB
toolbox had a `sampleRate`-vs-`1/sampleRate` bug here; nSTAT-python fixes it).

## Goodness-of-fit and model comparison

**A model has the lowest AIC. Am I done?** No. AIC/BIC only rank models
*relative to each other*; the winner can still be absolutely wrong. **Always**
confirm with goodness-of-fit (`FitResult.computeKSStats`). A model that has the
lowest AIC **and** passes KS can be trusted.

**Every neuron passes its KS test, but the population model seems off.**
Per-neuron KS checks each neuron's *marginal* intensity in isolation; it is
blind to **coupling**. A pair of synchronous neurons modeled as independent
passes each per-neuron test but fails jointly. Use `population_time_rescale`
(the [Tao et al. 2018](https://pubmed.ncbi.nlm.nih.gov/30298220/) marked test)
for population models.

**I tested 200 neurons and ~10 "significantly" fail at p<0.05.** That is the
expected number of false positives under the null (200 × 0.05 = 10). With many
neurons, **correct for multiple comparisons** (e.g. Benjamini–Hochberg FDR)
before declaring misfit.

**My tuning estimate looks unstable across the session.** The plain GLM assumes
**fixed** tuning. If the tuning genuinely drifts (learning, adaptation), that
is signal to model, not noise. Move to the state-space GLM
(`nstColl.ssglm`/`ssglmFB`); see the [state-space page](state_space_and_em.md).

## Spectral analysis (LFP/EEG)

**My spectrum is noisy and spiky.** You are likely looking at a raw
**periodogram**, whose variance does not shrink with more data. Use the
**multitaper** estimate (`SignalObj.MTMspectrum`); see the
[LFP page](lfp_and_spectral.md).

**Two nearby peaks blur into one (or a real peak splits).** That is the
**time–bandwidth (`NW`) trade-off**. A large `NW` over-smooths and merges close
peaks; too small an `NW` gives a noisy estimate. Pick `NW` for the question:
small to separate close rhythms, larger for broadband power.

**I see a sharp line at 60 Hz (or 50 Hz).** That is **mains/line noise**, not
brain activity.
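
Line noise of this kind can be removed with a narrow notch filter. The following is a minimal sketch using SciPy's `iirnotch` and `filtfilt` on a synthetic signal. It is not an nSTAT API, and the sampling rate, quality factor `Q`, and test signal are all illustrative assumptions.

```python
import numpy as np
from scipy import signal

fs = 1000.0                               # sampling rate in Hz (assumed)
t = np.arange(0.0, 10.0, 1.0 / fs)        # 10 s of synthetic data
rng = np.random.default_rng(0)

# Synthetic "LFP": a 40 Hz rhythm of interest plus stronger 60 Hz line noise.
lfp = (np.sin(2 * np.pi * 40.0 * t)
       + 2.0 * np.sin(2 * np.pi * 60.0 * t)
       + 0.1 * rng.standard_normal(t.size))

# Narrow IIR notch centred on 60 Hz; Q = 30 gives roughly a 2 Hz stop band.
b, a = signal.iirnotch(w0=60.0, Q=30.0, fs=fs)

# filtfilt runs the filter forward and backward, so there is no phase shift.
cleaned = signal.filtfilt(b, a, lfp)


def power_at(x, freq_hz):
    """Raw FFT power in the bin closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]


print("60 Hz power kept:", power_at(cleaned, 60.0) / power_at(lfp, 60.0))
print("40 Hz power kept:", power_at(cleaned, 40.0) / power_at(lfp, 40.0))
```

For 50 Hz mains, set `w0=50.0`. A higher `Q` narrows the notch and spares more of the neighbouring band, at the cost of a longer filter transient at the edges of the recording.
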
Notch-filter or exclude that band before interpreting gamma.

## Decoding and state-space EM

**My decode from one neuron is terrible.** That is expected: a single neuron is
ambiguous. **Decoding is a population operation;** RMSE drops sharply as you
add cells with diverse tuning (the
[decoding tutorial](https://github.com/cajigaslab/nSTAT-python/blob/main/examples/tutorials/decoding_ppaf.py)
shows this directly).

**My EM fit collapsed (transition matrix → 0, or wildly different results per
run).** Point-process state-space EM has a **weak-observability** failure mode
and only finds a *local* optimum. Use the multi-restart workflow with held-out
predictive log-likelihood (`fit_point_process_em_best_of`), and the
`init="log_empirical_rate"` / `ridge_lambda` options; see the
[state-space page](state_space_and_em.md) and the
[EM extras guide](../extras/em_dynamax.md).

## Reproducibility and random seeds

**My results change every time I re-run a simulation. Is that a bug?** No.
Every example that calls `simulate_cif_from_stimulus`,
`simulate_point_process`, or any other simulator draws fresh spikes from a
stochastic point process. *Without* an explicit seed each run is genuinely
different. To pin a run, pass a seeded generator:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded generator pins the run
spikes, _, _ = simulate_cif_from_stimulus(time=t, stimulus=stim,
                                          beta0=-2.0, beta1=1.0, rng=rng)
```

Use `np.random.default_rng(seed)` (not legacy `np.random.rand`,
`np.random.seed`, or `RandomState`): it is the supported pattern across nSTAT,
gives reproducible streams, and is the only style that interacts correctly with
multi-process workers. Every paper-example script and every
`examples/tutorials/` script accepts (or hard-codes) a seed so figures
regenerate bit-exactly.

**Should I seed the fit itself?** GLM fitting is deterministic given fixed data
(the log-likelihood is concave;
[Paninski 2004](https://pubmed.ncbi.nlm.nih.gov/15600233/)).
EM-trained state-space models are not: each restart can find a different local
optimum, so `fit_point_process_em_best_of` takes the best of `n_restarts`
seeds; see the [state-space page](state_space_and_em.md).

## Data and provenance

**Where do spike times come from? Does nSTAT sort spikes?** No. nSTAT consumes
**already-sorted** spike trains. Detection and sorting are a separate pipeline
(e.g. [SpikeInterface](https://github.com/SpikeInterface/spikeinterface));
bring the results in via the [interop bridges](../extras.rst). See the
[microelectrode page](microelectrode_recordings.md).

**Does spike-sorting error affect my results?** Yes. Misassigned spikes bias
encoding and decoding
([Harris et al. 2000](https://pubmed.ncbi.nlm.nih.gov/10899214/)). If sorting
is unreliable, consider **clusterless** decoding, which skips sorting entirely
(`nstat.extras.decoding.clusterless_bridge`).

## See also

- [Glossary](glossary.md) · [Annotated bibliography](bibliography.md)
- Back to the [Concepts overview](index.md).