Concepts and Architecture

This page explains how Thalamus is put together: the node pipeline, the data model that flows through it, the capture-file format, and the tooling that turns recordings into analysis-ready data. It complements the Quick Start (which walks through using the program) and the Node Reference (which documents each node type).

Mental model

Think of a Thalamus session as a directed graph of nodes on one shared nanosecond timeline. Producers generate data; consumers and transformers subscribe to the producers whose data they need; and a STORAGE2 node writes every message from the nodes it is subscribed to into the .tha capture log.

Figure: A Thalamus pipeline. Generators feed transformers, which feed consumers; STORAGE2 writes the .tha capture; and a controller starts and stops the nodes.

The node pipeline

A Thalamus session is a pipeline of nodes. Each node is a small, independently configurable unit, and every node plays one of four roles:

  • Generators produce data – hardware acquisition (NIDAQ, INTAN, SPIKEGLX, GENICAM, …) or software sources (WAVE, WALLCLOCK, PUPIL).

  • Consumers are endpoints for data – recording it (STORAGE2), logging it (LOG), or driving an output device (NIDAQ_OUT, OPHANIM).

  • Transformers consume data and produce new data – analysis and mapping (OCULOMATIC, ALGEBRA, LUA, NORMALIZE, ARUCO, …).

  • Controllers coordinate the pipeline – starting and stopping groups of nodes (RUNNER, RUNNER2, TASK_CONTROLLER).

You build an experiment by adding nodes, setting each node’s type and properties, and subscribing consumers/transformers to the producers whose data they need. Most nodes have a small set of inline properties; nodes with richer configuration also provide a node widget that appears when the node is selected. See the Node Catalog for the full list of types.

Because nodes are decoupled and communicate over gRPC, a pipeline can also span machines: a REMOTE node proxies another instance’s stream, and a RUNNER2 node can start/stop nodes on remote instances.
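
The subscription relationship described in this section can be pictured with a toy publish/subscribe model. This is only an illustrative sketch: the Node, Doubler, and Recorder classes and their methods are hypothetical names invented here, not Thalamus's actual classes or its gRPC interface.

```python
from typing import Callable, List, Tuple


class Node:
    """Hypothetical producer: pushes each value to its subscribers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscribers: List[Callable[[str, float], None]] = []

    def subscribe(self, callback: Callable[[str, float], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, value: float) -> None:
        for callback in self._subscribers:
            callback(self.name, value)


class Doubler(Node):
    """Hypothetical transformer: consumes one source and republishes 2x."""

    def __init__(self, name: str, source: Node) -> None:
        super().__init__(name)
        source.subscribe(self._on_data)

    def _on_data(self, _source_name: str, value: float) -> None:
        self.publish(value * 2)


class Recorder:
    """Hypothetical consumer: logs every message from its sources."""

    def __init__(self, *sources: Node) -> None:
        self.log: List[Tuple[str, float]] = []
        for source in sources:
            source.subscribe(self._on_data)

    def _on_data(self, source_name: str, value: float) -> None:
        self.log.append((source_name, value))


wave = Node("wave")                  # generator
doubled = Doubler("doubled", wave)   # transformer subscribed to wave
recorder = Recorder(wave, doubled)   # consumer subscribed to both

wave.publish(1.5)
wave.publish(2.0)
print(recorder.log)
```

The recorder only sees the nodes it was explicitly subscribed to, which mirrors how a STORAGE2 node records only its subscribed producers.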

The data model

All data is carried in a single message type, StorageRecord. Every record has a node (the producing node’s name), a time (see below), and exactly one body that determines the kind of data:

  • analog – time-series samples. An AnalogResponse holds a flat data array, a list of spans that name each channel and mark its slice of data (begin/end), and a sample_intervals array giving each channel’s sample period in nanoseconds. Integer-valued streams use int_data (32-bit signed) or ulong_data (64-bit unsigned, for counters, timestamps, and other large integer values) in place of data; thalamus.record_reader2 and thalamus.dataframe read all three arrays.

  • image – a video/camera frame, with raw bytes, width, height, pixel format (e.g. Gray, RGB, MPEG4), and frame_interval.

  • text – a string message (log lines, event markers).

  • xsens – motion-capture pose data: body segments with position and quaternion rotation.

  • metadata – key/value pairs (string, integer, or decimal).

  • compressed – a compressed payload wrapping one of the above (used when analog or video compression is enabled).

This uniform model is why the same tools work across modalities, and why a single recording can interleave signals, video, markers, and motion on one timeline.
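
To make the analog layout concrete, the sketch below splits a flat data array into named channels using spans. The Span dataclass is a hypothetical stand-in for the protobuf span entries, and it assumes that end is exclusive (one past the channel's last sample); check the message definition before relying on that.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical stand-in for one entry of AnalogResponse.spans."""
    name: str
    begin: int  # index of the channel's first sample in the flat array
    end: int    # ASSUMED exclusive: one past the channel's last sample


def split_channels(data: List[float], spans: List[Span]) -> Dict[str, List[float]]:
    """Return each channel's samples, keyed by channel name."""
    return {span.name: data[span.begin:span.end] for span in spans}


# Two channels packed back to back in one flat array.
data = [0.1, 0.2, 0.3, 7.0, 8.0]
spans = [Span("X", 0, 3), Span("Y", 3, 5)]

channels = split_channels(data, spans)
print(channels)
```

Because each channel carries its own slice, channels in one record may hold different numbers of samples.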

Time

Record timestamps are expressed in nanoseconds on a steady clock, not as time since the Unix epoch. The clock is monotonic relative to an arbitrary start point, which makes it well suited to measuring intervals and latencies but means timestamps are not wall-clock dates. To anchor a recording to absolute time, include a WALLCLOCK node. For an analog record, the time marks the moment of the record’s last sample, so the time of every earlier sample can be reconstructed by stepping back one sample interval at a time.
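
As a minimal sketch of that reconstruction: if a record's time belongs to its last sample, sample i of n falls (n - 1 - i) intervals earlier. The function name and arguments below are illustrative, and the sketch assumes samples within a record are evenly spaced at the channel's sample interval.

```python
from typing import List


def sample_times_ns(record_time_ns: int, sample_interval_ns: int, n_samples: int) -> List[int]:
    """Per-sample timestamps, assuming record_time_ns is the LAST sample's time."""
    last = n_samples - 1
    return [record_time_ns - (last - i) * sample_interval_ns for i in range(n_samples)]


# Four samples at 1 kHz (1,000,000 ns apart); the record is stamped 5 ms.
times = sample_times_ns(record_time_ns=5_000_000, sample_interval_ns=1_000_000, n_samples=4)
print(times)  # [2000000, 3000000, 4000000, 5000000]
```

Working in integer nanoseconds avoids the rounding drift that floating-point seconds would introduce over long recordings.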

The capture-file format

A recording is a .tha capture file: a flat sequence of records, each written as an 8-byte big-endian length prefix followed by the serialized StorageRecord protobuf. A STORAGE2 node writes one record every time a node it is subscribed to produces data, so the file is an interleaved, append-only log.

By convention the first record carries a metadata body with the recording number (the Rec key), and a companion <file>.YYYYMMDD.R.json snapshot of the configuration is written alongside the capture. See the Quick Start for an annotated example of the records in a file.
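
The length-prefixed framing can be sketched in a few lines. The example below frames and then re-reads opaque byte payloads from an in-memory stream; it does not decode the protobuf, and it assumes the 8-byte prefix is an unsigned integer. The helper names iter_payloads and frame are made up for this illustration.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each record's raw bytes from a length-prefixed stream."""
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack(">Q", header)  # 8-byte big-endian, ASSUMED unsigned
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError("truncated record")
        yield payload


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length, mimicking the framing described above."""
    return struct.pack(">Q", len(payload)) + payload


# Stand-in payloads; in a real capture each would be a serialized StorageRecord.
fake_capture = io.BytesIO(frame(b"first") + frame(b"second record"))
payloads = list(iter_payloads(fake_capture))
print(payloads)
```

In a real reader each payload would then be parsed into a StorageRecord with the generated protobuf class; the bundled thalamus.record_reader2 module already does this, so hand-rolled parsing is mainly useful for debugging or for reading captures from other languages.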

Reading and converting recordings

Several bundled modules turn a .tha file into analysis-ready data:

  • python -m thalamus.record_reader2 FILE – iterate over and print raw records. In Python, thalamus.record_reader2.SimpleRecordReader yields StorageRecord messages (use it as a context manager).

  • python -m thalamus.dataframe -n NODE -i FILE – export one node’s analog (or text) channels to CSV, Parquet, and other tabular formats.

  • python -m thalamus.hydrate FILE – convert an entire capture into a single HDF5 file. For each analog channel it writes a data dataset of samples and a received dataset of per-record timing, from which exact sample times can be reconstructed.

The Examples page shows each of these end to end, and the examples/ folder in the repository contains runnable scripts.

Persistence

Node configurations and window layouts can be saved and reloaded, so an experiment’s full setup is reproducible. When a node’s configuration must change in a backward-incompatible way, a new numbered node type is introduced (for example STORAGE → STORAGE2) so existing setups keep working.