Concepts and Architecture ========================= This page explains how Thalamus is put together: the node pipeline, the data model that flows through it, the capture-file format, and the tooling that turns recordings into analysis-ready data. It complements the :doc:`quickstart` (which walks through using the program) and the :doc:`Node Reference ` (which documents each node type). .. admonition:: Mental model Think of a Thalamus session as a **directed graph of nodes on one shared nanosecond timeline**. Producers generate data; consumers and transformers *subscribe* to the producers whose data they need; and a STORAGE2 node writes every message from the nodes it is subscribed to into the ``.tha`` capture log. .. image:: _static/architecture.svg :width: 100% :alt: A Thalamus pipeline: generators feed transformers feed consumers, STORAGE2 writes the .tha capture, and a controller starts/stops the nodes. The node pipeline ----------------- A Thalamus session is a **pipeline of nodes**. Each node is a small, independently configurable unit, and every node plays one of four roles: * **Generators** produce data -- hardware acquisition (NIDAQ, INTAN, SPIKEGLX, GENICAM, ...) or software sources (WAVE, WALLCLOCK, PUPIL). * **Consumers** terminate data -- recording it (STORAGE2), logging it (LOG), or driving an output device (NIDAQ_OUT, OPHANIM). * **Transformers** consume data and produce new data -- analysis and mapping (OCULOMATIC, ALGEBRA, LUA, NORMALIZE, ARUCO, ...). * **Controllers** coordinate the pipeline -- starting and stopping groups of nodes (RUNNER, RUNNER2, TASK_CONTROLLER). You build an experiment by adding nodes, setting each node's type and properties, and **subscribing** consumers/transformers to the producers whose data they need. Most nodes have a small set of inline properties; nodes with richer configuration also provide a *node widget* that appears when the node is selected. See the :doc:`Node Catalog ` for the full list of types. Because nodes are decoupled and communicate over gRPC, a pipeline can also span machines: a :doc:`REMOTE ` node proxies another instance's stream, and a :doc:`RUNNER2 ` node can start/stop nodes on remote instances. The data model -------------- All data is carried in a single message type, ``StorageRecord``. Every record has a ``node`` (the producing node's name), a ``time`` (see below), and exactly one *body* that determines the kind of data: .. _ulong-data: * **analog** -- time-series samples. An ``AnalogResponse`` holds a flat ``data`` array, a list of ``spans`` that name each channel and mark its slice of ``data`` (``begin``/``end``), and a ``sample_intervals`` array giving each channel's sample period in nanoseconds. Integer-valued streams use ``int_data`` (32-bit signed) or ``ulong_data`` (64-bit unsigned, for counters/timestamps and other large integer values); ``thalamus.record_reader2`` and ``thalamus.dataframe`` read all three. * **image** -- a video/camera frame, with raw bytes, ``width``, ``height``, pixel ``format`` (e.g. ``Gray``, ``RGB``, ``MPEG4``), and ``frame_interval``. * **text** -- a string message (log lines, event markers). * **xsens** -- motion-capture pose data: body segments with position and quaternion rotation. * **metadata** -- key/value pairs (string, integer, or decimal). * **compressed** -- a compressed payload wrapping one of the above (used when analog or video compression is enabled). This uniform model is why the same tools work across modalities, and why a single recording can interleave signals, video, markers, and motion on one timeline. Time ---- Record timestamps are expressed in **nanoseconds from a steady clock**, not a Unix epoch. The clock is monotonic relative to an arbitrary start point, which makes it ideal for measuring intervals and latencies but means timestamps are not wall-clock dates. To anchor a recording to absolute time, include a :doc:`WALLCLOCK ` node. For an analog record, the ``time`` marks the moment of the record's last sample, so the time of every sample can be reconstructed from the sample interval. The capture-file format ----------------------- A recording is a ``.tha`` capture file: a flat sequence of records, each written as an 8-byte big-endian length prefix followed by the serialized ``StorageRecord`` protobuf. A STORAGE2 node writes one record every time a node it is subscribed to produces data, so the file is an interleaved, append-only log. By convention the first record carries a ``metadata`` body with the recording number (the ``Rec`` key), and a companion ``.YYYYMMDD.R.json`` snapshot of the configuration is written alongside the capture. See the :doc:`quickstart` for an annotated example of the records in a file. Reading and converting recordings ---------------------------------- Several bundled modules turn a ``.tha`` file into analysis-ready data: * ``python -m thalamus.record_reader2 FILE`` -- iterate over and print raw records. In Python, ``thalamus.record_reader2.SimpleRecordReader`` yields ``StorageRecord`` messages (use it as a context manager). * ``python -m thalamus.dataframe -n NODE -i FILE`` -- export one node's analog (or text) channels to CSV, Parquet, and other tabular formats. * ``python -m thalamus.hydrate FILE`` -- convert an entire capture into a single HDF5 file. For each analog channel it writes a ``data`` dataset of samples and a ``received`` dataset of per-record timing, from which exact sample times can be reconstructed. The :doc:`examples/index` page shows each of these end to end, and the ``examples/`` folder in the repository contains runnable scripts. Persistence ----------- Node configurations and window layouts can be saved and reloaded, so an experiment's full setup is reproducible. When a node's configuration must change in a backward-incompatible way, a new numbered node type is introduced (for example STORAGE → STORAGE2) so existing setups keep working.