Measurements, modalities and timeseries
Measurements, modalities and timeseries are among the main concepts used in the codebase. They form the foundation of data preprocessing and preparation, hierarchisation and model selection, so understanding them in detail is a necessary prerequisite for using this project.
Timeseries
In the context of this repository, a timeseries is one continuous sequence of observations, possibly with multiple data modalities, which can be used for creating a training sample. If there is a gap in the observations, for example, the data are no longer continuous and one would create two timeseries, one on each side of the gap. The concept matters for the training of any generative sequence model: because the model is expected to generate continuous timeseries, it needs to learn from continuous timeseries.
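As a minimal sketch of how one recording might be cut into continuous timeseries, the snippet below splits a list of timestamps wherever the spacing exceeds the expected sampling interval. The function name `split_at_gaps` and its parameters are hypothetical and not part of this repository's API.

```python
from typing import List, Sequence, Tuple


def split_at_gaps(
    timestamps: Sequence[float],
    expected_dt: float,
    tolerance: float = 1e-6,
) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs of continuous segments.

    A new segment starts wherever two consecutive timestamps are further
    apart than expected_dt plus a small tolerance. `stop` is exclusive.
    """
    if len(timestamps) == 0:
        return []
    segments = []
    start = 0
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > expected_dt + tolerance:
            # Gap found: close the current segment and open a new one.
            segments.append((start, i))
            start = i
    segments.append((start, len(timestamps)))
    return segments


# Samples at 1 s resolution with a gap between t=2 and t=10.
timestamps = [0.0, 1.0, 2.0, 10.0, 11.0]
segments = split_at_gaps(timestamps, expected_dt=1.0)
print(segments)  # each (start, stop) pair would become its own timeseries
```

Every modality of the recording would be sliced with the same index pairs, so that all modalities within one timeseries stay aligned in time.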
Modalities
A modality is a specific way of observing the system. For example:
- In the case of a human workout, a wearable can provide a single continuous timeseries with multiple modalities:
  - Steps taken is, depending on the width of the time bins, either a Bernoulli-type modality or a count modality.
  - Heart rate/ECG is a continuous, Gaussian-type modality.
  - Skin temperature is another continuous, Gaussian-type modality.
- In the case of a neural behavioral task:
  - The behavior of the mouse can be a categorical modality (for choices), a continuous Gaussian-type modality (e.g. position in a maze), etc.
  - The neural recordings could be a continuous modality (e.g. LFPs) or count data in the case of binned spike times.
The examples above also show that each timeseries may contain multiple modalities. Gaps can arise in a timeseries if, e.g., some artifacts have to be cut out or inter-trial intervals are not of interest for the reconstruction.
Modalities matter for the project because the decoder models are designed specifically for each input modality, ensuring that the generated data follow that modality's distribution.
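To illustrate why a decoder depends on the modality, the sketch below pairs each modality type named above with a matching negative log-likelihood. This is an illustration under assumptions, not the project's decoder code: the function names and the `MODALITY_NLL` mapping are hypothetical, and the choice of a Poisson likelihood for count data is an assumption that the text does not state.

```python
import math
from typing import Callable, Dict, Sequence


def gaussian_nll(x: float, mean: float, std: float = 1.0) -> float:
    """Negative log-likelihood of a continuous, Gaussian-type observation."""
    return 0.5 * math.log(2 * math.pi * std**2) + (x - mean) ** 2 / (2 * std**2)


def poisson_nll(k: int, rate: float) -> float:
    """Negative log-likelihood of a count (assumed Poisson); rate must be > 0."""
    return rate - k * math.log(rate) + math.lgamma(k + 1)


def bernoulli_nll(x: int, p: float) -> float:
    """Negative log-likelihood of a binary observation; requires 0 < p < 1."""
    return -(x * math.log(p) + (1 - x) * math.log(1 - p))


def categorical_nll(k: int, probs: Sequence[float]) -> float:
    """Negative log-likelihood of class index k under class probabilities."""
    return -math.log(probs[k])


# Hypothetical lookup from modality type to the loss a decoder would minimise.
MODALITY_NLL: Dict[str, Callable[..., float]] = {
    "gaussian": gaussian_nll,
    "count": poisson_nll,
    "bernoulli": bernoulli_nll,
    "categorical": categorical_nll,
}

# Example: score 3 binned spikes against a predicted rate of 2.5 per bin.
loss = MODALITY_NLL["count"](3, 2.5)
```

The point of the lookup is that the same latent model can be paired with a different output distribution per modality, so that generated samples have the right type (real-valued, non-negative integer, binary, or class label).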
Measurements
The definition of a measurement connects to the mathematical background of DSR. The system of interest is observed via a so-called observation function (also referred to below as the measurement function), which maps the system's true state to the observation. With this, a measurement is defined as a set of timeseries for all of which the same measurement function can be assumed.
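The definition can be written down in a generic notation. The symbols below are assumptions chosen for illustration and are not taken from the original text: $z_t$ denotes the true (latent) state of the system, $x_t$ the observation, and $g$ the observation function; observation noise is omitted for brevity.

```latex
% Observation of a single timeseries (assumed notation):
x_t = g(z_t)

% A measurement is a set of K timeseries that share one observation function g:
\mathcal{M} = \left\{ x^{(1)}, \dots, x^{(K)} \right\},
\qquad x^{(k)}_t = g\!\left(z^{(k)}_t\right) \quad \text{for all } k = 1, \dots, K
```

Two timeseries belong to different measurements exactly when one has to assume different functions $g$ for them.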
In some cases this assumption can clearly be made or rejected, but often it reflects the research question and is a decision for the user. For example:
- In neural recordings made over multiple days, if one cannot guarantee that the electrodes do not shift, units in the spike-sorted data may belong to different neurons, so the sessions clearly have different measurement functions.
- For EEG data, if the same subject is used and the electrode positions on the head exactly align, the same measurement function can be assumed even for sessions over multiple days.
- For EEG or fMRI data from different subjects, the same assumption may be made, but it is less clear-cut. Different people have different head shapes, the electrodes may be misaligned in the case of EEG, or the subject may lie slightly differently in the fMRI machine. Here the choice is left to the user.
The consequence of choosing to use multiple measurements
The choice of using multiple measurements is reflected in the data preparation, which guides the model initialization. For each measurement, a separate encoder and separate decoders (one per modality) will be trained. This means that using more measurements results in more trainable parameters, so this choice should be made with caution. The growth in the number of parameters may be mitigated by the hierarchisation of the encoders and decoders.
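The effect on model size can be sketched by counting the modules implied by a given grouping of the data. The nested-dictionary layout and the function `count_modules` below are hypothetical and only serve as an illustration; the count follows the rule of one encoder per measurement and one decoder per modality within each measurement, and it ignores any parameter sharing introduced by hierarchisation.

```python
from typing import Dict, List

# Hypothetical layout: measurement name -> list of timeseries,
# where each timeseries maps a modality name to its samples.
Timeseries = Dict[str, List[float]]
Dataset = Dict[str, List[Timeseries]]


def count_modules(dataset: Dataset) -> Dict[str, int]:
    """Count the encoders and decoders implied by the measurement layout."""
    n_encoders = 0
    n_decoders = 0
    for timeseries_list in dataset.values():
        # Modalities present in this measurement (union over its timeseries).
        modalities = set()
        for ts in timeseries_list:
            modalities.update(ts.keys())
        n_encoders += 1                # one encoder per measurement
        n_decoders += len(modalities)  # one decoder per modality
    return {"encoders": n_encoders, "decoders": n_decoders}


day_1 = {"spikes": [0.0, 2.0, 1.0], "position": [0.1, 0.2, 0.3]}
day_2 = {"spikes": [1.0, 0.0, 3.0], "position": [0.4, 0.5, 0.6]}

# Same measurement function assumed for both days: one measurement.
single = {"all_days": [day_1, day_2]}
# Different measurement functions assumed: one measurement per day.
split = {"day_1": [day_1], "day_2": [day_2]}

print(count_modules(single))
print(count_modules(split))
```

In this toy setup, treating the two days as separate measurements doubles the number of modules that have to be trained on the same amount of data.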
Do I need multiple measurements?
As stated above, in many cases this is the user's choice. As a general guideline, if a single shared measurement function is an appropriate approximation of the setup and the research question does not call for separate measurements, multiple measurements should be avoided.