Evaluators
There are three kinds of evaluators: modality-specific evaluators, modality-independent evaluators, and general evaluators.
Modality-specific evaluators
These are evaluators that are connected to a modality (and thus to a decoder). For each timeseries in the evaluation_test_timeseries or evaluation_train_timeseries, they receive the true, generated, and reconstructed trajectories in both the observational and the latent space. They use these trajectories to return metrics and plots specific to the given modality and timeseries.
Their limitation is that they only receive data from a single modality and timeseries; they therefore cannot compute metrics that require other data or span multiple modalities.
Modality-specific evaluators are set automatically for the following decoders:
- BinaryCategorical or Categorical
- Identity
- NegativeBinomial or ZeroInflatedNegativeBinomial
- Gaussian
- NoDecoder
Otherwise, one may customize the chosen evaluators using the modality_specific_evaluators key. This expects a dict[str, dict[str, ModalitySpecificEvaluatorOption]], where the outer dict's keys are the measurement identifiers and the inner dict's keys are the modality identifiers. A ModalitySpecificEvaluatorOption is a dict with the keys name and parameters. The latter contains modality-specific options as well as the following common options:
- name (str): The display name of the evaluator for TensorBoard.
- sequence_length (Optional[int], default=None): The sequence length for both plots and metrics for the evaluator. If None, defaults to the whole length of the timeseries.
- metric_sequence_length (Optional[int], default=None): The sequence length for metrics for the evaluator. If None, defaults to the whole length of the timeseries.
- metric_n_samples (int, default=1): How many samples should be drawn from the decoder distribution for the evaluator. Note that not all evaluators support >1 samples.
- plot_sequence_length (Optional[int], default=None): The sequence length for plots for the evaluator. If None, defaults to the whole length of the timeseries.
- plot_use_expected (bool, default=False): Whether plots should use the expected value of the decoder or sample from the decoder distribution.
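To make the structure concrete, the sketch below shows what such a configuration could look like as a Python dict. The measurement identifier "session_1" and modality identifier "ecg" are hypothetical placeholders; only the keys described above are used.

```python
# Hypothetical sketch of a modality_specific_evaluators configuration.
# "session_1" (measurement id) and "ecg" (modality id) are placeholders.
modality_specific_evaluators = {
    "session_1": {
        "ecg": {
            "name": "GaussianEvaluator",
            "parameters": {
                "name": "ECG_Gaussian",         # display name in TensorBoard
                "metric_sequence_length": 100,  # metrics over the first 100 steps
                "plot_sequence_length": 500,    # plots over the first 500 steps
                "plot_use_expected": True,      # plot the decoder mean, no sampling
            },
        },
    },
}
```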
name can be one of the following options.
Categorical
"name": "CategoricalEvaluator"
Default evaluator for the Categorical and BinaryCategorical decoders.
Does not have evaluator-specific options.
Returned figures:
- true_vs_observed: The true categories are plotted against the DSR-generated observed categories.
- true_vs_ae_reconstructed: The true categories are plotted against the reconstructed categories.
Returned metrics:
- Matthews correlation coefficient (mcc): Matthews correlation coefficient between the true and the DSR-generated observed categories.
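For reference, in the binary case the Matthews correlation coefficient reduces to the familiar confusion-matrix form below; the multiclass case follows the standard generalization.

$$
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
$$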
Gaussian
"name": "GaussianEvaluator"
Default evaluator for the Gaussian decoder.
Evaluator-specific options:
- n_plotted_dims (int, default=7): The number of dimensions plotted in the true_vs_generated and true_vs_ae_reconstructed plots, starting from the first dimension.
- dims_to_plot (Optional[list[int]], default=None): The indices of the dimensions to be plotted.
- d_stsp_n_bins (int, default=30): The number of bins each state space dimension is partitioned into for calculating the state space divergence. Only used if the state space has fewer than 5 dimensions.
- pse_smoothing_sigma (float, default=1.0): The standard deviation for Gaussian smoothing of power spectra.
Returned figures:
- true_vs_generated: Plots the true vs the generated trajectory in the dimensions set by the corresponding arguments.
- true_vs_ae_reconstructed: Plots the true vs the reconstructed trajectory in the dimensions set by the corresponding arguments.
Returned metrics:
- State space divergence (D_stsp): The state space divergence calculated between the true and the generated trajectory. Uses the binning approach for state spaces with fewer than 5 dimensions, otherwise uses a GMM approach.
- Power spectrum error (PSE): The power spectrum error between the true and the generated trajectory.
- Normalized MSE (NMSE): The normalized MSE between the true and the generated trajectory.
- Prediction error (PE): The prediction error in the given modality. Only returned if prediction error is configured. See the corresponding page for details.
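The exact implementation of the binning-based state space divergence is not reproduced here, but a minimal sketch, assuming the definition common in the DSR literature (a KL divergence between binned state-occupancy distributions), could look as follows:

```python
import numpy as np

def d_stsp_binning(x_true, x_gen, n_bins=30, eps=1e-12):
    """Sketch of a binning-based state space divergence: KL divergence
    between binned occupancy histograms. x_true, x_gen: (T, D), D < 5."""
    dim = x_true.shape[1]
    # Common bin edges per dimension, spanning the true trajectory's range.
    edges = [np.linspace(x_true[:, d].min(), x_true[:, d].max(), n_bins + 1)
             for d in range(dim)]
    p, _ = np.histogramdd(x_true, bins=edges)
    q, _ = np.histogramdd(x_gen, bins=edges)
    p = p.ravel() / max(p.sum(), 1.0)
    q = q.ravel() / max(q.sum(), 1.0)
    mask = p > 0  # KL only over bins the true trajectory visits
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))
```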
Identity
"name": "IdentityEvaluator"
A modified version of the Gaussian evaluator with the only difference being that the true_vs_ae_reconstructed plot is not returned. Should be used with the identity encoder-decoder setup.
Lorenz
"name": "LorenzEvaluator"
A modified version of the Gaussian evaluator with the added plot of the 3D attractor (named attractor) and default values suitable for the Lorenz system. Should be used when training on the Lorenz system.
Additional parameters (compared to Gaussian):
- generate_without_external_inputs (bool, default=False): If the original Lorenz system had external inputs, setting this to True also generates trajectories without external inputs. This is useful for checking whether the freely generated trajectory truly captures the Lorenz system without external inputs.
- plot_ae_reconstruction (bool, default=True): Whether to return the true vs reconstructed plot. Should be set to False if the identity encoder-decoder setup is used.
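A hedged configuration sketch for a Lorenz modality with an identity encoder-decoder setup might look like this (the measurement id "lorenz_run" and modality id "lorenz" are placeholders):

```python
# Hypothetical sketch: configuring the LorenzEvaluator for one modality.
modality_specific_evaluators = {
    "lorenz_run": {
        "lorenz": {
            "name": "LorenzEvaluator",
            "parameters": {
                "generate_without_external_inputs": True,
                "plot_ae_reconstruction": False,  # identity encoder-decoder setup
            },
        },
    },
}
```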
Spike
"name": "SpikeEvaluator"
The default evaluator for count decoders (NegativeBinomial and ZeroInflatedNegativeBinomial).
This evaluator is made specifically for binned electrophysiological spiking data. If used with other kinds of count data, some of the implemented metrics may not be informative.
Evaluator-specific options:
- use_raster_plot (bool, default=False): Whether to visualize the data as raster plots.
- units_to_plot (list[int], default=[0, 1, 2, 3, 4]): If use_raster_plot is False, sets the units to be plotted on simple line plots.
Returned plots:
neurons: The raster plot or line plots showing the true, the reconstructed and the DSR-generated data.
Returned metrics:
- cross_corr_nmse: The NMSE between the cross correlation calculated on the true and the DSR-generated data.
- autocorrelation_nmse: The NMSE between the autocorrelation calculated on the true and the DSR-generated data.
- mean_spike_rate_nmse: The NMSE between the mean firing/spiking rate calculated on the true and the DSR-generated data.
- zero_count_ratio_nmse: The NMSE between the zero count ratio calculated on the true and the DSR-generated data.
- coefficient_of_variation_nmse: The NMSE between the coefficient of variation calculated on the true and the DSR-generated data.
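The exact statistic computations are not spelled out above. As a rough illustration, assuming each statistic is computed per unit and compared via an NMSE normalized by the variance of the true statistic (an assumption; the library's normalization may differ), two of these metrics could be sketched as:

```python
import numpy as np

def nmse(true_stat, gen_stat, eps=1e-12):
    # MSE normalized by the variance of the true statistic
    # (assumed normalization; the library may normalize differently).
    return float(np.mean((true_stat - gen_stat) ** 2) / (np.var(true_stat) + eps))

def mean_spike_rate_nmse(spikes_true, spikes_gen):
    # spikes_*: (time, units) arrays of binned spike counts;
    # compares the per-unit mean spike rates.
    return nmse(spikes_true.mean(axis=0), spikes_gen.mean(axis=0))

def zero_count_ratio_nmse(spikes_true, spikes_gen):
    # Compares the per-unit fraction of bins with zero counts.
    return nmse((spikes_true == 0).mean(axis=0), (spikes_gen == 0).mean(axis=0))
```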
Modality-independent evaluators
These evaluators are still connected to a timeseries, but they receive the true, generated, and reconstructed trajectories in the observational and latent spaces from all modalities. They are thus less limited in their capabilities, but still constrained in that they run once for each timeseries.
Modality-independent evaluators can be configured using the modality_independent_evaluators key. This expects a list of ModalityIndependentEvaluatorOptions. Each ModalityIndependentEvaluatorOption is a dict with the keys name and parameters. The latter has the following common parameters:
name (str): The display name of the evaluator.
name can be one of the following options.
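As an illustration, a minimal modality_independent_evaluators entry using the LatentEvaluator described below could look like this (all values are placeholders):

```python
# Hypothetical sketch of a modality_independent_evaluators list.
modality_independent_evaluators = [
    {
        "name": "LatentEvaluator",
        "parameters": {
            "name": "Latent",     # display name in TensorBoard
            "n_plotted_dims": 5,  # plot the first 5 latent dimensions
        },
    },
]
```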
Latent
"name": "LatentEvaluator"
An evaluator for the latent space.
Evaluator-specific options:
- n_plotted_dims (int, default=7): The number of latent dimensions to plot, starting from the first.
- dims_to_plot (Optional[list[int]], default=None): The latent dimensions to plot.
Returned plots:
generated_vs_encoded: Plots the DSR-generated vs the encoded trajectory in the dimensions set by the corresponding arguments.
Does not return metrics.
General evaluators
These evaluators are not constrained at all. They are called once per evaluation epoch and receive the whole dataset and the current model. This means they can be used to evaluate model parameters or to build more specific evaluation pipelines where, for example, metrics need to be averaged over timeseries.
General evaluators can be configured using the general_evaluators key. This expects a list of GeneralEvaluatorOptions. Each GeneralEvaluatorOption is a dict with the keys name and parameters. The latter has the following common parameters:
name (str): The display name of the evaluator.
name can be one of the following options.
Subject space
"name": "SubjectSpaceEvaluator"
Evaluates the subject vector space, plotting the subject vector components using the configured method.
Evaluator-specific options:
- plot_as_2d_projections (bool, default=True): Whether to plot the subject vectors as 2D projections.
  - If True, one subplot is created for each pair of subject vector dimensions, i.e. d·(d−1)/2 subplots, where d is the subject vector dimension. This is only done if d is small enough, or if max_dims_to_plot is set and small enough.
  - Otherwise, one plot is created per subject vector component, plotting that component's value against the subject index. At most 12 dimensions are plotted.
- projection_mode (list[Literal["pca", "none"]], default=["none"]): List of modes for how the subject vectors should be projected before plotting. "none" applies no projection. "pca" applies PCA; the dimensions are then ordered by explained variance ratio, and the ratios are also reported to TensorBoard.
- max_dims_to_plot (Optional[int], default=None): The maximum number of dimensions to plot using either plotting method. If None, all dimensions are plotted (subject to the constraints listed above).
Returned figures:
Dimensions_none and Dimensions_pca: The subject vector dimensions plotted using the given projection modes.
Does not return metrics.
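Putting the options together, a hedged sketch of a general_evaluators configuration could look like this (all values are placeholders):

```python
# Hypothetical sketch of a general_evaluators list.
general_evaluators = [
    {
        "name": "SubjectSpaceEvaluator",
        "parameters": {
            "name": "SubjectSpace",              # display name in TensorBoard
            "plot_as_2d_projections": True,
            "projection_mode": ["none", "pca"],  # report both raw and PCA views
            "max_dims_to_plot": 6,
        },
    },
]
```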
Naming convention of metrics
Metrics can be used for optuna hyperparameter search or for early stopping, where one needs to know under which names metrics are saved in order to find them using the metric_name_pattern. It is generally advised to do a test run and inspect the names in TensorBoard to make sure, but this section provides the general naming convention of metrics:
<evaluator_display_name>_<train/test>_<metric_name>
- <evaluator_display_name> is the property set under the name key in the parameters dict of each evaluator, defaulting to the evaluator's name set as the outer name property (e.g. "GaussianEvaluator"). If multiple instances of the same evaluator are used (or multiple evaluators share the same display name), the evaluators are automatically renamed to <current_display_name>_<measurement_id>_<modality_id>, which is always unique.
- <train/test> is the set used to create the metric. For optuna and early stopping, one should always use the test set!
- <metric_name> can be looked up in this file; it is the value given in code font under "Returned metrics" (e.g. NMSE).
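For example, with the default display name, the Gaussian evaluator's normalized MSE on the test set is saved as:

```
GaussianEvaluator_test_NMSE
```

If two Gaussian evaluators shared that display name, they would be renamed as described above, yielding names like GaussianEvaluator_session_1_ecg_test_NMSE (with hypothetical measurement id session_1 and modality id ecg).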