After training

The main, assembled model is called the MTFModel. This is located in src/models/mtf_model.py and the instance contains variables pointing to the encoder (self.encoder_model), decoder (self.decoder_model) and the DSR model (self.dsr_model).

Loading the model

This model is initialized using its static method MTFModel.from_checkpoint, which loads all necessary data from the model directory and returns the initialized and loaded MTFModel. It expects the following parameters:

  • model_dir (str): The directory in which the model is saved. This is usually <result_dir>/<experiment_name>/<model_name>/<run_number>.
  • epoch (int): The epoch number of the save to be loaded.
tip

Every function in the codebase expects input tensors of shape $T \times B \times N$ and returns predictions and encodings in the same layout. Here $T$ is the timeseries length, $B$ is the batch size and $N$ is the feature dimension. Any additional dimensions follow after $N$.
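
As a minimal sketch of this layout, batch-first data of shape (B, T, N) can be reordered to (T, B, N) by swapping the first two axes. Plain nested lists are used here only to keep the example self-contained, and the helper name to_time_major is hypothetical and not part of the codebase. With tensors, tensor.transpose(0, 1) performs the same swap.

```python
def to_time_major(batch_first):
    """Swap the first two axes of a nested list: (B, T, N) -> (T, B, N)."""
    # zip(*...) groups the t-th timestep of every batch element together.
    return [list(step) for step in zip(*batch_first)]


# Two sequences (B=2), three timesteps each (T=3), one feature (N=1).
batch_first = [
    [[0.0], [0.1], [0.2]],
    [[1.0], [1.1], [1.2]],
]
time_major = to_time_major(batch_first)
# time_major[t][b] is now the feature vector of batch element b at timestep t.
```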

Identification of the generating model

As covered before, the codebase integrates the hierarchical approach, in which multiple models are trained simultaneously by extracting similarities and differences in the data into group-level and subject-specific parameters. Combined with the constraints described earlier, this framework has the side effect that whenever either part of the model is used, one has to select which model to use. This selection is called model identification, or the identification method, in the coming sections.

note

In the evaluation functions, the identification methods listed below are used only to identify the model and have no other function. Thus one may use any identification method to generate from any timeseries or initial condition, whether or not it is contained in the dataset.

Measurement identifier and timeseries index

The method used most often to select the model is to pass the measurement identifier and timeseries index of the timeseries currently in use. This is most useful if the goal is to compare the generated data with the ground truth, which is part of the dataset. During model initialization, every tuple of measurement id and timeseries index is assigned a model, so supplying these two uniquely identifies the model to use for generation.

Cumulative timeseries index

During model training, the so-called cumulative timeseries index is used as the identification method. This is the timeseries index accumulated over multiple measurements. For example, if the first measurement contains 50 timeseries, then the third timeseries in the second measurement has cumulative timeseries index 52 (note that indexing starts at 0).

This index contains the same information as the measurement id and timeseries index tuple, and has the upside that it is vectorizable using tensor operations, which is necessary for batching. This also means that when evaluating, generating multiple trajectories in a batched manner requires the use of the cumulative timeseries index.
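
The relation between the two identification schemes can be sketched as follows. This is an illustration only: it assumes measurements are counted in a fixed order with known sizes, and both function names are hypothetical rather than part of the codebase.

```python
from itertools import accumulate


def cumulative_index(measurement_sizes, measurement_pos, timeseries_index):
    """Map (measurement position, timeseries index) to a cumulative index.

    measurement_sizes lists the number of timeseries per measurement,
    in the order in which the measurements are counted (an assumption).
    """
    if not 0 <= timeseries_index < measurement_sizes[measurement_pos]:
        raise IndexError("timeseries_index out of range for this measurement")
    # Offset of each measurement = total number of timeseries before it.
    offsets = [0] + list(accumulate(measurement_sizes))[:-1]
    return offsets[measurement_pos] + timeseries_index


def split_cumulative_index(measurement_sizes, cumulative):
    """Inverse mapping: cumulative index -> (measurement position, timeseries index)."""
    for pos, size in enumerate(measurement_sizes):
        if cumulative < size:
            return pos, cumulative
        cumulative -= size
    raise IndexError("cumulative index exceeds the total number of timeseries")


# First measurement has 50 timeseries; third timeseries (index 2) of the second one:
idx = cumulative_index([50, 30], measurement_pos=1, timeseries_index=2)  # 52
```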

tip

When using the dataset note that each evaluator sample returns information about the measurement id and timeseries index as well as the cumulative timeseries index.

Subject indices, external indices

When generating only in the latent space, one may use the subject index and, where applicable, the external index for model identification.

The subject index and external index are the indices of the subject and external parameters, which are set at model initialization. There the code loops over all subject and external groups, based on the dsr_subject_groups and dsr_external_groups config parameters, and creates the parameters accordingly. Each subject and external parameter set is assigned an index according to its position in this loop.

Subject- and external parameters

In certain cases one may not want to use parameters learned during the training process, for example if one wants to interpolate between or extrapolate from the learned subject vectors. For this case, functions operating solely in the latent space have the option to specify the subject and external parameters directly.
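
A minimal sketch of such an interpolation or extrapolation is shown below. It operates on plain lists of floats to stay self-contained; in practice the inputs would be rows of the learned subject vectors, and the result would be passed as subject_params. The function name is hypothetical and not part of the codebase.

```python
def interpolate_subject_vectors(vec_a, vec_b, alpha):
    """Linear blend of two subject vectors.

    alpha = 0 returns vec_a, alpha = 1 returns vec_b; values in between
    interpolate, values outside [0, 1] extrapolate along the same line.
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("subject vectors must have the same number of features")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(vec_a, vec_b)]


midpoint = interpolate_subject_vectors([0.0, 2.0], [2.0, 4.0], alpha=0.5)
extrapolated = interpolate_subject_vectors([0.0, 2.0], [2.0, 4.0], alpha=1.5)
```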

Generating free trajectories

Generating a free trajectory means allowing the DSR model to predict latent states freely, without teacher forcing (TF), given an initial condition. Optionally, the initial latent state may be obtained from the observations using the encoder, and the free latent trajectory may be decoded into the observation space.

The codebase provides multiple functions facilitating this process.

MTFModel.generate_free_trajectory

This function is the main way of generating free trajectories when the initial state has to be obtained by first encoding the observations.

It has the following parameters:

  • observations (tc.Tensor): The ground truth observations. Shape should be (T, (B), N) where N is the sum of all modality dimensions. The batch dimension is optional, the returned tensors will contain a batch dimension based on whether a batch dimension is given in the input.
  • cumulative_timeseries_index (Optional[int | tc.Tensor], default=None): The cumulative timeseries index (see above). If not set, measurement_id and timeseries_index have to be set. Used for model identification. Note that using this instead of measurement_id and timeseries_index allows for batched generation but only if all cumulative_timeseries_indices come from the same measurement.
  • measurement_id (Optional[str], default=None): The measurement id. If set, timeseries_index has to be set as well. Used for model identification.
  • timeseries_index (Optional[int], default=None): The timeseries index. If set, measurement_id has to be set as well. Used for model identification.
  • external_inputs (Optional[tc.Tensor], default=None): The external inputs to the latent model.
  • T (Optional[int], default=None): The length of the freely generated trajectory. If not set, the shorter of the lengths of external_inputs and observations is used. Even if set, the length is capped at the length of external_inputs.
  • stack_modalities (bool, default=True): Whether the modalities of the generated observation-space trajectories are stacked in the last dimension when decoding, giving shape (T, (B), sum_of_modality_dims), or returned as a list list[(T, (B), dim_modality_i)].
  • num_samples (int, default=1): The number of samples to be taken from the decoder distributions. If larger than 1, the returned shape will be (num_samples, T, (B), N).
  • use_expected_output (bool | list[bool], default=False): Whether the expected value of the decoder distributions is returned instead of a sample. If list[bool], each entry stands for a decoder modality in the chosen measurement. Note that if this is set to True for a modality, the output may lie outside the support of that modality's distribution (e.g. a count distribution's expected value may not be an integer).
  • return_decoded_teacher_forcing (bool, default=False): Whether to return the reconstructed trajectory, i.e. the one encoded and then directly decoded again without the use of the latent DSR model.
  • return_only_latent (bool, default=False): Whether to completely omit the decoding of the trajectories. If True, all parameters acting on the decoders are ignored.
  • return_entropy (bool, default=False): Whether the encoder's entropy is returned. Mainly used when calculating the test loss.
  • use_latent_mean (bool, default=False): Whether the mean of the encoder's distribution is used when encoding the ground truth observations, instead of a sample from it.

It returns a tuple with the following entries:

  1. Generated latent trajectory: Generated by the latent DSR model, using the first timestep of the encoded observations as the initial condition.
  2. Encoded latent trajectory: Encoded from the observations.
  3. Predicted observations: Generated by the DSR model and decoded. Returns None if return_only_latent is True.
  4. Decoded trajectory from encoded latent trajectory: Returns None if return_decoded_teacher_forcing is False.
  5. Entropy of the encoder: Returns None if return_entropy is False.
note

All returned tensors contain the initial value at the first timestep. So in order to predict one step into the future, one must pass T=2 to the function.
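
The following sketch shows how the parameters and the five-entry return tuple described above might be used for a one-step prediction. The helper predict_one_step and the _StubModel class are hypothetical: the stub only mimics the documented return structure with plain lists so that the example runs without a trained model, and it does not reflect how MTFModel works internally.

```python
def predict_one_step(model, observations, cumulative_timeseries_index):
    """Return the one-step-ahead prediction in observation space."""
    # T=2: index 0 holds the initial value, index 1 the first prediction.
    outputs = model.generate_free_trajectory(
        observations=observations,
        cumulative_timeseries_index=cumulative_timeseries_index,
        T=2,
        use_expected_output=True,
    )
    # Unpack in the documented order; the last two entries are None here
    # because return_decoded_teacher_forcing and return_entropy default to False.
    latent_generated, latent_encoded, predicted, decoded_tf, entropy = outputs
    return predicted[1]


class _StubModel:
    """Hypothetical stand-in that returns a tuple shaped like the documented one."""

    def generate_free_trajectory(self, observations, cumulative_timeseries_index=None,
                                 T=None, use_expected_output=False):
        self.last_T = T
        first = observations[0]
        # Toy dynamics: every generated step adds 1 to the initial value.
        trajectory = [first] + [[x + 1.0 for x in first] for _ in range(T - 1)]
        return trajectory, list(observations), trajectory, None, None


stub = _StubModel()
next_step = predict_one_step(stub, [[0.0, 1.0], [5.0, 5.0]], cumulative_timeseries_index=0)
```

With a real model, the first argument would be the MTFModel instance returned by MTFModel.from_checkpoint and observations would be a tensor of shape (T, (B), N).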

MTFModel.generate_decoded_trajectory

This function decodes a given latent trajectory.

It has the following parameters:

  • latent_trajectory (tc.Tensor): Latent trajectory to decode. Has shape (T, (B), latent_dim)
  • measurement_id_or_index (int | str): The measurement id or index. Used for model identification.
  • stack_modalities (bool, default=True): Whether to stack modalities on the last dimension when decoding or return them as a list. See the parameter with the same name in the previous function definition.
  • num_samples (int, default=1): Number of decoded samples to generate. If > 1, the output will have shape (num_samples, T, (B), N). See the parameter with the same name in the previous function definition.
  • use_expected_output (bool, default=False): Whether to return the expected value of the decoder distribution instead of a sample. If a list is given, it should contain one value per modality. Should not be used with num_samples > 1. See the parameter with the same name in the previous function definition.

Returns the decoded trajectory.

DSRModel.generate_free_trajectory

This function can be used to generate a free latent trajectory without first encoding the ground truth or later decoding.

It has the following parameters:

  • z0 (tc.Tensor): The initial condition of the trajectory. Expected shape ((B), latent_dim).
  • T (int): The number of steps to be generated. If not given, the length of external_inputs is used instead. T is capped at the length of external_inputs.
  • external_inputs (Optional[tc.Tensor], default=None): The external inputs to use for generation. Expected shape (T, (B), external_dim)
  • measurement_id (str): The measurement id. Used for model identification.
  • timeseries_index (int): The timeseries index. Used for model identification.
  • cumulative_timeseries_index (int | tc.Tensor): The cumulative timeseries index or indices to be used. If a tensor is given, expected shape is (B,).
  • subject_index (int | tc.Tensor): The subject index or indices to be used. If tensor is given, expected shape is (B,), each entry specifying which subject to use for which sample in the batch.
  • external_index (int | tc.Tensor): The external matrix's index to be used. If tensor is given, expected shape is (B,), each entry specifying which external matrix to use for which sample in the batch.
  • subject_params (int | tc.Tensor): The subject parameters to be used for generation. Expected shape is ((B), N_features). See above for details.
  • external_params: The external parameters to be used for generation. Expected shape is ((B), latent_dim, external_dim). See above for details.

DSRModel.construct_params

This function can be used to obtain model-specific parameters (e.g. $A$, $W$, $h$) given a model identification. Note that this function is not marked with @torch.no_grad() (as it is used during training). Make sure to turn off gradient calculation during evaluation so as not to waste computational resources.

It has the following parameters, each with the same signature as in the previous function definition:

  • measurement_id
  • timeseries_index
  • cumulative_timeseries_index
  • subject_index
  • external_index
  • subject_params
  • external_params

Returns a dictionary with keys according to the model-specific parameters of the given latent step.

Analyzing the subject space

One of the main features of the hierarchical architecture is the emergence of a structured and interpretable subject-space.

The trained subject vectors can be accessed via model.dsr_model.latent_step_params, where model is an instance of the MTFModel class. latent_step_params is a dictionary with three keys:

  • group: The group-level parameters of the model as defined by the hierarchisation scheme.
  • subject_vectors: The subject-level parameters of the model as defined by the hierarchisation scheme.
  • external_matrices: The external matrices of the model.

subject_vectors is a tc.Tensor with shape (num_subjects, num_features), thus it can be used for any custom evaluation task by e.g. converting it to a numpy array.
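
As one possible custom evaluation, the sketch below computes pairwise Euclidean distances between subject vectors. It works on a nested list, which could be obtained from the tensor with subject_vectors.detach().cpu().tolist(); the function name and the toy values are illustrative assumptions, not part of the codebase.

```python
import math


def pairwise_distances(subject_vectors):
    """Euclidean distance between every pair of subject vectors.

    subject_vectors is a nested list of shape (num_subjects, num_features).
    Returns a symmetric (num_subjects, num_subjects) nested list.
    """
    return [[math.dist(a, b) for b in subject_vectors] for a in subject_vectors]


# Toy example with three subjects and two features each.
toy_vectors = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
distances = pairwise_distances(toy_vectors)
# distances[i][j] is the distance between subject i and subject j.
```

Subjects whose vectors lie close together in this sense have similar subject-level parameters, which can serve as a starting point for clustering or visualising the subject space.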