Teacher forcing

The repository implements sparse teacher forcing (STF) and generalized teacher forcing (GTF). Both are configured via three parameters:

dim_forcing (int, default=latent_dim): The number of units of the latent space which are forced.

note

The forcing dimension doubles as the bottleneck dimension of the VAE: the encoder projects to a dim_forcing-dimensional space, and the decoder uses only the dim_forcing forced units to generate the observations.

For this reason, the terms "forced units" and "observed units" are used interchangeably throughout the documentation.

tip

It is advised to set the forcing dimension smaller than the latent dimension. This serves two purposes. First, it ensures that gradients propagate through the non-forced part of the latent space even if teacher forcing cuts off the gradients of the forced units. Second, it gives the model the opportunity to use the additional, non-observed states to implement necessary calculations.

n_interleave (int, default=1): The $\tau$ parameter of STF, the interval between two forcings.
alpha_gtf (float, default=1): The $\alpha$ parameter of GTF, the fraction of the teacher forcing signal mixed into the state. alpha_gtf should lie between 0 and 1. Note that alpha_gtf is part of the value_scheduler construct, which allows the parameter to be scheduled over the course of training. For details, please refer to the related documentation.

note

Technically, it is possible to set n_interleave != 1 and alpha_gtf != 1 at the same time, which results in a mixture of STF and GTF. This has not been tested and might lead to unexpected behavior.
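
As a rough illustration of how the three parameters interact, the following sketch applies a single forcing step to a latent state. It is not the repository's implementation. The function name force_state, the argument z_teacher (the teacher signal for the forced units), the convention that the forced units are the first dim_forcing entries of the state, and the choice to force whenever the time step t is a multiple of n_interleave are all assumptions made for this example.

```python
import numpy as np


def force_state(z, z_teacher, t, dim_forcing, n_interleave=1, alpha_gtf=1.0):
    """Illustrative forcing step (hypothetical helper, not the repository API).

    z          : model-generated latent state, shape (latent_dim,)
    z_teacher  : teacher signal for the forced units, shape (dim_forcing,)
    t          : current time step (integer)
    """
    z = np.array(z, dtype=float)  # work on a copy, leave the input untouched
    z_teacher = np.asarray(z_teacher, dtype=float)

    # Assumption: forcing is applied at every n_interleave-th step (tau in STF).
    if t % n_interleave == 0:
        # Assumption: the forced units are the first dim_forcing entries.
        # alpha_gtf = 1 overwrites them; alpha_gtf < 1 blends teacher and model.
        z[:dim_forcing] = (
            alpha_gtf * z_teacher + (1.0 - alpha_gtf) * z[:dim_forcing]
        )
    return z


# Toy state with latent_dim = 4 and dim_forcing = 2.
z = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([10.0, 20.0])

stf_forced = force_state(z, d, t=0, dim_forcing=2, n_interleave=5)   # STF, forcing step
stf_free = force_state(z, d, t=3, dim_forcing=2, n_interleave=5)     # STF, free-running step
gtf_mixed = force_state(z, d, t=1, dim_forcing=2, alpha_gtf=0.25)    # GTF, blended step
```

Under these assumptions, alpha_gtf=1 with n_interleave=$\tau$ corresponds to STF (the forced units are overwritten every $\tau$ steps), while n_interleave=1 with alpha_gtf<1 corresponds to GTF (the forced units are blended at every step). In both cases the non-forced units are left unchanged by the forcing step.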