Teacher forcing
The repository implements sparse teacher forcing (STF) and generalized teacher forcing (GTF). Both are configured via three parameters:
dim_forcing (int, default=latent_dim): The number of units of the latent space which are forced.
The forcing dimension doubles as the bottleneck dimension of the VAE. The encoder projects to a dim_forcing-dimensional space, and the decoder uses only these dim_forcing units to generate the observations.
For this reason, the terms "forced units" and "observed units" are used interchangeably throughout the documentation.
It is advised to set the forcing dimension smaller than the latent dimension. This serves two purposes. First, it ensures that gradients propagate through the non-forced part of the latent space even when teacher forcing cuts off the gradients of the forced units. Second, it gives the model the opportunity to use the additional, non-observed states to implement the necessary computations.
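The following minimal sketch illustrates how the forced units relate to the full latent state. It is not the repository's implementation: the helpers force_latent and decoder_input are hypothetical, plain Python lists stand in for tensors (so no gradients are tracked), and it assumes that the forced units are the first dim_forcing entries of the latent vector.

```python
from typing import List

Vector = List[float]


def force_latent(z: Vector, z_enc: Vector, dim_forcing: int) -> Vector:
    """Overwrite the forced ("observed") units of the latent state z.

    Hypothetical helper. Assumes the forced units are the first
    dim_forcing entries of z; the remaining latent_dim - dim_forcing
    units are not forced and are passed through unchanged.
    """
    if len(z_enc) != dim_forcing:
        raise ValueError("encoder output must have dim_forcing entries")
    if dim_forcing > len(z):
        raise ValueError("dim_forcing must not exceed latent_dim")
    return list(z_enc) + list(z[dim_forcing:])


def decoder_input(z: Vector, dim_forcing: int) -> Vector:
    """The decoder only reads the dim_forcing forced units (hypothetical helper)."""
    return list(z[:dim_forcing])
```

For example, with latent_dim=5 and dim_forcing=2, force_latent([0.1, 0.2, 0.3, 0.4, 0.5], [1.0, 2.0], 2) returns [1.0, 2.0, 0.3, 0.4, 0.5]: the two observed units are overwritten, while the three non-observed units are left to evolve freely.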
n_interleave (int, default=1): The STF parameter, i.e. the interval (in time steps) between two forcings.
alpha_gtf (float, default=1): The GTF parameter, i.e. the ratio of the teacher forcing signal mixed into the state. alpha_gtf should lie between 0 and 1. Note that alpha_gtf is part of the value_scheduler construct, which allows the parameter to be scheduled over the course of training. For details, please refer to the related documentation.
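How the two parameters could act during a latent rollout is sketched below. The sketch assumes the usual GTF convex combination z ← alpha_gtf · z_enc + (1 − alpha_gtf) · z on the forced units, and STF forcing at every n_interleave-th time step starting at t = 0. The function names, the step callback and the use of plain Python lists are illustrative assumptions, not the repository's API.

```python
from typing import Callable, List

Vector = List[float]


def gtf_mix(z: Vector, z_enc: Vector, alpha_gtf: float) -> Vector:
    """GTF: blend the encoder signal into the forced units (hypothetical helper)."""
    if not 0.0 <= alpha_gtf <= 1.0:
        raise ValueError("alpha_gtf should lie between 0 and 1")
    d = len(z_enc)  # dim_forcing; assumed to be the leading latent units
    if d > len(z):
        raise ValueError("dim_forcing must not exceed latent_dim")
    mixed = [alpha_gtf * e + (1.0 - alpha_gtf) * s for e, s in zip(z_enc, z[:d])]
    return mixed + list(z[d:])  # non-forced units are left untouched


def rollout(
    z0: Vector,
    encoded: List[Vector],
    step: Callable[[Vector], Vector],
    n_interleave: int = 1,
    alpha_gtf: float = 1.0,
) -> List[Vector]:
    """Roll out a latent model, forcing every n_interleave-th step (STF)."""
    if n_interleave < 1:
        raise ValueError("n_interleave must be a positive integer")
    z = list(z0)
    trajectory: List[Vector] = []
    for t, z_enc in enumerate(encoded):
        if t % n_interleave == 0:  # STF: force only at these time steps
            z = gtf_mix(z, z_enc, alpha_gtf)
        trajectory.append(z)
        z = step(z)  # free-running latent update (hypothetical model)
    return trajectory
```

In this sketch, the defaults (n_interleave=1, alpha_gtf=1) fully replace the forced units at every step, i.e. classic teacher forcing. For instance, with a model that halves the state (step = lambda z: [0.5 * x for x in z]), z0 = [0.0, 0.0, 0.0], encoded = [[4.0]] * 4 and n_interleave=2, the rollout yields [[4.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 0.0, 0.0], [2.0, 0.0, 0.0]].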
Technically, it is possible to set both n_interleave != 1 and alpha_gtf != 1, which results in a mixture of STF and GTF. This combination has not been tested and might lead to unexpected behavior.