Value scheduler

The ValueScheduler allows scalar values to be scaled dynamically during training. It is primarily used to weight loss terms, enabling strategies such as "warm-up" (gradually increasing a regularization term) or "annealing" (decreasing a weight over time).

The scheduler is configured via a dictionary that specifies the schedule type and its associated hyperparameters.

Configuration Structure

A value schedule configuration follows this structure:

{
  "name": "constant" | "linear" | "step" | "exponential",
  "hyperparameters": {
    "initial": float,
    # ... other type-specific parameters ...
  }
}

The initial parameter is common to all schedule types and defaults to 1.0 if not specified.
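
A minimal sketch of how such a configuration might be read, with the documented defaults applied (the helper name below is illustrative, not part of the library's API):

def parse_schedule_config(config):
    # Illustrative only; the actual parsing lives in the scheduler implementation.
    name = config.get("name", "constant")              # "constant" is the default type
    hparams = dict(config.get("hyperparameters", {}))
    hparams.setdefault("initial", 1.0)                 # "initial" defaults to 1.0
    return name, hparams

# parse_schedule_config({"name": "linear", "hyperparameters": {"final_alpha": 1.0}})
# -> ("linear", {"final_alpha": 1.0, "initial": 1.0})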

Schedule Types

Constant

The value remains fixed throughout training. This is the default if name is not specified.

Formula: $v_t = \text{initial}$

Configuration:

{
  "name": "constant",
  "hyperparameters": {
    "initial": 1.0
  }
}

Linear

Linearly interpolates between the initial value and final_alpha over the total number of epochs (n_epochs).

Formula: $v_t = \text{initial} + (\text{final\_alpha} - \text{initial}) \cdot \frac{t}{N_{\text{epochs}}}$

Configuration:

{
  "name": "linear",
  "hyperparameters": {
    "initial": 0.0,
    "final_alpha": 1.0
  }
}
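
As a rough sketch, the linear schedule can be evaluated as a plain function of the epoch (the function name and the explicit n_epochs argument are illustrative; the actual implementation is part of the scheduler class):

def linear_value(initial, final_alpha, epoch, n_epochs):
    # v_t = initial + (final_alpha - initial) * t / N_epochs
    return initial + (final_alpha - initial) * epoch / n_epochs

# With initial=0.0, final_alpha=1.0, n_epochs=100:
# epoch 0 -> 0.0, epoch 50 -> 0.5, epoch 100 -> 1.0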

Step

Multiplies the value by a factor gamma every step_size epochs.

Formula: $v_t = \text{initial} \cdot \gamma^{\lfloor t / \text{step\_size} \rfloor}$

Configuration:

{
  "name": "step",
  "hyperparameters": {
    "initial": 1.0,
    "gamma": 0.5,
    "step_size": 10
  }
}
  • gamma: Decay factor (default: 0.1).
  • step_size: Number of epochs between steps (default: 10).
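
A corresponding sketch for the step schedule (illustrative, assuming the epoch counter t starts at 0):

def step_value(initial, gamma, step_size, epoch):
    # v_t = initial * gamma ** floor(t / step_size)
    return initial * gamma ** (epoch // step_size)

# With initial=1.0, gamma=0.5, step_size=10:
# epochs 0-9 -> 1.0, epochs 10-19 -> 0.5, epochs 20-29 -> 0.25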

Exponential

Multiplies the value by a factor gamma at every epoch.

Formula: $v_t = \text{initial} \cdot \gamma^{t}$

Configuration:

{
  "name": "exponential",
  "hyperparameters": {
    "initial": 1.0,
    "gamma": 0.99
  }
}
  • gamma: Decay factor (default: 0.9).
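
And the exponential schedule as a plain function (again illustrative, not the library's actual code):

def exponential_value(initial, gamma, epoch):
    # v_t = initial * gamma ** t
    return initial * gamma ** epoch

# With initial=1.0, gamma=0.99: epoch 0 -> 1.0, epoch 100 -> ~0.37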

Usage in Loss Functions

As mentioned in the Loss function documentation, the value scheduler is used to weight different loss terms. A typical configuration for a loss term might look like this:

# Example: Warming up the KL divergence (entropy) weight
"alpha_entropy": {
  "name": "linear",
  "hyperparameters": {
    "initial": 0.001,
    "final_alpha": 0.1
  }
}
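
During training, the scheduled value is typically queried once per epoch and multiplied onto the corresponding loss term. A minimal, self-contained sketch under that assumption (the loss terms here are placeholders; see src/scheduler/multi_value_scheduler.py for the actual interface):

n_epochs = 100
for epoch in range(n_epochs):
    # Linear schedule from the configuration above: 0.001 -> 0.1 over training.
    alpha_entropy = 0.001 + (0.1 - 0.001) * epoch / n_epochs
    # total_loss = reconstruction_loss + alpha_entropy * entropy_loss + ...
    # (other scheduled alphas, e.g. alpha_dsr, alpha_consistency, enter analogously)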

The following values can be scheduled (see src/scheduler/multi_value_scheduler.py for defaults):

  • alpha_gtf: Generalized teacher forcing (GTF) parameter. Controls the mixing of ground-truth states and predicted states during DSR training. A value of 1.0 means full teacher forcing, while 0.0 means free running (no teacher forcing).
  • alpha_entropy: Scaling for the encoder entropy loss ($\mathcal{L}_\text{ent.}$). Encourages the encoder to learn a meaningful probability distribution.
  • alpha_dsr: Scaling for the DSR loss ($\mathcal{L}_\text{DSR}$). Measures how well the decoded DSR-generated trajectory aligns with the observation.
  • alpha_consistency: Scaling for the consistency loss ($\mathcal{L}_\text{cons.}$). Measures the alignment between the encoded states and the DSR-generated states in the latent space.
  • alpha_reconstruction: Scaling for the reconstruction loss ($\mathcal{L}_\text{rec.}$). Measures how well the autoencoder reconstructs the input data.
  • alpha_subject_continuity: Scaling for the subject continuity loss. Used in hierarchical models to penalize large changes in parameters between adjacent subjects or time-series indices, encouraging smoother parameter variation.
  • alpha_MAR: Scaling for the Manifold Attractor Regularization (MAR) loss. Regularizes specific latent units towards manifold attractors, inducing a memory state.
  • alpha_AR_convergence: Scaling for the Autoregressive (AR) convergence loss. Regularizes the parameters of ALRNN models to ensure the dynamics remain stable (non-divergent).