
Batching

Due to technical and architectural limitations, a batch cannot be sampled from multiple measurements at the same time. This is solved by introducing subbatches and splitting `batch_size` into two properties. Accordingly, `batch_size` in the config is a dictionary expecting two keys: `subbatch_size` and `subbatch_number`. The former is the number of samples in a subbatch, the latter the number of subbatches that make up a full batch.
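
For illustration, such a config entry might look as follows. This is a minimal sketch: only the `batch_size` dictionary with its two keys comes from this page; the surrounding config structure and the concrete numbers are assumptions.

```python
# Hypothetical config snippet; only the batch_size dictionary
# with its two keys is taken from this page.
config = {
    "batch_size": {
        "subbatch_size": 32,   # samples per subbatch
        "subbatch_number": 4,  # subbatches making up a full batch
    },
}

# Effective number of samples per gradient step:
full_batch = (config["batch_size"]["subbatch_size"]
              * config["batch_size"]["subbatch_number"])
print(full_batch)  # 128
```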

Each subbatch draws its samples from a corresponding measurement, so one can make sure that samples are taken evenly from the whole dataset even when it comprises multiple measurements.
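
This sampling scheme can be sketched as follows. Only the one-subbatch-per-measurement idea comes from the text; the data layout and function name are hypothetical.

```python
import numpy as np

# Hypothetical sketch: draw one subbatch per measurement and
# concatenate them into a full batch.
def sample_batch(measurements, subbatch_size, rng):
    subbatches = []
    for data in measurements:  # one subbatch per measurement
        idx = rng.choice(len(data), size=subbatch_size, replace=False)
        subbatches.append(data[idx])
    # The full batch contains samples from every measurement.
    return np.concatenate(subbatches)

rng = np.random.default_rng(0)
measurements = [rng.normal(size=(1000, 10)) for _ in range(4)]
batch = sample_batch(measurements, subbatch_size=32, rng=rng)
print(batch.shape)  # (128, 10)
```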

Furthermore, Brenner, Weber et al. (2025) found that when training a hierarchised model it is advantageous to increase the batch size along with the number of subjects. This is most likely because each batch then contains information from all subjects, resulting in a more stable gradient.

tip

As a rule of thumb, use as many subbatches (`subbatch_number`) as there are measurements, and scale the samples per subbatch (`subbatch_size`) with the number of subjects in a measurement: subbatch_size ~ number of subjects * batch_size, where batch_size is the batch size you would normally choose (see the example below the tip).

This may lead to memory issues, so make sure that the chosen batch size can be handled by your GPU!
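
Applying this rule of thumb, a quick sanity check might look as follows; the numbers are made up purely for illustration.

```python
# Hypothetical numbers to illustrate the rule of thumb above.
n_measurements = 3     # -> subbatch_number
n_subjects = 5         # subjects per measurement
base_batch_size = 16   # batch size you would normally choose

subbatch_number = n_measurements
subbatch_size = n_subjects * base_batch_size  # 80

full_batch = subbatch_number * subbatch_size
print(full_batch)  # 240 samples per gradient step; check GPU memory!
```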