vp_suite.base.base_dataset

class VPData(*args, **kwargs)

Bases: dict

This template class defines the return type for all datasets.

actions: torch.Tensor

Actions per frame, as a torch tensor of shape [t, a].

frames: torch.Tensor

Video frames, as a torch tensor of shape [t, c, h, w].

origin: str: A string specifying the source of the data.

class VPDataset(split, **dataset_kwargs)

Bases: torch.utils.data.dataset.Dataset

The base class for all video prediction dataset loaders. Data points are provided in the shape of VPData dicts.

Note

VPDataset objects are not usable directly after creation because the sequence length is still unspecified. To fully prepare the dataset, call self.set_seq_len() with the desired numbers of context and prediction frames and the seq_step. Afterwards, the VPDataset object is ready to be queried for data.
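The two-step setup described in this note (construct, then call set_seq_len() before querying) can be illustrated with a small stand-alone sketch. ToyVideoDataset and its num_frames attribute are hypothetical; only the method name set_seq_len() and the ready_for_usage flag (listed in NON_CONFIG_VARS) come from this page, and the real class stores more state than shown here.

```python
class ToyVideoDataset:
    """Hypothetical stand-in that mimics the documented two-step setup."""

    def __init__(self, split: str):
        self.split = split
        self.num_frames = None          # unknown until set_seq_len() is called
        self.ready_for_usage = False    # becomes True only after set_seq_len()

    def set_seq_len(self, context_frames: int, pred_frames: int, seq_step: int) -> None:
        # Store the requested sequence configuration and mark the dataset as usable.
        self.num_frames = context_frames + pred_frames
        self.seq_step = seq_step
        self.ready_for_usage = True

    def __getitem__(self, i: int) -> dict:
        # Refuse to serve data while the sequence length is still unspecified.
        if not self.ready_for_usage:
            raise RuntimeError("call set_seq_len() before querying data")
        return {"index": i, "num_frames": self.num_frames}


ds = ToyVideoDataset("train")
ds.set_seq_len(context_frames=10, pred_frames=10, seq_step=1)
sample = ds[0]
```

Failing loudly in __getitem__ (rather than returning sequences of some default length) is one way to make a forgotten set_seq_len() call obvious early.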

ACTION_SIZE: int = NotImplemented: The size of the action vector per frame (0 if the dataset provides no actions).

DATASET_FRAME_SHAPE: (int, int, int) = NotImplemented: Shape of a single frame in the dataset (height, width, channels).

DEFAULT_DATA_DIR: pathlib.Path = NotImplemented: The default save location of the dataset files.

IS_DOWNLOADABLE: str = None: A string identifying whether the dataset can be (freely) downloaded.

MIN_SEQ_LEN: int = NotImplemented: The minimum sequence length provided by the dataset.

NAME: str = NotImplemented: The dataset’s name.

NON_CONFIG_VARS = ['functions', 'ready_for_usage', 'total_frames', 'seq_len', 'frame_offsets', 'data_dir']: Variables that are not included in the dict returned by self.config (constants are not included either).

ON_THE_FLY: bool = False: If True, data is generated on the fly when the dataset is accessed, rather than fetched from storage.

REFERENCE: str = None: The reference (publication) where the original dataset is introduced.

VALID_SPLITS = ['train', 'test']: The valid arguments for specifying splits.

__init__(split, **dataset_kwargs)

Initializes the dataset loader by determining its split and extracting and processing all dataset attributes from the parameters given in dataset_kwargs.

Parameters

split (str) – The dataset’s split identifier (i.e. whether it’s a training/validation/test dataset)
**dataset_kwargs (Any) – Optional dataset arguments for image transformation, value_range, splitting etc.

property config: dict

Returns: A dictionary containing the complete dataset configuration, including common attributes as well as dataset-specific attributes.

Return type: dict
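How such a config dict can be assembled from the instance attributes, skipping the names in NON_CONFIG_VARS and any constants, is sketched below. This is an illustration under assumptions, not vp-suite's code: ToyDataset is hypothetical, and "constant" is assumed to mean an UPPER_CASE attribute name.

```python
class ToyDataset:
    NAME = "toy"  # class-level constant: never part of the config
    NON_CONFIG_VARS = ["functions", "ready_for_usage", "total_frames",
                       "seq_len", "frame_offsets", "data_dir"]

    def __init__(self):
        self.split = "train"
        self.seq_step = 1
        self.value_range_min = 0.0
        self.value_range_max = 1.0
        self.data_dir = "/tmp/toy"       # listed in NON_CONFIG_VARS, so excluded
        self.ready_for_usage = False     # listed in NON_CONFIG_VARS, so excluded

    @property
    def config(self) -> dict:
        # Keep instance attributes that are neither UPPER_CASE constants
        # nor listed in NON_CONFIG_VARS.
        return {k: v for k, v in vars(self).items()
                if not k.isupper() and k not in self.NON_CONFIG_VARS}


cfg = ToyDataset().config  # accessed as a property, without parentheses
```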

data_dir: str = None: The specified path to the folder containing the dataset.

default_available(split, **dataset_kwargs)

Tries to load the dataset and a single data point using the default value of self.data_dir. If this succeeds, the default data directory can safely be used; otherwise, the dataset has to be downloaded and prepared first.

Parameters

split (str) – The dataset’s split identifier (i.e. whether it’s a training/validation/test dataset).
**dataset_kwargs (Any) – Optional dataset arguments for image transformation, value_range, splitting etc.

Returns: True if we could load the dataset using default values, False otherwise.

classmethod download_and_prepare_dataset(): Downloads the specific dataset, prepares it for the video prediction task (if needed) and stores it in a default location in the ‘data/’ folder. Implemented by the derived dataset classes.
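The interplay of default_available() and download_and_prepare_dataset() amounts to a "use the default location if it works, otherwise download" fallback. The sketch below shows that flow with a hypothetical ToyDataset whose default directory lives in the system temp folder (an assumption made so the example runs anywhere) and whose "download" just creates empty placeholder files.

```python
import tempfile
from pathlib import Path


class ToyDataset:
    """Hypothetical dataset that reads sequence files from <data_dir>/<split>/."""

    # Assumed default location for this sketch (not the real 'data/' folder).
    DEFAULT_DATA_DIR = Path(tempfile.gettempdir()) / "toy_vp_dataset"

    def __init__(self, split: str, data_dir=None):
        self.split = split
        self.data_dir = Path(data_dir) if data_dir is not None else self.DEFAULT_DATA_DIR
        self.files = sorted((self.data_dir / split).glob("*.npy"))
        if not self.files:
            raise FileNotFoundError(f"no sequence files under {self.data_dir / split}")

    def __getitem__(self, i: int) -> Path:
        return self.files[i]

    @classmethod
    def default_available(cls, split: str, **dataset_kwargs) -> bool:
        # Try to create the dataset and fetch one data point; any failure
        # means the default data directory cannot be used.
        try:
            cls(split, **dataset_kwargs)[0]
            return True
        except Exception:
            return False

    @classmethod
    def download_and_prepare_dataset(cls) -> None:
        # Placeholder for a real download: create one empty file per split.
        for split in ("train", "test"):
            split_dir = cls.DEFAULT_DATA_DIR / split
            split_dir.mkdir(parents=True, exist_ok=True)
            (split_dir / "seq_000.npy").touch()


# Download only when the default location is unusable.
if not ToyDataset.default_available("train"):
    ToyDataset.download_and_prepare_dataset()
train_set = ToyDataset("train")
```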

classmethod get_test(**dataset_kwargs)

A wrapper method that creates a test dataset from the given dataset class. As when initializing such datasets directly, optional dataset arguments can be specified with **dataset_kwargs.

Parameters: **dataset_kwargs (Any) – Optional dataset arguments for image transformation, value_range, splitting etc.

Returns: The created test dataset of the same class.

classmethod get_train_val(**dataset_kwargs)

A wrapper method that creates a training and a validation dataset from the given dataset class. As when initializing such datasets directly, optional dataset arguments can be specified with **dataset_kwargs.

Parameters: **dataset_kwargs (Any) – Optional dataset arguments for image transformation, value_range, splitting etc.

Returns: The created training and validation datasets of the same class.

img_shape: (int, int, int) = NotImplemented: Shape of a single frame as returned by __getitem__().

postprocess(x)

Converts a normalized image tensor to a denormalized numpy array. Output: np.uint8, shape […, h, w, c], range [0, 255].

Parameters: x (torch.Tensor) – Input tensor of shape […, c, h, w] and (approx.) range [min_val, max_val].

Returns: A post-processed (quantized) sequence array ready for display.

Return type: ndarray
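The conversion described for postprocess() can be sketched in NumPy as follows. This is a stand-in, not the library's implementation: it takes a NumPy array instead of a torch.Tensor, min_val/max_val play the role of value_range_min/value_range_max, and clipping plus rounding (rather than, say, truncation) are assumed choices for handling the "approx." range.

```python
import numpy as np


def postprocess(x: np.ndarray, min_val: float = 0.0, max_val: float = 1.0) -> np.ndarray:
    """Map [..., c, h, w] floats in ~[min_val, max_val] to uint8 [..., h, w, c]."""
    x = (x - min_val) / (max_val - min_val)  # rescale to ~[0, 1]
    x = np.clip(x, 0.0, 1.0) * 255.0          # clip values outside the range, then to [0, 255]
    x = np.moveaxis(x, -3, -1)                # [..., c, h, w] -> [..., h, w, c]
    return np.round(x).astype(np.uint8)       # quantize for display
```

Moving the channel axis last matches what common image viewers (e.g. matplotlib's imshow) expect.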

preprocess(x, transform=True)

Preprocesses the input sequence to make it usable by the video prediction models, using the transformations defined in self.__init__(). The workflow is as follows:

Convert to torch tensor of type torch.float.
Permute axes to obtain the following shape: [frames/time (t), channels (c), height (h), width (w)].
Scale values to the interval defined by self.value_range_min and self.value_range_max.
Crop the image (if applicable).
Resize the image (if applicable).
Perform further data augmentation operations (if applicable).

Parameters

x (Union[np.ndarray, torch.Tensor]) – The input sequence.
transform (bool) – Whether to crop/resize/augment the sequence using the dataset’s transformations.

Returns: The preprocessed sequence tensor.

Return type: Tensor
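The first three steps of this workflow (type conversion, axis permutation, value scaling) can be sketched as below. The sketch assumes uint8 input frames in [0, 255] laid out as [t, h, w, c], uses NumPy instead of torch to stay dependency-light, and omits cropping, resizing and augmentation; preprocess_basic is a hypothetical name.

```python
import numpy as np


def preprocess_basic(x: np.ndarray, value_range_min: float = 0.0,
                     value_range_max: float = 1.0) -> np.ndarray:
    """Steps 1-3 only; assumes uint8 frames in [0, 255] shaped [t, h, w, c]."""
    x = np.asarray(x, dtype=np.float32)   # 1. convert to float
    x = np.transpose(x, (0, 3, 1, 2))     # 2. [t, h, w, c] -> [t, c, h, w]
    x = x / 255.0                          # 3a. scale to [0, 1]
    # 3b. map [0, 1] linearly onto [value_range_min, value_range_max]
    return x * (value_range_max - value_range_min) + value_range_min
```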

reset_rng(): Optional logic for resetting the RNG of a dataset.

seq_step: int = 1: With a step N, every Nth frame is included in the returned sequence.

set_seq_len(context_frames, pred_frames, seq_step)

Set the sequence length for the upcoming run. Assumes that the given parameters lead to a sequence length that does not exceed the minimum sequence length specified in self.MIN_SEQ_LEN.

Parameters

context_frames (int) – Number of input/context frames.
pred_frames (int) – Number of frames to be predicted.
seq_step (int) – Sequence step (for step N, assemble the sequence by taking every Nth frame).
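The effect of seq_step on which frames are used, and the check against MIN_SEQ_LEN, can be worked through with the helpers below. They are hypothetical and rest on an assumption this page does not state: that the relevant "sequence length" is the span of raw frames from the first to the last selected frame, i.e. (context_frames + pred_frames - 1) * seq_step + 1.

```python
from typing import List


def frame_offsets(context_frames: int, pred_frames: int, seq_step: int) -> List[int]:
    """Offsets (relative to a sequence start) of the frames that are returned."""
    n = context_frames + pred_frames
    return list(range(0, n * seq_step, seq_step))


def required_span(context_frames: int, pred_frames: int, seq_step: int) -> int:
    """Raw frames covered from the first to the last selected frame, inclusive."""
    return (context_frames + pred_frames - 1) * seq_step + 1


def check_seq_len(context_frames: int, pred_frames: int, seq_step: int,
                  min_seq_len: int) -> int:
    """Raise if the requested span exceeds what the dataset guarantees."""
    span = required_span(context_frames, pred_frames, seq_step)
    if span > min_seq_len:
        raise ValueError(f"span {span} exceeds MIN_SEQ_LEN {min_seq_len}")
    return span
```

For example, 5 context and 5 predicted frames with seq_step 2 select offsets 0, 2, ..., 18 and therefore need at least 19 raw frames per sequence.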

split: str = None: The dataset’s split identifier (i.e. whether it’s a training/validation/test dataset).

train_to_val_ratio: float = 0.8: The ratio of files that will be training data (rest will be validation data). For bigger datasets, this ratio can be set closer to 1.

train_val_seed = 1234: Random seed used to separate training and validation data.
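How train_to_val_ratio and train_val_seed could produce a reproducible training/validation split (as used by get_train_val()) is sketched below. split_train_val is a hypothetical helper: it shuffles item indices with Python's random module, whereas the actual implementation may split files or use a different RNG.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(indices: Sequence[int], train_to_val_ratio: float = 0.8,
                    train_val_seed: int = 1234) -> Tuple[List[int], List[int]]:
    """Shuffle with a fixed seed, then cut at train_to_val_ratio."""
    shuffled = list(indices)
    random.Random(train_val_seed).shuffle(shuffled)  # same seed, same split
    n_train = int(len(shuffled) * train_to_val_ratio)
    return shuffled[:n_train], shuffled[n_train:]


train_idx, val_idx = split_train_val(range(100))
```

Using a dedicated random.Random instance keeps the split independent of any global RNG state elsewhere in a training script.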

transform: torch.nn.modules.module.Module = None: This module gets called in the preprocessing step and consists of pre-specified cropping, resizing and augmentation layers.

value_range_max: float = 1.0: The upper end of the value range for the returned data.

value_range_min: float = 0.0: The lower end of the value range for the returned data.

class VPSubset(dataset, indices)

Bases: torch.utils.data.dataset.Subset

A minimal wrapper around Subset that allows direct access to the underlying dataset’s attributes.

dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataset.T_co]

indices: Sequence[int]
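Attribute forwarding of the kind VPSubset provides is commonly implemented with __getattr__, which Python calls only when normal attribute lookup fails. The torch-free class below is a hypothetical illustration of that technique, not VPSubset's actual source; it mirrors Subset's dataset and indices fields.

```python
from typing import Any, Sequence


class AttrForwardingSubset:
    """Hypothetical, torch-free stand-in for VPSubset."""

    def __init__(self, dataset: Any, indices: Sequence[int]):
        self.dataset = dataset
        self.indices = indices

    def __getitem__(self, idx: int):
        # Map the subset index to an index of the underlying dataset.
        return self.dataset[self.indices[idx]]

    def __len__(self) -> int:
        return len(self.indices)

    def __getattr__(self, name: str):
        # Guard 'dataset' to avoid infinite recursion if it is not set yet
        # (e.g. during unpickling); everything else goes to the wrapped dataset.
        if name == "dataset":
            raise AttributeError(name)
        return getattr(self.dataset, name)
```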