vp_suite.models.lstm

class LSTM(device, **model_kwargs)

Bases: vp_suite.base.base_model.VPModel

This class implements a simple encoder-decoder video prediction architecture: each frame is encoded into a flat latent vector, the latent sequence is passed through several LSTM layers, and the outputs are decoded back into frames.
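
For illustration, the following is a minimal sketch of this kind of architecture, not the package's actual implementation. The specific layer choices (a single linear encoder/decoder, autoregressive re-encoding of predicted frames) are assumptions; only the overall encode-LSTM-decode structure and the attribute defaults (bottleneck_dim, lstm_hidden_dim, lstm_num_layers) come from this page.

    import torch
    import torch.nn as nn

    class NonConvLSTMSketch(nn.Module):
        """Illustrative sketch only; the real model's layers may differ."""

        def __init__(self, c, h, w, bottleneck_dim=1024,
                     lstm_hidden_dim=1024, lstm_num_layers=3):
            super().__init__()
            self.out_shape = (c, h, w)
            # Encoder: flatten each frame and project it to the bottleneck size.
            self.encoder = nn.Sequential(nn.Flatten(),
                                         nn.Linear(c * h * w, bottleneck_dim))
            self.lstm = nn.LSTM(bottleneck_dim, lstm_hidden_dim,
                                num_layers=lstm_num_layers, batch_first=True)
            # Decoder: project LSTM outputs back to flat frames.
            self.decoder = nn.Linear(lstm_hidden_dim, c * h * w)

        def forward(self, x, pred_frames=1):
            b, t = x.shape[:2]
            # Encode all t input frames into [b, t, bottleneck_dim].
            latents = self.encoder(x.flatten(0, 1)).view(b, t, -1)
            out, state = self.lstm(latents)   # warm up on the input frames
            frame = self.decoder(out[:, -1])  # first predicted frame, flat
            preds = [frame]
            for _ in range(pred_frames - 1):  # autoregressive rollout
                latent = self.encoder(frame).view(b, 1, -1)
                out, state = self.lstm(latent, state)
                frame = self.decoder(out[:, -1])
                preds.append(frame)
            return torch.stack(preds, dim=1).view(b, pred_frames, *self.out_shape)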

CAN_HANDLE_ACTIONS = True

Whether the model can handle actions.

CODE_REFERENCE = None

The code location of the reference implementation.

MATCHES_REFERENCE: str = 'Not Yet'

A comment indicating whether the implementation in this package matches the reference.

NAME = 'NonConvLSTM'

The model’s name.

PAPER_REFERENCE = None

The publication where this model was introduced first.

__init__(device, **model_kwargs)

Initializes the model by first setting all model hyperparameters and attributes; the model-specific initialization then builds the actual model from these hyperparameters.

Parameters
  • device (str) – The device identifier for the module.

  • **model_kwargs (Any) – Model arguments such as hyperparameters, input shapes etc.
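
A minimal instantiation sketch. The keyword names below (img_shape, action_size, tensor_value_range) are assumptions; the actual required model arguments are determined by the package's model and dataset configuration.

    import torch
    from vp_suite.models.lstm import LSTM

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = LSTM(
        device,
        img_shape=(3, 64, 64),          # assumed (c, h, w) of the input frames
        action_size=0,                  # assumed: 0 disables action conditioning
        tensor_value_range=(0.0, 1.0),  # assumed value range of input tensors
    )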

bottleneck_dim = 1024

The dimensionality of the linearized latent space.

decode(x)

Decodes a batch of flat latent vectors back into image frames.

encode(x)

Encodes a batch of image frames into flat latent vectors (see bottleneck_dim).
forward(x, pred_frames=1, **kwargs)

Given an input sequence of t frames, predicts pred_frames (p) frames into the future.

Parameters
  • x (torch.Tensor) – A batch of b sequences of t input frames as a tensor of shape [b, t, c, h, w].

  • pred_frames (int) – The number of frames to predict into the future.

  • **kwargs (Any) – Optional input parameters such as actions.

Returns: A batch of sequences of p predicted frames as a tensor of shape [b, p, c, h, w].
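
A usage sketch for forward, assuming the model and device from the instantiation sketch above; shapes and the return value follow the description on this page.

    import torch

    b, t, c, h, w = 2, 10, 3, 64, 64
    x = torch.randn(b, t, c, h, w, device=device)  # [b, t, c, h, w]
    pred = model(x, pred_frames=5)                 # shape [b, 5, c, h, w]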

lstm_hidden_dim = 1024

The hidden dimensionality of the LSTM cells.

lstm_num_layers = 3

The number of LSTM cell layers.

pred_1(x, **kwargs)

Given an input sequence of t frames, predicts a single frame into the future.

Parameters
  • x (torch.Tensor) – A batch of b sequences of t input frames as a tensor of shape [b, t, c, h, w].

  • **kwargs (Any) – Optional input parameters such as actions.

Returns: A single frame as a tensor of shape [b, c, h, w].
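
For illustration, multi-frame prediction can be sketched as repeated calls to pred_1, re-feeding each prediction as a new time step. This rollout is an assumed usage pattern, not necessarily how forward is implemented internally.

    import torch

    def rollout(model, x, pred_frames):
        preds = []
        for _ in range(pred_frames):
            frame = model.pred_1(x)                        # [b, c, h, w]
            preds.append(frame)
            x = torch.cat([x, frame.unsqueeze(1)], dim=1)  # append as a new time step
        return torch.stack(preds, dim=1)                   # [b, pred_frames, c, h, w]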