vp_suite.models.unet3d
- class UNet3D(device, **model_kwargs)
Bases:
vp_suite.base.base_model.VPModel
This model is closely related to the U-Net architecture (Ronneberger et al., arxiv.org/abs/1505.04597). In contrast to the original U-Net, the 2D convolutions are replaced by 3D convolutions that also cover the temporal dimension present in videos (which are sequences of frames).
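The core difference can be sketched directly (assuming PyTorch, which the library is built on): a 2D convolution slides over height and width only, while a 3D convolution also slides over the frame axis.

```python
import torch
import torch.nn as nn

# A batch of 2 videos with 3 channels and 8 frames of 64x64 pixels.
# torch's Conv3d expects [batch, channels, depth, height, width],
# where "depth" plays the role of the temporal dimension here.
x = torch.randn(2, 3, 8, 64, 64)

conv3d = nn.Conv3d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
y = conv3d(x)
print(y.shape)  # torch.Size([2, 16, 8, 64, 64])
```

With kernel size 3 and padding 1, the temporal and spatial extents are preserved; only the channel count changes.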
- CAN_HANDLE_ACTIONS = True
Whether the model can handle actions or not.
- NAME = 'UNet-3D'
The model’s name.
- REQUIRED_ARGS = ['img_shape', 'action_size', 'tensor_value_range', 'temporal_dim']
The attributes that the model creator needs to supply when creating the model.
- __init__(device, **model_kwargs)
Initializes the model by first setting all model hyperparameters, attributes and the like. Then, the model-specific initialization creates the actual model from the given hyperparameters.
- Parameters
device (str) – The device identifier for the module.
**model_kwargs (Any) – Model arguments such as hyperparameters, input shapes etc.
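Based on the REQUIRED_ARGS listed above, instantiation presumably looks like the following sketch. All argument values (and their formats) are illustrative assumptions, not documented defaults:

```python
from vp_suite.models.unet3d import UNet3D

# Hypothetical instantiation sketch; exact value formats are assumptions.
model = UNet3D(
    device="cuda:0",
    img_shape=(3, 64, 64),          # (c, h, w) of input frames (assumed format)
    action_size=0,                  # 0 if no actions are used (assumed)
    tensor_value_range=(0.0, 1.0),  # value range of input tensors (assumed)
    temporal_dim=8,                 # frames per 3D convolution window
)
```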
- features = [8, 16, 32, 64]
Channel dimensionality per encoding/decoding stage
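A minimal sketch of how such a features list can define successive encoder stages (assuming PyTorch; the block layout below is illustrative and not the library's actual modules):

```python
import torch
import torch.nn as nn

features = [8, 16, 32, 64]  # channel width per stage, as documented above

# Illustrative encoder: each stage is a 3D conv followed by spatial pooling.
# The real model's blocks (normalization, activations, skip connections)
# may differ; this only shows how the channel widths chain together.
stages = []
in_ch = 3  # e.g. RGB input
for out_ch in features:
    stages.append(nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool spatially, keep time
    ))
    in_ch = out_ch
encoder = nn.Sequential(*stages)

x = torch.randn(1, 3, 4, 64, 64)  # [b, c, t, h, w]
print(encoder(x).shape)  # torch.Size([1, 64, 4, 4, 4])
```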
- forward(x, pred_frames=1, **kwargs)
Given an input sequence of t frames, predicts pred_frames (p) frames into the future.
- Parameters
x (torch.Tensor) – A batch of b sequences of t input frames as a tensor of shape [b, t, c, h, w].
pred_frames (int) – The number of frames to predict into the future.
**kwargs (Any) – Optional input parameters such as actions.
Returns: A batch of sequences of p predicted frames as a tensor of shape [b, p, c, h, w].
- pred_1(x, **kwargs)
Given an input sequence of t frames, predicts one single frame into the future.
- Parameters
x (torch.Tensor) – A batch of b sequences of t input frames as a tensor of shape [b, t, c, h, w].
**kwargs (Any) – Optional input parameters such as actions.
Returns: A single frame as a tensor of shape [b, c, h, w].
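forward can be understood as repeated application of a single-frame prediction like pred_1. The loop below is a plausible autoregressive sketch (assuming PyTorch), not the library's actual implementation; predict_one is a hypothetical stand-in for pred_1:

```python
import torch

def predict_one(x):
    # Hypothetical stand-in for pred_1: here it simply repeats the last
    # input frame. x: [b, t, c, h, w] -> [b, c, h, w].
    return x[:, -1]

def predict(x, pred_frames=1):
    # Autoregressive rollout: predict one frame, append it to the input
    # sequence, repeat. x: [b, t, c, h, w] -> [b, p, c, h, w].
    preds = []
    for _ in range(pred_frames):
        frame = predict_one(x)                          # [b, c, h, w]
        preds.append(frame)
        x = torch.cat([x, frame.unsqueeze(1)], dim=1)   # extend the sequence
    return torch.stack(preds, dim=1)                    # [b, p, c, h, w]

x = torch.randn(2, 5, 3, 16, 16)
out = predict(x, pred_frames=3)
print(out.shape)  # torch.Size([2, 3, 3, 16, 16])
```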
- temporal_dim = None
Number of consecutive frames used for 3D convolution