Model Card for V2V-Transformer

This model card provides details about a Video-to-Video (V2V) transformer model trained on human workout videos for video synthesis and transformation tasks. It covers model usage, training details, evaluation results, and guidance for deployment.

Model Details

Model Description

This Video-to-Video (V2V) model uses a transformer-based autoencoder to perform video synthesis by learning latent representations of video sequences. It takes one or more input videos and generates a corresponding transformed output video, and it is designed for tasks such as action transfer, video interpolation, and visual enhancement.

  • Developed by: BOCK Health AI Team
  • Shared by: BOCK Health
  • Model type: Transformer-based Video Autoencoder
  • Language(s) (NLP): N/A
  • Finetuned from model: Pretrained Vision Transformer (ViT) Encoder
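
To make the encode/decode flow concrete, the sketch below shows a minimal, hypothetical transformer video autoencoder in PyTorch. The class ToyVideoAutoencoder, its layer choices, and its default hyperparameters are illustrative assumptions only and do not reflect the internals of VideoFusionAutoencoder; the sketch simply demonstrates the idea of encoding frames into a latent sequence with a transformer and decoding them back to pixels.

import torch
import torch.nn as nn

class ToyVideoAutoencoder(nn.Module):
    """Illustrative sketch only: encode each frame, mix frames with a
    transformer encoder, and decode back to pixel space."""

    def __init__(self, embed_dim=512, frame_size=128, num_layers=4, num_heads=8):
        super().__init__()
        # Per-frame encoder: flattened RGB frame -> embedding
        self.frame_encoder = nn.Linear(3 * frame_size * frame_size, embed_dim)
        # Temporal transformer over the sequence of frame embeddings
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Per-frame decoder: embedding -> flattened RGB frame
        self.frame_decoder = nn.Linear(embed_dim, 3 * frame_size * frame_size)

    def forward(self, video):                               # video: [N, T, C, H, W]
        n, t, c, h, w = video.shape
        x = video.reshape(n, t, c * h * w)                  # flatten each frame
        latent = self.temporal(self.frame_encoder(x))       # [N, T, embed_dim]
        frames = self.frame_decoder(latent)                 # [N, T, C*H*W]
        output_video = frames.reshape(n, t, c, h, w)
        return output_video, latent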

Direct Use

This model can be used directly to perform video-to-video transformations such as:

  • Synthesizing new videos from input sequences
  • Enhancing video resolution or applying style transfer
  • Human motion/action transfer for workout or sports videos
  • Automatically reconstructing distinct video sequences into semantically similar videos

Example:

from video_to_video_model import VideoFusionAutoencoder
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Instantiate the model and load the trained weights
model = VideoFusionAutoencoder(embed_dim=512, frame_size=128).to(device)
model.load_state_dict(torch.load("checkpoint.pth", map_location=device))
model.eval()

# Pass in an input video tensor of shape [N, T, C, H, W]
with torch.no_grad():
    output_video, latent = model(input_video_tensor)
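
The example above assumes input_video_tensor is already available. Below is a minimal sketch of one way to build such a tensor with OpenCV; the file name workout_clip.mp4, the 16-frame cap, and the [0, 1] normalization are illustrative assumptions rather than documented requirements of this model.

import cv2
import numpy as np
import torch

def load_video_as_tensor(path, frame_size=128, max_frames=16):
    """Read a video file into a float tensor of shape [1, T, C, H, W] in [0, 1]."""
    cap = cv2.VideoCapture(path)
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)       # OpenCV reads BGR
        frame = cv2.resize(frame, (frame_size, frame_size))  # match frame_size=128
        frames.append(frame.astype(np.float32) / 255.0)      # scale to [0, 1]
    cap.release()
    video = np.stack(frames)                                 # [T, H, W, C]
    video = torch.from_numpy(video).permute(0, 3, 1, 2)      # [T, C, H, W]
    return video.unsqueeze(0)                                # [1, T, C, H, W]

# Hypothetical clip; produces the input_video_tensor used in the example above
input_video_tensor = load_video_as_tensor("workout_clip.mp4").to(device)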
