Model Card for V2V-Transformer

This model card provides details about a Video-to-Video (V2V) transformer model trained on human workout videos for video synthesis and transformation tasks. It covers model usage, training details, evaluation results, and guidance for deployment.

Model Details

Model Description

This Video-to-Video (V2V) model uses a transformer-based autoencoder to perform video synthesis by learning latent representations of video sequences. It takes one or more input videos and generates a corresponding transformed output video, and it is designed for tasks such as action transfer, video interpolation, and visual enhancement.

  • Developed by: BOCK Health AI Team
  • Shared by: BOCK Health
  • Model type: Transformer-based Video Autoencoder
  • Language(s) (NLP): N/A
  • Finetuned from model: Pretrained Vision Transformer (ViT) Encoder
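
To make the encode/decode flow concrete, the sketch below shows a minimal, hypothetical transformer video autoencoder in PyTorch. The class ToyVideoAutoencoder, its layer choices, and its default hyperparameters are illustrative assumptions only and do not reflect the internals of VideoFusionAutoencoder; the sketch simply demonstrates the idea of encoding frames into a latent sequence with a transformer and decoding them back to pixels.

import torch
import torch.nn as nn

class ToyVideoAutoencoder(nn.Module):
    """Illustrative sketch only: encode each frame, mix frames with a
    transformer encoder, and decode back to pixel space."""

    def __init__(self, embed_dim=512, frame_size=128, num_layers=4, num_heads=8):
        super().__init__()
        # Per-frame encoder: flattened RGB frame -> embedding
        self.frame_encoder = nn.Linear(3 * frame_size * frame_size, embed_dim)
        # Temporal transformer over the sequence of frame embeddings
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Per-frame decoder: embedding -> flattened RGB frame
        self.frame_decoder = nn.Linear(embed_dim, 3 * frame_size * frame_size)

    def forward(self, video):                               # video: [N, T, C, H, W]
        n, t, c, h, w = video.shape
        x = video.reshape(n, t, c * h * w)                  # flatten each frame
        latent = self.temporal(self.frame_encoder(x))       # [N, T, embed_dim]
        frames = self.frame_decoder(latent)                 # [N, T, C*H*W]
        output_video = frames.reshape(n, t, c, h, w)
        return output_video, latent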

Direct Use

This model can be used directly to perform video-to-video transformations such as:

  • Synthesizing new videos from input sequences
  • Enhancing video resolution or applying style transfer
  • Human motion/action transfer for workout or sports videos
  • Automatically reconstructing distinct video sequences into semantically similar videos

Example:

from video_to_video_model import VideoFusionAutoencoder
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Instantiate the model and load the trained weights
model = VideoFusionAutoencoder(embed_dim=512, frame_size=128).to(device)
model.load_state_dict(torch.load("checkpoint.pth", map_location=device))
model.eval()

# Pass in an input video tensor of shape [N, T, C, H, W]
with torch.no_grad():
    output_video, latent = model(input_video_tensor)
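
The example above assumes input_video_tensor is already available. Below is a minimal sketch of one way to build such a tensor with OpenCV; the file name workout_clip.mp4, the 16-frame cap, and the [0, 1] normalization are illustrative assumptions rather than documented requirements of this model.

import cv2
import numpy as np
import torch

def load_video_as_tensor(path, frame_size=128, max_frames=16):
    """Read a video file into a float tensor of shape [1, T, C, H, W] in [0, 1]."""
    cap = cv2.VideoCapture(path)
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)       # OpenCV reads BGR
        frame = cv2.resize(frame, (frame_size, frame_size))  # match frame_size=128
        frames.append(frame.astype(np.float32) / 255.0)      # scale to [0, 1]
    cap.release()
    video = np.stack(frames)                                 # [T, H, W, C]
    video = torch.from_numpy(video).permute(0, 3, 1, 2)      # [T, C, H, W]
    return video.unsqueeze(0)                                # [1, T, C, H, W]

# Hypothetical clip; produces the input_video_tensor used in the example above
input_video_tensor = load_video_as_tensor("workout_clip.mp4").to(device)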
