Improve model card: Add detailed description, sample usage, and update paper link

#1
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +23 -7
README.md CHANGED
@@ -1,13 +1,13 @@
  ---
- license: apache-2.0
  base_model: Wan-AI/Wan2.1-T2V-14B
+ license: apache-2.0
+ pipeline_tag: text-to-video
  tags:
  - text-to-video
  - diffusion
  - video-generation
  - turbodiffusion
  - wan2.1
- pipeline_tag: text-to-video
  ---

  <p align="center">
@@ -16,14 +16,31 @@ pipeline_tag: text-to-video

  # TurboWan2.1-T2V-14B-480P

- - This HuggingFace repo contains the `TurboWan2.1-T2V-14B-480P` model.
+ This repository contains the `TurboWan2.1-T2V-14B-480P` model, part of the **TurboDiffusion** framework presented in [TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times](https://huggingface.co/papers/2512.16093). TurboDiffusion is a video generation acceleration framework that can speed up end-to-end diffusion generation by 100-200x while maintaining video quality.

  - For RTX 5090 or similar GPUs, please use `TurboWan2.1-T2V-14B-480P-quant`. For other GPUs with more than 40GB of GPU memory, we recommend `TurboWan2.1-T2V-14B-480P`.

- - For usage instructions, please see **https://github.com/thu-ml/TurboDiffusion**
+ - For usage instructions and more details, please see the official GitHub repository: **https://github.com/thu-ml/TurboDiffusion**
+
+ ## Sample Usage

- - Paper: [TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times](https://arxiv.org/pdf/2512.16093)
+ To run text-to-video inference using the `TurboWan2.1-T2V-1.3B-480P-quant` model, follow these steps. For full instructions, including downloading the necessary VAE and text encoder checkpoints, refer to the [GitHub repository](https://github.com/thu-ml/TurboDiffusion#inference).

+ ```bash
+ export PYTHONPATH=turbodiffusion
+
+ # Example for Text-to-Video (T2V) inference
+ python turbodiffusion/inference/wan2.1_t2v_infer.py \
+ --model Wan2.1-1.3B \
+ --dit_path checkpoints/TurboWan2.1-T2V-1.3B-480P-quant.pth \
+ --resolution 480p \
+ --prompt "A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about." \
+ --num_samples 1 \
+ --num_steps 4 \
+ --quant_linear \
+ --attention_type sagesla \
+ --sla_topk 0.1
+ ```

  ## Citation
  ```
@@ -81,5 +98,4 @@ pipeline_tag: text-to-video
  journal={arXiv preprint arXiv:2505.11594},
  year={2025}
  }
- ```
-
+ ```
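
A note on the sample usage added above: `--dit_path` expects the TurboDiffusion DiT checkpoint under `checkpoints/`. Below is a minimal sketch of one way to fetch a checkpoint with the Hugging Face CLI; the repository ID and layout are placeholders, not confirmed paths, so substitute the actual Hub repository and the file names listed on its Files tab (or follow the download steps in the GitHub README).

```bash
# Hypothetical download sketch -- "<org>/TurboWan2.1-T2V-14B-480P" is a placeholder;
# replace it with the actual Hub repository for the checkpoint you want to run.
pip install -U "huggingface_hub[cli]"

mkdir -p checkpoints
huggingface-cli download <org>/TurboWan2.1-T2V-14B-480P --local-dir checkpoints
```

Whether the checkpoints ship as single `.pth` files (as the `--dit_path` example suggests) or as a multi-file repo is worth verifying on the model page before wiring up paths.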