
# VibeVoice F16 Model

This model has been converted to float16 (f16) precision for reduced memory usage.

## Conversion Details

- Original model: [microsoft/VibeVoice-1.5B](https://huggingface.co/microsoft/VibeVoice-1.5B)
- Mixed precision: True (see the conversion sketch below)
- Memory savings: ~45.7%
- Original size: 10.07 GB
- Converted size: 5.47 GB
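
The conversion script itself is not included in this repo. The following is a minimal sketch of what a mixed-precision cast could look like, assuming the model loads via the same class used in the Usage section below; the `"norm"` name filter for keeping parameters in f32 is an illustrative assumption, not the exact rule used to produce this checkpoint.

```python
import torch
from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference

# Load the original f32 checkpoint.
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "microsoft/VibeVoice-1.5B"
)

# Cast most weights to f16, keeping numerically sensitive parameters
# (here, anything with "norm" in its name -- an assumption for
# illustration) in f32. This is what "mixed precision" refers to.
for name, param in model.named_parameters():
    if "norm" not in name.lower():
        param.data = param.data.to(torch.float16)

# Save the converted weights as safetensors.
model.save_pretrained("./VibeVoice-1.5B-f16", safe_serialization=True)
```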

## Usage

```python
import torch

from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load with f16 precision
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "./VibeVoice-1.5B-f16",
    torch_dtype=torch.float16,
    device_map="cpu",  # or "cuda" for GPU
)

processor = VibeVoiceProcessor.from_pretrained("./VibeVoice-1.5B-f16")
```

Alternatively, pass the `--use_f16` flag to the demo scripts:

```bash
python demo/inference_from_file.py --model_path ./VibeVoice-1.5B-f16 --use_f16 --device cpu
```
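
To sanity-check the size figures from the Conversion Details section, you can sum the parameter sizes after loading. A minimal sketch, reusing `model` from the snippet above:

```python
# Total bytes held by parameters; with f16 weights this should be
# roughly half the f32 figure (~5.47 GB vs ~10.07 GB on disk).
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"Parameter memory: {total_bytes / 1024**3:.2f} GB")

# Mixed precision means more than one dtype may appear here.
print({p.dtype for p in model.parameters()})
```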

## Notes

- F16 precision may result in minor quality differences compared to f32
- Some operations automatically upcast to f32 for numerical stability (see the sketch below)
- Optimized for CPU inference, but also works on CUDA GPUs
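
For example, reductions such as softmax are commonly computed in f32 even when weights and activations are f16. A small illustration of this upcast pattern (generic PyTorch, not VibeVoice-specific code):

```python
import torch

x = torch.randn(4, 8, dtype=torch.float16)

# Upcast to f32 for the numerically sensitive reduction, then cast
# the result back to f16. Frameworks apply the same pattern
# internally for ops like softmax and layer norm.
y = torch.softmax(x.float(), dim=-1).to(torch.float16)
print(y.dtype)  # torch.float16
```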