# VibeVoice F16 Model
This model has been converted to float16 (f16) precision for reduced memory usage.
## Conversion Details
- Original model: microsoft/VibeVoice-1.5B
- Mixed precision: yes (see the conversion sketch below)
- Original size: 10.07 GB
- Converted size: 5.47 GB
- Memory savings: ~45.7%
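
For reference, here is a minimal sketch of how such a mixed-precision conversion can be done with plain PyTorch. The checkpoint filename and the rule for which parameters stay in f32 (here: norm weights and biases) are assumptions, since the actual conversion script is not part of this card:

```python
import torch

def convert_state_dict_to_f16(state_dict, keep_f32=("norm", "bias")):
    """Cast f32 tensors to f16, keeping numerically sensitive
    parameters (assumed here: norm weights and biases) in f32."""
    out = {}
    for name, tensor in state_dict.items():
        if tensor.dtype == torch.float32 and not any(k in name for k in keep_f32):
            out[name] = tensor.half()
        else:
            out[name] = tensor  # left in original precision
    return out

# Hypothetical usage: load a checkpoint, convert, and re-save.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
torch.save(convert_state_dict_to_f16(state_dict), "pytorch_model_f16.bin")
```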
## Usage
```python
import torch

from vibevoice.modular.modeling_vibevoice_inference import VibeVoiceForConditionalGenerationInference
from vibevoice.processor.vibevoice_processor import VibeVoiceProcessor

# Load with f16 precision
model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    "./VibeVoice-1.5B-f16",
    torch_dtype=torch.float16,
    device_map="cpu",  # or "cuda" for GPU
)

processor = VibeVoiceProcessor.from_pretrained("./VibeVoice-1.5B-f16")
```
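
To confirm the loaded precision and in-memory footprint, a quick check using only standard PyTorch (nothing VibeVoice-specific):

```python
# Inspect parameter dtypes and total in-memory parameter size.
dtypes = {str(p.dtype) for p in model.parameters()}
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameter dtypes: {dtypes}")
print(f"parameter memory: {total_bytes / 1024**3:.2f} GB")
```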
Alternatively, pass the `--use_f16` flag to the demo scripts:

```bash
python demo/inference_from_file.py --model_path ./VibeVoice-1.5B-f16 --use_f16 --device cpu
```
## Notes
- F16 precision may introduce minor quality differences compared to f32.
- Some operations are automatically upcast to f32 for numerical stability (see the sketch below).
- Optimized for CPU inference, but also works on CUDA GPUs.
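
As an illustration of the upcasting pattern mentioned above, here is a generic sketch (not the model's actual implementation): a layer norm computed in f32 and cast back to the input dtype.

```python
import torch
import torch.nn.functional as F

def stable_layer_norm(x: torch.Tensor, weight: torch.Tensor,
                      bias: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Compute LayerNorm in float32 for numerical stability,
    then cast the result back to the input dtype (e.g. float16)."""
    out = F.layer_norm(x.float(), x.shape[-1:], weight.float(), bias.float(), eps)
    return out.to(x.dtype)
```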