F5-TTS Arabic
نموذج تحويل النص إلى كلام باللغة العربية (Arabic text-to-speech model)
An Arabic text-to-speech model fine-tuned from F5-TTS on approximately 300 hours of clean Arabic audio. It produces consistent, high-quality Modern Standard Arabic speech from fully diacritized input text.
Model Details
Base Model: F5-TTS
Training Data: ~300 hours of clean Arabic audio
Language: Modern Standard Arabic (MSA)
Usage
Quick Start
For inference with text chunking, see the Colab notebook.
from huggingface_hub import hf_hub_download
# Download model files
vocab_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="vocab.txt")
ckpt_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="model_547500_8_18.pt")
config_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="F5TTS_Base_8_18.yaml")
ref_audio = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="reference.wav")
# Run inference via CLI.
# Notebook syntax (IPython/Colab): "!" runs a shell command and {name} inserts a Python variable.
!python -m f5_tts.infer.infer_cli \
    --model_cfg "{config_file}" \
    --output_file "./output.wav" \
    --model "F5TTS_Base" \
    --ckpt_file "{ckpt_file}" \
    --vocab_file "{vocab_file}" \
    --ref_audio "{ref_audio}" \
    --nfe_step 32 \
    --cfg_strength 1.8 \
    --ref_text "YOUR_REFERENCE_TEXT_WITH_TASHKEEL" \
    --gen_text "YOUR_GENERATION_TEXT_WITH_TASHKEEL" \
    --speed 0.9
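Outside a notebook, the `!` shell escape and `{variable}` interpolation are not available. The following is a minimal sketch of the same call from plain Python, assuming the `f5_tts` package is installed in the current environment. `build_infer_command` and `run_inference` are hypothetical helper names, not part of F5-TTS; the flags and default values are the ones used in the command above.

```python
import subprocess
import sys


def build_infer_command(config_file, ckpt_file, vocab_file, ref_audio,
                        ref_text, gen_text, output_file="./output.wav",
                        nfe_step=32, cfg_strength=1.8, speed=0.9):
    """Build the argument list for the CLI call shown above (hypothetical helper)."""
    return [
        sys.executable, "-m", "f5_tts.infer.infer_cli",
        "--model_cfg", config_file,
        "--output_file", output_file,
        "--model", "F5TTS_Base",
        "--ckpt_file", ckpt_file,
        "--vocab_file", vocab_file,
        "--ref_audio", ref_audio,
        "--nfe_step", str(nfe_step),
        "--cfg_strength", str(cfg_strength),
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--speed", str(speed),
    ]


def run_inference(**kwargs):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_infer_command(**kwargs), check=True)
```

Pass the paths returned by `hf_hub_download` as `config_file`, `ckpt_file`, `vocab_file`, and `ref_audio`. Passing the arguments as a list, rather than one shell string, avoids quoting problems with Arabic text and punctuation.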
Key Features
- High-quality Arabic speech synthesis
- Consistent voice cloning from reference audio
- Works best with moderate text lengths (chunking recommended for long texts)
- Supports speed adjustment
- Fine-tunable for specific use cases
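The chunking recommended for long texts can be sketched as follows. This is an illustrative approach, not the method used in the Colab notebook: it splits the text after Latin and Arabic punctuation and greedily packs the pieces into chunks. The function name `chunk_text` and the default `max_chars=200` are assumptions chosen for the example, not tuned values.

```python
import re

# Split after sentence or clause punctuation (Latin and Arabic) followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?؟،؛])\s+")


def chunk_text(text, max_chars=200):
    """Greedily pack punctuation-delimited pieces into chunks of at most max_chars.

    Length is counted in code points, so tashkeel marks count toward the limit.
    A single piece longer than max_chars is kept whole rather than cut mid-word.
    """
    pieces = [p for p in _BOUNDARY.split(text.strip()) if p]
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full; start a new one
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately as its own generation text, and the resulting audio segments concatenated in order.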
Input Requirements
Critical: Input text must include full Arabic diacritization (tashkeel). The model was trained exclusively on fully diacritized text and will not perform well on non-diacritized input.
Example of correct input:
إِنَّ الْعِلْمَ نُورٌ يُقْذَفُ فِي الْقَلْبِ
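Because undiacritized input degrades output quality, it can help to check text before synthesis. The sketch below is a rough heuristic, not part of the model or of F5-TTS: it measures the fraction of Arabic letters that are immediately followed by a tashkeel mark. The function names and the `min_ratio=0.5` threshold are assumptions for illustration. Even fully diacritized text scores below 1.0, since long vowels and some letters carry no mark.

```python
def _is_arabic_letter(ch):
    # Arabic letters: U+0621 to U+063A and U+0641 to U+064A (skips tatweel, U+0640).
    return "\u0621" <= ch <= "\u063A" or "\u0641" <= ch <= "\u064A"


def _is_tashkeel(ch):
    # Tashkeel marks: U+064B to U+0652 (tanween, short vowels, shadda, sukun).
    return "\u064B" <= ch <= "\u0652"


def diacritization_ratio(text):
    """Fraction of Arabic letters immediately followed by a tashkeel mark."""
    letters = marked = 0
    for i, ch in enumerate(text):
        if _is_arabic_letter(ch):
            letters += 1
            if i + 1 < len(text) and _is_tashkeel(text[i + 1]):
                marked += 1
    return marked / letters if letters else 0.0


def looks_diacritized(text, min_ratio=0.5):
    """Heuristic check; min_ratio is an illustrative threshold, not a tuned value."""
    return diacritization_ratio(text) >= min_ratio
```

A text that fails the check should be run through a diacritizer, or diacritized by hand, before being passed as generation text.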
Sample Output
Text: إِنَّ الْعِلْمَ لَيْسَ بِكَثْرَةِ الرِّوَايَةِ، وَإِنَّمَا هُوَ نُورٌ يُقْذَفُ فِي الْقَلْبِ، يَفْهَمُ بِهِ الْعَبْدُ حَقَائِقَ الْأُمُورِ. وَالْحِكْمَةُ ضَالَّةُ الْمُؤْمِنِ، فَحَيْثُمَا وَجَدَهَا فَهُوَ أَحَقُّ بِهَا. وَمَنْ طَلَبَ الْعُلَا مِنْ غَيْرِ كَدٍّ، أَضَاعَ الْعُمْرَ فِي طَلَبِ الْمُحَالِ. فَاصْبِرْ عَلَى مُرِّ الْحَقِّ، وَلَا تَسْتَعْجِلْ قَطْفَ الثَّمَرَةِ قَبْلَ نُضْجِهَا، فَإِنَّ لِكُلِّ شَيْءٍ أَوَانًا، وَلِكُلِّ مَقَامٍ مَقَالًا.
Reference audio
Further Fine-tuning
The model can be further fine-tuned for:
- Non-diacritized text (requires additional training)
- Specific voice characteristics
- Domain-specific vocabulary
- Dialectal variations
License
This model is released under a Non-Commercial License.
- You may use this model for research, educational, and personal non-commercial purposes.
- Commercial use is strictly prohibited without explicit permission.
- If you wish to use this model for commercial purposes, please contact the model author.
Limitations
- Requires fully diacritized Arabic text as input
- Optimized for Modern Standard Arabic (MSA), not dialectal Arabic
- Performance may vary with very long texts without chunking
- Voice cloning quality depends on reference audio quality and length
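Since cloning quality depends on the reference clip, it can be useful to inspect the clip before using it. The sketch below reads the duration, sample rate, and channel count of a WAV file with Python's standard `wave` module, which handles uncompressed PCM WAV only. `wav_info` is a hypothetical helper name, and no particular duration or sample rate is implied as a requirement of the model.

```python
import wave


def wav_info(path):
    """Return (duration_seconds, sample_rate, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return frames / rate, rate, wav.getnchannels()
```

For example, `wav_info(ref_audio)` reports the length of the downloaded `reference.wav`, which helps when comparing it against other candidate reference clips.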
Model tree for IbrahimSalah/Arabic-F5-TTS-v2
Base model: SWivid/F5-TTS