F5-TTS Arabic

نموذج تحويل النص إلى كلام باللغة العربية (Arabic text-to-speech model)

An Arabic text-to-speech model, fine-tuned from F5-TTS on approximately 300 hours of clean Arabic audio. It produces consistent, high-quality Modern Standard Arabic (MSA) speech from fully diacritized input text.

Model Details

Base Model: F5-TTS
Training Data: ~300 hours of clean Arabic audio
Language: Modern Standard Arabic (MSA)

Usage

Quick Start

For inference with text chunking, see the Colab notebook.

from huggingface_hub import hf_hub_download

# Download model files
vocab_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="vocab.txt")
ckpt_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="model_547500_8_18.pt")
config_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="F5TTS_Base_8_18.yaml")
ref_audio = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="reference.wav")

# Run inference via the CLI (notebook cell: the leading "!" and {variable} interpolation are IPython/Colab features)
!python -m f5_tts.infer.infer_cli \
  --model_cfg "{config_file}" \
  --output_file "./output.wav" \
  --model "F5TTS_Base" \
  --ckpt_file "{ckpt_file}" \
  --vocab_file "{vocab_file}" \
  --ref_audio "{ref_audio}" \
  --nfe_step 32 \
  --cfg_strength 1.8 \
  --ref_text "YOUR_REFERENCE_TEXT_WITH_TASHKEEL" \
  --gen_text "YOUR_GENERATION_TEXT_WITH_TASHKEEL" \
  --speed 0.9
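
Outside a notebook, the `!` shell escape and `{variable}` interpolation are not available. The following is a minimal sketch of the same call from a plain Python script, assuming the `f5_tts` package is installed in the current interpreter. `build_infer_command` and `run_inference` are hypothetical helper names; the flags and values are the ones used in the command above.

```python
import subprocess
import sys


def build_infer_command(config_file, ckpt_file, vocab_file, ref_audio,
                        ref_text, gen_text, output_file="./output.wav"):
    """Assemble the infer_cli argument list (hypothetical helper)."""
    return [
        sys.executable, "-m", "f5_tts.infer.infer_cli",
        "--model_cfg", config_file,
        "--output_file", output_file,
        "--model", "F5TTS_Base",
        "--ckpt_file", ckpt_file,
        "--vocab_file", vocab_file,
        "--ref_audio", ref_audio,
        "--nfe_step", "32",
        "--cfg_strength", "1.8",
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--speed", "0.9",
    ]


def run_inference(**kwargs):
    # Passing a list (no shell) avoids quoting problems with Arabic text.
    subprocess.run(build_infer_command(**kwargs), check=True)
```

Call `run_inference(config_file=..., ckpt_file=..., vocab_file=..., ref_audio=..., ref_text=..., gen_text=...)` with the paths returned by `hf_hub_download` and your own diacritized texts.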

Key Features

  • High-quality Arabic speech synthesis
  • Consistent voice cloning from reference audio
  • Works best with moderate text lengths (chunking recommended for long texts)
  • Supports speed adjustment
  • Fine-tunable for specific use cases
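
As an illustration of the chunking recommended above, here is a minimal sketch that splits long text at sentence punctuation and, for over-long sentences, at Arabic commas. It is not the chunking used in the Colab notebook; `chunk_text` is a hypothetical helper, and `max_chars=200` is an assumed limit to tune (diacritics count as characters, so diacritized text is roughly twice as long as bare text).

```python
import re

# Sentence enders: Latin . ! ? plus Arabic question mark (U+061F) and semicolon (U+061B).
_SENTENCE_END = re.compile(r"(?<=[.!?\u061F\u061B])\s+")
# Clause breaks: whitespace that follows an Arabic comma (U+060C).
_CLAUSE_END = re.compile(r"(?<=\u060C)\s+")


def chunk_text(text, max_chars=200):
    """Greedily pack sentences into chunks of at most max_chars where possible."""
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text.strip()):
        # Only sentences that are too long on their own are broken at commas.
        parts = _CLAUSE_END.split(sentence) if len(sentence) > max_chars else [sentence]
        for part in parts:
            if current and len(current) + 1 + len(part) > max_chars:
                chunks.append(current)
                current = part
            else:
                current = f"{current} {part}" if current else part
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as a separate `--gen_text` and the resulting audio files concatenated. A clause longer than `max_chars` is kept whole rather than split mid-word.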

Input Requirements

Critical: Text must include full Arabic diacritization (tashkeel). The model was trained exclusively on fully diacritized text and will not perform well on non-diacritized input.

Example of correct input:

إِنَّ الْعِلْمَ نُورٌ يُقْذَفُ فِي الْقَلْبِ
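
Because undiacritized input degrades output, it can help to check text before synthesis. The sketch below is a rough heuristic, not part of the model or of F5-TTS: it measures the fraction of Arabic letters directly followed by a tashkeel mark. The function names and the `0.6` threshold are assumptions; fully diacritized text rarely reaches 1.0 because long vowels and the alif of the definite article usually carry no mark.

```python
# Arabic tashkeel marks: fathatan (U+064B) through sukun (U+0652).
TASHKEEL = frozenset(chr(c) for c in range(0x064B, 0x0653))


def is_arabic_letter(ch):
    # Basic Arabic letters, excluding tatweel (U+0640).
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def diacritization_ratio(text):
    """Fraction of Arabic letters directly followed by a tashkeel mark."""
    letters = marked = 0
    for i, ch in enumerate(text):
        if is_arabic_letter(ch):
            letters += 1
            if i + 1 < len(text) and text[i + 1] in TASHKEEL:
                marked += 1
    return marked / letters if letters else 0.0


def looks_diacritized(text, threshold=0.6):
    # threshold is an assumed cut-off; tune it on your own data.
    return diacritization_ratio(text) >= threshold
```

Text that fails the check should be run through a diacritizer (or corrected by hand) before being passed as `--ref_text` or `--gen_text`.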

Sample Output

Text: إِنَّ الْعِلْمَ لَيْسَ بِكَثْرَةِ الرِّوَايَةِ، وَإِنَّمَا هُوَ نُورٌ يُقْذَفُ فِي الْقَلْبِ، يَفْهَمُ بِهِ الْعَبْدُ حَقَائِقَ الْأُمُورِ. وَالْحِكْمَةُ ضَالَّةُ الْمُؤْمِنِ، فَحَيْثُمَا وَجَدَهَا فَهُوَ أَحَقُّ بِهَا. وَمَنْ طَلَبَ الْعُلَا مِنْ غَيْرِ كَدٍّ، أَضَاعَ الْعُمُرَ فِي طَلَبِ الْمُحَالِ. فَاصْبِرْ عَلَى مُرِّ الْحَقِّ، وَلَا تَسْتَعْجِلْ قَطْفَ الثَّمَرَةِ قَبْلَ نُضْجِهَا، فَإِنَّ لِكُلِّ شَيْءٍ أَوَانًا، وَلِكُلِّ مَقَامٍ مَقَالًا.

Reference

Further Fine-tuning

The model can be further fine-tuned for:

  • Non-diacritized text (requires additional training)
  • Specific voice characteristics
  • Domain-specific vocabulary
  • Dialectal variations

License

This model is released under a Non-Commercial License.

  • You may use this model for research, educational, and personal non-commercial purposes.
  • Commercial use is strictly prohibited without explicit permission.
  • If you wish to use this model for commercial purposes, please contact the model author.

Limitations

  • Requires fully diacritized Arabic text as input
  • Optimized for Modern Standard Arabic (MSA), not dialectal Arabic
  • Performance may vary with very long texts without chunking
  • Voice cloning quality depends on reference audio quality and length
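
Since cloning quality depends on the reference clip, a quick duration check before inference can catch clips that are too short or too long. This is a minimal sketch using only the standard library. It assumes an uncompressed PCM WAV file, and the `min_s` and `max_s` bounds are placeholder assumptions, not values stated by the model author.

```python
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file in seconds (stdlib only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path, min_s=3.0, max_s=12.0):
    # min_s and max_s are assumed bounds; adjust them to your own findings.
    duration = wav_duration_seconds(path)
    return min_s <= duration <= max_s, duration
```

`check_reference("reference.wav")` returns a `(within_bounds, duration)` pair; a clip outside the bounds can be trimmed or replaced before it is passed as `--ref_audio`.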

Model tree for IbrahimSalah/Arabic-F5-TTS-v2

Base model: SWivid/F5-TTS (this model is a fine-tune of it)