F5-TTS Arabic

An Arabic text-to-speech model

An F5-TTS model fine-tuned on approximately 300 hours of clean Arabic audio. It synthesizes consistent, high-quality speech from fully diacritized Modern Standard Arabic (MSA) text.

Model Details

Base Model: F5-TTS
Training Data: ~300 hours of clean Arabic audio
Language: Modern Standard Arabic (MSA)

Usage

Quick Start

For inference with text chunking, see the Colab notebook.

from huggingface_hub import hf_hub_download

# Download model files
vocab_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="vocab.txt")
ckpt_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="model_547500_8_18.pt")
config_file = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="F5TTS_Base_8_18.yaml")
ref_audio = hf_hub_download(repo_id="IbrahimSalah/Arabic-F5-TTS-v2", filename="reference.wav")

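# Run inference via the CLI. The leading "!" and the {variable} interpolation
# are notebook (Jupyter/Colab) syntax, so run this in a notebook cell.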
!python -m f5_tts.infer.infer_cli \
  --model_cfg "{config_file}" \
  --output_file "./output.wav" \
  --model "F5TTS_Base" \
  --ckpt_file "{ckpt_file}" \
  --vocab_file "{vocab_file}" \
  --ref_audio "{ref_audio}" \
  --nfe_step 32 \
  --cfg_strength 1.8 \
  --ref_text "YOUR_REFERENCE_TEXT_WITH_TASHKEEL" \
  --gen_text "YOUR_GENERATION_TEXT_WITH_TASHKEEL" \
  --speed 0.9
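
Outside a notebook the `!python ...` line is not valid Python. The sketch below shows one way to issue the same command from a plain script with `subprocess`. It reuses only the flags and values from the command above. The helper names `build_infer_command` and `run_inference` are hypothetical and not part of F5-TTS.

```python
import subprocess
import sys


def build_infer_command(config_file, ckpt_file, vocab_file, ref_audio,
                        ref_text, gen_text, output_file="./output.wav"):
    """Build the argument list for the f5_tts.infer.infer_cli call shown above.

    Hypothetical helper: flags and values are copied from the Quick Start command.
    """
    return [
        sys.executable, "-m", "f5_tts.infer.infer_cli",
        "--model_cfg", config_file,
        "--output_file", output_file,
        "--model", "F5TTS_Base",
        "--ckpt_file", ckpt_file,
        "--vocab_file", vocab_file,
        "--ref_audio", ref_audio,
        "--nfe_step", "32",
        "--cfg_strength", "1.8",
        "--ref_text", ref_text,
        "--gen_text", gen_text,
        "--speed", "0.9",
    ]


def run_inference(**kwargs):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_infer_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with Arabic text. Call `run_inference(config_file=config_file, ckpt_file=ckpt_file, vocab_file=vocab_file, ref_audio=ref_audio, ref_text=..., gen_text=...)` with the paths returned by `hf_hub_download`.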

Key Features

High-quality Arabic speech synthesis
Consistent voice cloning from reference audio
Works best with moderate text lengths (chunking recommended for long texts)
Supports speed adjustment
Fine-tunable for specific use cases
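
The chunking recommended above can be sketched as follows. This is a minimal illustration, not the method used in the Colab notebook: it splits text after sentence and clause punctuation (including the Arabic comma, semicolon, and question mark) and greedily packs the pieces into chunks. The function name `chunk_text` and the `max_chars=150` default are assumptions, so tune the limit by listening to the output.

```python
import re

# Split on whitespace that follows ., !, ?, or the Arabic question mark (U+061F),
# comma (U+060C), or semicolon (U+061B). The punctuation stays with its piece.
_BOUNDARY = re.compile(r"(?<=[.!?\u061F\u060C\u061B])\s+")


def chunk_text(text, max_chars=150):
    """Greedily pack punctuation-delimited pieces into chunks of at most
    max_chars characters (hypothetical helper; diacritics count as characters).

    A single piece longer than max_chars is kept whole rather than cut mid-word.
    """
    pieces = [p.strip() for p in _BOUNDARY.split(text) if p.strip()]
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if current and len(candidate) > max_chars:
            chunks.append(current)   # current chunk is full, start a new one
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `--gen_text` in a separate inference call and the resulting audio files concatenated. Because tashkeel marks count toward the length, a fully diacritized chunk holds roughly half as many letters as its character count suggests.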

Input Requirements

Critical: Input text must be fully diacritized (tashkeel). The model was trained exclusively on fully diacritized text and performs poorly on undiacritized input.

Example of correct input:

إِنَّ الْعِلْمَ نُورٌ يُقْذَفُ فِي الْقَلْبِ
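
Before synthesis, it can help to check roughly whether the input carries tashkeel. The sketch below measures the share of Arabic letters that are directly followed by a diacritic. It is a heuristic, not a validator: even fully diacritized text leaves some letters unmarked (long vowels, for example), so the ratio stays below 1.0. The function names and the `0.6` threshold are assumptions, not values from the model author.

```python
def _is_arabic_letter(ch):
    # Basic Arabic letters U+0621..U+064A, excluding tatweel (U+0640).
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def _is_tashkeel(ch):
    # Fathatan..sukun (U+064B..U+0652) plus superscript alef (U+0670).
    return "\u064B" <= ch <= "\u0652" or ch == "\u0670"


def diacritization_ratio(text):
    """Return the fraction of Arabic letters immediately followed by a diacritic."""
    letters = 0
    marked = 0
    prev_is_unmarked_letter = False
    for ch in text:
        if _is_tashkeel(ch):
            if prev_is_unmarked_letter:
                marked += 1          # count each letter once, even with shadda + vowel
                prev_is_unmarked_letter = False
        elif _is_arabic_letter(ch):
            letters += 1
            prev_is_unmarked_letter = True
        else:
            prev_is_unmarked_letter = False
    return marked / letters if letters else 0.0


def looks_diacritized(text, threshold=0.6):
    """Heuristic check; the 0.6 threshold is an illustrative assumption."""
    return diacritization_ratio(text) >= threshold
```

Text that fails this check should be run through a diacritizer or marked up by hand before it is passed to the model.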

Sample Output

Text: إِنَّ الْعِلْمَ لَيْسَ بِكَثْرَةِ الرِّوَايَةِ، وَإِنَّمَا هُوَ نُورٌ يُقْذَفُ فِي الْقَلْبِ، يَفْهَمُ بِهِ الْعَبْدُ حَقَائِقَ الْأُمُورِ. وَالْحِكْمَةُ ضَالَّةُ الْمُؤْمِنِ، فَحَيْثُمَا وَجَدَهَا فَهُوَ أَحَقُّ بِهَا. وَمَنْ طَلَبَ الْعُلَا مِنْ غَيْرِ كَدٍّ، أَضَاعَ الْعُمُرَ فِي طَلَبِ الْمُحَالِ. فَاصْبِرْ عَلَى مُرِّ الْحَقِّ، وَلَا تَسْتَعْجِلْ قَطْفَ الثَّمَرَةِ قَبْلَ نُضْجِهَا، فَإِنَّ لِكُلِّ شَيْءٍ أَوَانًا، وَلِكُلِّ مَقَامٍ مَقَالًا.

Reference

Further Fine-tuning

The model can be further fine-tuned for:

Non-diacritized text (requires additional training)
Specific voice characteristics
Domain-specific vocabulary
Dialectal variations

License

This model is released under a Non-Commercial License.

You may use this model for research, educational, and personal non-commercial purposes.
Commercial use is strictly prohibited without explicit permission.
If you wish to use this model for commercial purposes, please contact the model author.

Limitations

Requires fully diacritized Arabic text as input
Optimized for Modern Standard Arabic (MSA), not dialectal Arabic
Output quality may degrade on very long texts unless they are chunked
Voice cloning quality depends on reference audio quality and length
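
Since cloning quality depends on the reference clip, a quick look at its duration and format before inference can catch obvious problems. The sketch below reads an uncompressed PCM WAV file with Python's standard `wave` module. The function name `describe_reference` is hypothetical, and this card does not state which durations work best, so no thresholds are enforced here.

```python
import wave


def describe_reference(path):
    """Return basic properties of a PCM WAV reference clip (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "duration_s": frames / rate,       # clip length in seconds
            "sample_rate": rate,               # samples per second
            "channels": wav.getnchannels(),    # 1 = mono, 2 = stereo
        }
```

For example, `describe_reference(ref_audio)` on the downloaded `reference.wav` reports the length of the clip whose transcript goes into `--ref_text`.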

Model tree for IbrahimSalah/Arabic-F5-TTS-v2

Base model: SWivid/F5-TTS
Fine-tuned: this model (IbrahimSalah/Arabic-F5-TTS-v2)