sujithatz committed
Commit c2fbeba · verified · 1 Parent(s): 40f6f93

Upload 2 files

Files changed (2):
  1. README.md +205 -0
  2. UPLOAD_INSTRUCTIONS.md +112 -0
README.md ADDED
---
language:
- ml
license: apache-2.0
tags:
- whisper
- whisper.cpp
- ggml
- quantized
- malayalam
- asr
- speech-recognition
- on-device
base_model: thennal/whisper-medium-ml
model-index:
- name: Whisper Medium Malayalam GGML
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: ml
      split: test
    metrics:
    - type: wer
      value: 11.49
      name: WER (with normalization)
    - type: wer
      value: 38.62
      name: WER (without normalization)
    - type: cer
      value: 7.33
      name: CER
---

# Whisper Medium Malayalam - GGML Format

This is a GGML-converted version of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) optimized for use with [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

**Key Features:**
- 🚀 Multiple quantized versions (Q4, Q5, Q8) for different use cases
- 📱 Optimized for on-device, offline inference
- ⚡ Up to 85% size reduction with quantization
- 🎯 Malayalam language specialization
- 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

## Model Details

- **Base Model:** OpenAI Whisper Medium
- **Language:** Malayalam
- **Task:** Automatic Speech Recognition (ASR)
- **Format:** GGML (converted from PyTorch)
- **Source:** Fine-tuned on Common Voice 11.0 and other Malayalam speech datasets (see Training Data below)

## Available Model Variants

This repository provides multiple quantized versions optimized for different use cases:

| Model | Size | Use Case | Quality |
|-------|------|----------|---------|
| `ggml-model.bin` | 1.4 GB | Original conversion (F16) | Highest quality |
| `ggml-model-q8_0.bin` | 785 MB | High quality, smaller size | Very high quality |
| `ggml-model-q5_0.bin` | 514 MB | **Recommended** - Balanced quality/size | Good quality |
| `ggml-model-q4_0.bin` | 424 MB | Smallest size, faster inference | Acceptable quality |

**Recommendation:** For most users, `ggml-model-q5_0.bin` offers the best balance between quality and file size.

## Performance (from source model)

- **Word Error Rate (WER):** 38.62% (without normalization)
- **Character Error Rate (CER):** 7.33%
- **WER with normalization:** 11.49%

Note: Whisper's text normalization has significant issues for the Malayalam language, so the normalized WER should be interpreted with caution.

## Usage with whisper.cpp

### Prerequisites

```bash
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
make
```

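Note that newer whisper.cpp releases build with CMake rather than plain `make`; if the `make` step above does not produce `build/bin/whisper-cli`, a build along these lines (assuming CMake is installed) should:

```bash
# Configure and build whisper.cpp with CMake; binaries land in build/bin/
cmake -B build
cmake --build build -j --config Release
```
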
### Download the model

Download one of the model files from this repository and place it in the `models` directory of whisper.cpp (see the download sketch after the list below):

- **Recommended:** `ggml-model-q5_0.bin` (514 MB)
- **Smallest:** `ggml-model-q4_0.bin` (424 MB)
- **Highest quality:** `ggml-model-q8_0.bin` (785 MB)
- **Original:** `ggml-model.bin` (1.4 GB)

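One convenient way to fetch a variant is the `huggingface-cli download` command (a sketch, assuming the `huggingface_hub` CLI is installed and this repository is published as `sujithatz/whisper-medium-ml-ggml`):

```bash
# Install the Hugging Face CLI (if needed) and pull the recommended Q5_0 variant into whisper.cpp's models directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download sujithatz/whisper-medium-ml-ggml ggml-model-q5_0.bin --local-dir models
```
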
### Run inference

```bash
# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml
```

Where:
- `-m` specifies the model file
- `-f` specifies the input audio file (must be a 16 kHz, 16-bit WAV file)
- `-l ml` sets the language to Malayalam

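If your recording is in another format or sample rate, ffmpeg (assuming it is installed) can convert it to the expected 16 kHz mono 16-bit WAV:

```bash
# Convert arbitrary input audio to 16 kHz, mono, 16-bit PCM WAV for whisper.cpp
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
```
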
### Additional options

```bash
# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text
```

## Conversion Details

This model was converted from the HuggingFace transformers format to GGML using the `convert-h5-to-ggml.py` script from whisper.cpp.

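For reference, the conversion roughly follows the sketch below; the exact script path and argument order may differ between whisper.cpp versions, and the local directory names are placeholders:

```bash
# Fetch the fine-tuned model and the original Whisper repo (needed for mel filter / tokenizer assets)
git clone https://huggingface.co/thennal/whisper-medium-ml
git clone https://github.com/openai/whisper

# Convert the Hugging Face checkpoint to a GGML F16 model (writes ggml-model.bin into ./ggml-output)
python whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium-ml ./whisper ./ggml-output
```
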
### Quantization Details

The quantized models were created using whisper.cpp's quantization tool:

- **Q8_0**: 8-bit quantization, retains ~99% of original quality
- **Q5_0**: 5-bit quantization, excellent quality/size balance (~73% size reduction)
- **Q4_0**: 4-bit quantization, maximum compression (~85% size reduction)

All quantized models maintain the full model architecture and can be used as drop-in replacements.

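As an illustration, producing the quantized variants from the F16 model with whisper.cpp's `quantize` tool looks roughly like this (the binary may live at `./quantize` or `./build/bin/quantize` depending on how whisper.cpp was built):

```bash
# Quantize the F16 GGML model to Q5_0 and Q4_0
./build/bin/quantize ggml-model.bin ggml-model-q5_0.bin q5_0
./build/bin/quantize ggml-model.bin ggml-model-q4_0.bin q4_0
```
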
## Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:
- [Mozilla Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (ml)
- [Google FLEURS](https://huggingface.co/datasets/google/fleurs)
- [IMaSC](https://huggingface.co/datasets/thennal/IMaSC)
- [ULCA Malayalam](https://huggingface.co/datasets/thennal/ulca_ml)
- [MSC](https://huggingface.co/datasets/thennal/msc)
- [Indic TTS Malayalam](https://huggingface.co/datasets/thennal/indic_tts_ml)

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-medium-ml-ggml,
  author = {Thennal D K},
  title = {Whisper Medium Malayalam - GGML Format},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```

## License

Apache 2.0 - Same as the original Whisper model and fine-tuned version.

## Acknowledgments

This model builds upon the work of many contributors:

### Original Model & Framework
- **OpenAI Whisper Team** - For the groundbreaking Whisper ASR model ([paper](https://arxiv.org/abs/2212.04356), [code](https://github.com/openai/whisper))
- **Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever** - Whisper authors

### Malayalam Fine-tuning
- **[Thennal D K](https://huggingface.co/thennal)** - For fine-tuning Whisper Medium on Malayalam datasets and making it available on HuggingFace
- **Original model:** [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml)
- Training resources: [Fine-tuning Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb)

### Datasets
- **Mozilla Foundation** - Common Voice 11.0 Malayalam dataset
- **Google** - FLEURS multilingual dataset
- **Community contributors** - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

### GGML Implementation
- **[whisper.cpp team](https://github.com/ggml-org/whisper.cpp)** - For the efficient C/C++ implementation and GGML format
- **[ggml-org](https://github.com/ggml-org)** - For the GGML machine learning library

### Tools & Frameworks
- **HuggingFace Transformers** - Model training and inference framework
- **PyTorch** - Deep learning framework

---

**Special Thanks:** This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.
UPLOAD_INSTRUCTIONS.md ADDED
# Upload Instructions for HuggingFace

## Model Files Ready

Your converted GGML models are ready at:

```
/Users/sujith/work/2025/expiriments/whisper-ml/ggml-output/
```

Files to upload:
- `ggml-model.bin` (1.4 GB) - Original F16 conversion
- `ggml-model-q8_0.bin` (785 MB) - High quality quantized
- `ggml-model-q5_0.bin` (514 MB) - Recommended balanced version
- `ggml-model-q4_0.bin` (424 MB) - Smallest quantized version
- `README.md` - Complete documentation

## Option 1: Manual Upload via Web Interface

1. **Create Repository:**
   - Visit https://huggingface.co/new
   - Repository name: `whisper-medium-ml-ggml`
   - Repository type: Model
   - License: Same as source (Apache 2.0 or appropriate)
   - Click "Create repository"

2. **Upload Files:**
   - Click "Add file" → "Upload files"
   - Upload all files:
     - `README.md` (documentation)
     - `ggml-model-q4_0.bin` (smallest, 424 MB)
     - `ggml-model-q5_0.bin` (recommended, 514 MB)
     - `ggml-model-q8_0.bin` (high quality, 785 MB)
     - `ggml-model.bin` (original, 1.4 GB) - Optional, since quantized versions are available

## Option 2: Upload via Git (Recommended for large files)

1. **Create Repository on HuggingFace:**
   - Visit https://huggingface.co/new and create `whisper-medium-ml-ggml`

2. **Initialize and Upload:**
   ```bash
   cd /Users/sujith/work/2025/expiriments/whisper-ml/ggml-output

   # Initialize git
   git init
   git lfs install

   # Track large files with LFS
   git lfs track "*.bin"

   # Add remote
   git remote add origin https://huggingface.co/sujithatz/whisper-medium-ml-ggml

   # Add files
   git add .
   git commit -m "Initial commit: GGML converted Whisper Malayalam model"

   # Push
   git push -u origin main
   ```

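For reference, `git lfs track "*.bin"` writes a `.gitattributes` file (listed in the upload checklist further below); its contents should look roughly like:

```
*.bin filter=lfs diff=lfs merge=lfs -text
```
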
## Option 3: Upload via HuggingFace Hub (requires write token)

1. **Get Write Token:**
   - Visit https://huggingface.co/settings/tokens
   - Create new token with "write" access
   - Copy the token

2. **Login:**
   ```bash
   huggingface-cli login
   # Paste your write token when prompted
   ```

3. **Create and Upload:**
   ```bash
   huggingface-cli repo create whisper-medium-ml-ggml --type model
   cd /Users/sujith/work/2025/expiriments/whisper-ml/ggml-output
   huggingface-cli upload sujithatz/whisper-medium-ml-ggml . .
   ```

## Using the Model Locally (No upload needed)

If you want to test the model first:

```bash
# Build whisper.cpp
cd /Users/sujith/work/2025/expiriments/whisper-ml/whisper.cpp
make

# Copy model
cp ../ggml-output/ggml-model.bin models/ggml-medium-ml.bin

# Run inference
./main -m models/ggml-medium-ml.bin -f /path/to/malayalam-audio.wav -l ml
```

## Files Ready to Upload

- `ggml-model.bin` (1.4 GB) - Original F16 conversion
- `ggml-model-q8_0.bin` (785 MB) - High quality quantized
- `ggml-model-q5_0.bin` (514 MB) - **Recommended** - Best balance
- `ggml-model-q4_0.bin` (424 MB) - Smallest, fastest inference
- `README.md` - Complete documentation with all variants
- `.gitattributes` - (will be created) LFS configuration

**Total size if uploading all variants:** ~3.1 GB
**Recommended minimum:** Upload at least the Q5_0 and Q4_0 variants (938 MB total)

## Next Steps

Choose one of the upload options above based on your preference. The model is fully converted and ready to use!