Upload 2 files

- README.md +205 -0
- UPLOAD_INSTRUCTIONS.md +112 -0
README.md
---
language:
- ml
license: apache-2.0
tags:
- whisper
- whisper.cpp
- ggml
- quantized
- malayalam
- asr
- speech-recognition
- on-device
base_model: thennal/whisper-medium-ml
model-index:
- name: Whisper Medium Malayalam GGML
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: ml
      split: test
    metrics:
    - type: wer
      value: 11.49
      name: WER (with normalization)
    - type: wer
      value: 38.62
      name: WER (without normalization)
    - type: cer
      value: 7.33
      name: CER
---

# Whisper Medium Malayalam - GGML Format

This is a GGML conversion of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) for use with [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

**Key Features:**

- 🚀 Multiple quantized versions (Q4_0, Q5_0, Q8_0) for different use cases
- 📱 Suitable for on-device, offline inference
- ⚡ Up to ~70% smaller than the F16 conversion (Q4_0: 424 MB vs. 1.4 GB)
- 🎯 Specialized for Malayalam
- 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

## Model Details

- **Base Model:** OpenAI Whisper Medium
- **Language:** Malayalam
- **Task:** Automatic Speech Recognition (ASR)
- **Format:** GGML (converted from PyTorch)
- **Source:** Fine-tuned on several Malayalam speech datasets (see Training Data below); evaluated on the Common Voice 11.0 test split

## Available Model Variants

This repository provides multiple quantized versions optimized for different use cases:

| Model | Size | Use Case | Quality |
|-------|------|----------|---------|
| `ggml-model.bin` | 1.4 GB | Original conversion (F16) | Highest quality |
| `ggml-model-q8_0.bin` | 785 MB | High quality, smaller size | Very high quality |
| `ggml-model-q5_0.bin` | 514 MB | **Recommended** - Balanced quality/size | Good quality |
| `ggml-model-q4_0.bin` | 424 MB | Smallest size, faster inference | Acceptable quality |

**Recommendation:** For most users, `ggml-model-q5_0.bin` offers the best balance between quality and file size.
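
The size reductions quoted in this card follow directly from the table. A minimal sketch that recomputes them, assuming "1.4 GB" means 1400 MB; because the published sizes are rounded, the percentages are approximate:

```bash
#!/usr/bin/env bash
# Percentage size reduction of a file relative to the F16 model.
# Usage: reduction_pct SIZE_MB [F16_MB]; F16_MB defaults to 1400 (the "1.4 GB" in the table).
reduction_pct() {
  awk -v s="$1" -v f="${2:-1400}" 'BEGIN { printf "%.0f\n", (1 - s / f) * 100 }'
}

# Sizes in MB, copied from the table above.
for entry in q8_0:785 q5_0:514 q4_0:424; do
  name=${entry%%:*}
  size_mb=${entry##*:}
  echo "${name}: ${size_mb} MB, about $(reduction_pct "$size_mb")% smaller than F16"
done
```

With the table's figures this gives roughly 44% for Q8_0, 63% for Q5_0 and 70% for Q4_0.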
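
## Performance (from source model)

- **Word Error Rate (WER):** 38.62% (without normalization)
- **Character Error Rate (CER):** 7.33%
- **WER with normalization:** 11.49%

Note: Whisper's text normalizer handles Malayalam poorly (it replaces Unicode combining marks, which include Malayalam vowel signs, with spaces), so the normalized WER of 11.49% flatters the model. The unnormalized WER and the CER better reflect real accuracy. All figures come from the source PyTorch model; the quantized variants may score somewhat differently.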
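
For reference, both metrics are edit-distance ratios with the standard definitions (nothing here is specific to this model):

$$
\mathrm{WER} = \frac{S_w + D_w + I_w}{N_w}, \qquad
\mathrm{CER} = \frac{S_c + D_c + I_c}{N_c}
$$

where $S$, $D$ and $I$ count the substitutions, deletions and insertions in the minimum-edit alignment of the hypothesis against the reference, and $N$ is the number of words ($w$) or characters ($c$) in the reference. A single wrong character makes a whole word count as an error, which is one reason WER runs much higher than CER for a language with long words such as Malayalam.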

## Usage with whisper.cpp

### Prerequisites

```bash
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
# Build with CMake; the CLI ends up at ./build/bin/whisper-cli
cmake -B build
cmake --build build -j --config Release
```

### Download the model

Download one of the model files from this repository and place it in the `models` directory of your whisper.cpp checkout:

- **Recommended:** `ggml-model-q5_0.bin` (514 MB)
- **Smallest:** `ggml-model-q4_0.bin` (424 MB)
- **Highest-quality quantized:** `ggml-model-q8_0.bin` (785 MB)
- **Original (F16, highest quality):** `ggml-model.bin` (1.4 GB)
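
A small helper for fetching a file from the command line. It relies on the Hub's standard `https://huggingface.co/<repo>/resolve/main/<file>` download URL. The repository id `sujithatz/whisper-medium-ml-ggml` is an assumption taken from the upload instructions, so override `REPO` if the model lives elsewhere; the function names are illustrative.

```bash
#!/usr/bin/env bash
# Assumption: repository id taken from UPLOAD_INSTRUCTIONS.md; override with REPO=... if different.
REPO="${REPO:-sujithatz/whisper-medium-ml-ggml}"

# Build the direct-download URL for a file on the repository's main branch.
model_url() {
  printf 'https://huggingface.co/%s/resolve/main/%s\n' "$REPO" "$1"
}

# Download a model file into ./models (run this from the whisper.cpp directory).
# Usage: download_model ggml-model-q5_0.bin
download_model() {
  local file=${1:?usage: download_model FILE}
  mkdir -p models
  # -L follows the Hub's redirect to its file storage; --fail aborts on HTTP errors.
  curl -L --fail -o "models/${file}" "$(model_url "$file")"
}
```

For example, `download_model ggml-model-q5_0.bin` fetches the recommended variant into `models/`.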

### Run inference

```bash
# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml
```

Where:
- `-m` specifies the model file
- `-f` specifies the input audio file (16-bit WAV; the whisper.cpp README recommends converting other formats with ffmpeg, e.g. to 16 kHz mono)
- `-l ml` sets the language to Malayalam
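
Before transcribing, it can help to confirm that a file really is 16-bit PCM and to convert it if not. The sketch below reads the `fmt ` fields of a canonical WAV header with `od`; it assumes the `fmt ` chunk directly follows the 12-byte RIFF/WAVE header (the usual layout for files written by ffmpeg, but not guaranteed for every WAV) and a little-endian host. The conversion uses the `ffmpeg -ar 16000 -ac 1 -c:a pcm_s16le` invocation suggested in the whisper.cpp README. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Read an unsigned integer of SIZE bytes at byte OFFSET of FILE.
# Usage: wav_field FILE OFFSET SIZE
# Assumes a little-endian host, because od -t uN uses host byte order.
wav_field() {
  od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# Print "channels sample_rate bits_per_sample" for a canonical WAV file.
wav_format() {
  echo "$(wav_field "$1" 22 2) $(wav_field "$1" 24 4) $(wav_field "$1" 34 2)"
}

# Convert any audio file ffmpeg can read into 16 kHz, mono, 16-bit PCM WAV.
# Usage: to_whisper_wav input.m4a output.wav
to_whisper_wav() {
  ffmpeg -hide_banner -loglevel error -y -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}
```

For a file produced by `to_whisper_wav recording.m4a audio.wav`, `wav_format audio.wav` should print `1 16000 16`.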

### Additional options

```bash
# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text
```
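
To subtitle a whole folder, the same flags can be run in a loop. This sketch assumes a directory of WAV files already in whisper.cpp's expected format and uses whisper-cli's `-of` option (output path without extension), so `talk.wav` yields `talk.srt`. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Output base name for a WAV file: strip .wav (whisper-cli appends .srt itself).
srt_base() {
  printf '%s\n' "${1%.wav}"
}

# Transcribe every .wav in DIR to SubRip subtitles with the given model.
# Usage: transcribe_dir DIR [MODEL]
transcribe_dir() {
  local dir=${1:?usage: transcribe_dir DIR [MODEL]}
  local model=${2:-models/ggml-model-q5_0.bin}
  local f
  for f in "$dir"/*.wav; do
    [ -e "$f" ] || continue  # no matches: the glob stays literal, so skip it
    ./build/bin/whisper-cli -m "$model" -f "$f" -l ml -osrt -of "$(srt_base "$f")"
  done
}
```

Run it from the whisper.cpp directory, e.g. `transcribe_dir recordings`.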

## Conversion Details

This model was converted from the HuggingFace Transformers (PyTorch) format to GGML using the `models/convert-h5-to-ggml.py` script from whisper.cpp.
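
A sketch of the conversion step, wrapped in an illustrative function. As far as we know, the script takes the HuggingFace model directory, a clone of the openai/whisper repository (used for its mel filter and tokenizer assets) and an output directory, and writes `ggml-model.bin` (F16 by default) there; check the script's usage message for your whisper.cpp version. It needs a Python environment with PyTorch and Transformers installed.

```bash
#!/usr/bin/env bash
# Convert a HuggingFace Whisper checkpoint to GGML with whisper.cpp's converter.
# Usage: convert_to_ggml HF_MODEL_DIR OPENAI_WHISPER_REPO OUT_DIR [WHISPER_CPP_DIR]
convert_to_ggml() {
  local hf_dir=${1:?HF model directory required}
  local whisper_repo=${2:?path to an openai/whisper clone required}
  local out_dir=${3:?output directory required}
  local wcpp=${4:-whisper.cpp}
  mkdir -p "$out_dir"
  # Writes ggml-model.bin into out_dir.
  python3 "$wcpp/models/convert-h5-to-ggml.py" "$hf_dir" "$whisper_repo" "$out_dir"
}
```

For this model, that would mean cloning `https://github.com/openai/whisper` and `https://huggingface.co/thennal/whisper-medium-ml` (the latter needs Git LFS), then running `convert_to_ggml whisper-medium-ml whisper ggml-output`.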

### Quantization Details

The quantized models were created using whisper.cpp's quantization tool:

- **Q8_0**: 8-bit quantization, closest to the F16 original (~44% smaller)
- **Q5_0**: 5-bit quantization, good quality/size balance (~63% smaller)
- **Q4_0**: 4-bit quantization, maximum compression (~70% smaller)

All quantized models keep the full Whisper Medium architecture and can be used as drop-in replacements for `ggml-model.bin`.
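
The three variants can be produced from the F16 file with a loop like the one below. It follows the `./build/bin/quantize <input> <output> <type>` usage shown in the whisper.cpp README; the binary's name and location may differ between whisper.cpp versions, so `QUANTIZE` is a placeholder you can override. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Path to whisper.cpp's quantize binary (assumption: default CMake build layout).
QUANTIZE="${QUANTIZE:-./build/bin/quantize}"

# File name for a quantization type, e.g. q5_0 -> ggml-model-q5_0.bin
quantized_name() {
  printf 'ggml-model-%s.bin\n' "$1"
}

# Create Q8_0, Q5_0 and Q4_0 variants next to the F16 model.
# Usage: quantize_all path/to/ggml-model.bin
quantize_all() {
  local src=${1:?usage: quantize_all F16_MODEL}
  local dir type
  dir=$(dirname "$src")
  for type in q8_0 q5_0 q4_0; do
    "$QUANTIZE" "$src" "$dir/$(quantized_name "$type")" "$type"
  done
}
```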

## Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:

- [Mozilla Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (ml)
- [Google FLEURS](https://huggingface.co/datasets/google/fleurs)
- [IMaSC](https://huggingface.co/datasets/thennal/IMaSC)
- [ULCA Malayalam](https://huggingface.co/datasets/thennal/ulca_ml)
- [MSC](https://huggingface.co/datasets/thennal/msc)
- [Indic TTS Malayalam](https://huggingface.co/datasets/thennal/indic_tts_ml)

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-medium-ml-ggml,
  author       = {Thennal D K},
  title        = {Whisper Medium Malayalam - GGML Format},
  year         = {2024},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note         = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title         = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author        = {Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year          = {2022},
  eprint        = {2212.04356},
  archivePrefix = {arXiv},
  primaryClass  = {eess.AS}
}
```

## License

Apache 2.0, the same license as the HuggingFace `openai/whisper-medium` checkpoint and the fine-tuned source model [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml).

## Acknowledgments

This model builds upon the work of many contributors:

### Original Model & Framework
- **OpenAI Whisper Team** - For the groundbreaking Whisper ASR model ([paper](https://arxiv.org/abs/2212.04356), [code](https://github.com/openai/whisper))
- **Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever** - Whisper authors

### Malayalam Fine-tuning
- **[Thennal D K](https://huggingface.co/thennal)** - For fine-tuning Whisper Medium on Malayalam datasets and making it available on HuggingFace
- **Original model:** [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml)
- **Training resources:** [Fine-tuning Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb)

### Datasets
- **Mozilla Foundation** - Common Voice 11.0 Malayalam dataset
- **Google** - FLEURS multilingual dataset
- **Community contributors** - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

### GGML Implementation
- **[whisper.cpp team](https://github.com/ggml-org/whisper.cpp)** - For the efficient C/C++ implementation and GGML format
- **[ggml-org](https://github.com/ggml-org)** - For the GGML machine learning library

### Tools & Frameworks
- **HuggingFace Transformers** - Model training and inference framework
- **PyTorch** - Deep learning framework

---

**Special Thanks** to everyone above: their work makes it possible to run this Malayalam Whisper model offline and on-device, on platforms including mobile devices, embedded systems, and other resource-constrained environments.
UPLOAD_INSTRUCTIONS.md
# Upload Instructions for HuggingFace

## Model Files Ready

Your converted GGML models are ready at:
```
/Users/sujith/work/2025/expiriments/whisper-ml/ggml-output/
```

Files to upload:
- `ggml-model.bin` (1.4 GB) - Original F16 conversion
- `ggml-model-q8_0.bin` (785 MB) - High quality quantized
- `ggml-model-q5_0.bin` (514 MB) - Recommended balanced version
- `ggml-model-q4_0.bin` (424 MB) - Smallest quantized version
- `README.md` - Complete documentation

## Option 1: Manual Upload via Web Interface

1. **Create Repository:**
   - Visit https://huggingface.co/new
   - Repository name: `whisper-medium-ml-ggml`
   - Repository type: Model
   - License: Apache 2.0 (same as the source model)
   - Click "Create repository"

2. **Upload Files:**
   - Click "Add file" → "Upload files"
   - Upload the files:
     - `README.md` (documentation)
     - `ggml-model-q4_0.bin` (smallest, 424 MB)
     - `ggml-model-q5_0.bin` (recommended, 514 MB)
     - `ggml-model-q8_0.bin` (high quality, 785 MB)
     - `ggml-model.bin` (original, 1.4 GB) - optional, since quantized versions are available

## Option 2: Upload via Git (Recommended for large files)

1. **Create Repository on HuggingFace:**
   - Visit https://huggingface.co/new and create `whisper-medium-ml-ggml`

2. **Clone and Upload:**
   ```bash
   # Clone the repository created in step 1 (it already contains an initial commit)
   cd /Users/sujith/work/2025/expiriments/whisper-ml
   git clone https://huggingface.co/sujithatz/whisper-medium-ml-ggml
   cd whisper-medium-ml-ggml

   # Enable Git LFS and track large files
   git lfs install
   git lfs track "*.bin"

   # Copy the model files and README into the clone
   cp ../ggml-output/*.bin ../ggml-output/README.md .

   # Add files
   git add .gitattributes README.md *.bin
   git commit -m "Initial commit: GGML converted Whisper Malayalam model"

   # Push (authenticate with your HuggingFace username and a write token as the password)
   git push
   ```
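
Before pushing, it is worth confirming that every `.bin` file will go through Git LFS, since the Hub rejects large files that are not stored with LFS. A small check based on `git check-attr`, which reports the `filter` attribute that `git lfs track` writes to `.gitattributes` (the function name is illustrative):

```bash
#!/usr/bin/env bash
# Succeed only if every given path has filter=lfs according to .gitattributes.
# Usage (inside the repository clone): check_lfs_tracked *.bin
check_lfs_tracked() {
  local f attr status=0
  for f in "$@"; do
    # git check-attr prints "<path>: filter: <value>"
    attr=$(git check-attr filter -- "$f" | awk -F': ' '{ print $3 }')
    if [ "$attr" != "lfs" ]; then
      echo "not tracked by Git LFS: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_lfs_tracked *.bin && git push`. After committing, `git lfs ls-files` lists the files that are actually stored in LFS.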

## Option 3: Upload via the HuggingFace CLI (requires write token)

1. **Get Write Token:**
   - Visit https://huggingface.co/settings/tokens
   - Create a new token with "write" access
   - Copy the token

2. **Login:**
   ```bash
   pip install -U "huggingface_hub[cli]"
   huggingface-cli login
   # Paste your write token when prompted
   ```

3. **Create and Upload:**
   ```bash
   huggingface-cli repo create whisper-medium-ml-ggml --type model
   cd /Users/sujith/work/2025/expiriments/whisper-ml/ggml-output
   huggingface-cli upload sujithatz/whisper-medium-ml-ggml . .
   ```

## Using the Model Locally (No upload needed)

If you want to test the model first:

```bash
# Build whisper.cpp (produces ./build/bin/whisper-cli)
cd /Users/sujith/work/2025/expiriments/whisper-ml/whisper.cpp
cmake -B build
cmake --build build -j --config Release

# Copy model
cp ../ggml-output/ggml-model.bin models/ggml-medium-ml.bin

# Run inference
./build/bin/whisper-cli -m models/ggml-medium-ml.bin -f /path/to/malayalam-audio.wav -l ml
```

## Files Ready to Upload

- `ggml-model.bin` (1.4 GB) - Original F16 conversion
- `ggml-model-q8_0.bin` (785 MB) - High quality quantized
- `ggml-model-q5_0.bin` (514 MB) - **Recommended** - Best balance
- `ggml-model-q4_0.bin` (424 MB) - Smallest, fastest inference
- `README.md` - Complete documentation with all variants
- `.gitattributes` - LFS configuration (created with the repository or by `git lfs track`)

**Total size if uploading all variants:** ~3.1 GB (1.4 GB + 785 MB + 514 MB + 424 MB)

**Recommended minimum:** Upload at least the Q5_0 and Q4_0 variants (938 MB total)
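
The totals above use the rounded sizes listed. To check the actual figures before uploading, a short sketch that sums the byte sizes reported by `ls -ln` (field 5 on both Linux and macOS) and prints them in decimal megabytes, so results may differ slightly from tools that use binary units. The function names are illustrative.

```bash
#!/usr/bin/env bash
# Sum the sizes, in bytes, of the given files (field 5 of `ls -ln`).
total_bytes() {
  ls -ln -- "$@" | awk '{ sum += $5 } END { printf "%.0f\n", sum }'
}

# Same total in decimal megabytes (1 MB = 1,000,000 bytes), rounded.
total_mb() {
  total_bytes "$@" | awk '{ printf "%.0f MB\n", $1 / 1000000 }'
}
```

For example, run `total_mb ggml-model*.bin` inside the `ggml-output` directory for the full upload, or `total_mb ggml-model-q5_0.bin ggml-model-q4_0.bin` for the recommended minimum.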

## Next Steps

Choose one of the upload options above based on your preference. The model is fully converted and ready to use!