sujithatz committed
Commit c2fbeba · verified · 1 Parent(s): 40f6f93

Upload 2 files

Files changed (2):
  1. README.md +205 -0
  2. UPLOAD_INSTRUCTIONS.md +112 -0
README.md ADDED
---
language:
- ml
license: apache-2.0
tags:
- whisper
- whisper.cpp
- ggml
- quantized
- malayalam
- asr
- speech-recognition
- on-device
base_model: thennal/whisper-medium-ml
model-index:
- name: Whisper Medium Malayalam GGML
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: ml
      split: test
    metrics:
    - type: wer
      value: 11.49
      name: WER (with normalization)
    - type: wer
      value: 38.62
      name: WER (without normalization)
    - type: cer
      value: 7.33
      name: CER
---

# Whisper Medium Malayalam - GGML Format

This is a GGML-converted version of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) optimized for use with [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

**Key Features:**
- 🚀 Multiple quantized versions (Q4, Q5, Q8) for different use cases
- 📱 Optimized for on-device, offline inference
- ⚡ Up to 85% size reduction with quantization
- 🎯 Malayalam language specialization
- 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

## Model Details

- **Base Model:** OpenAI Whisper Medium
- **Language:** Malayalam
- **Task:** Automatic Speech Recognition (ASR)
- **Format:** GGML (converted from PyTorch)
- **Source:** Fine-tuned on Common Voice 11.0 and other Malayalam speech datasets (see Training Data below)

## Available Model Variants

This repository provides multiple quantized versions optimized for different use cases:

| Model | Size | Use Case | Quality |
|-------|------|----------|---------|
| `ggml-model.bin` | 1.4 GB | Original conversion (F16) | Highest quality |
| `ggml-model-q8_0.bin` | 785 MB | High quality, smaller size | Very high quality |
| `ggml-model-q5_0.bin` | 514 MB | **Recommended** - Balanced quality/size | Good quality |
| `ggml-model-q4_0.bin` | 424 MB | Smallest size, faster inference | Acceptable quality |

**Recommendation:** For most users, `ggml-model-q5_0.bin` offers the best balance between quality and file size.

## Performance (from source model)

- **Word Error Rate (WER):** 38.62% (without normalization)
- **Character Error Rate (CER):** 7.33%
- **WER with normalization:** 11.49%

Note: Whisper's text normalization has significant issues for the Malayalam language, so the normalized WER should be interpreted with caution.

## Usage with whisper.cpp

### Prerequisites

```bash
git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
make
```

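Note that newer whisper.cpp releases build with CMake rather than plain `make`; if the `make` step above does not produce `build/bin/whisper-cli`, a build along these lines (assuming CMake is installed) should:

```bash
# Configure and build whisper.cpp with CMake; binaries land in build/bin/
cmake -B build
cmake --build build -j --config Release
```
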
### Download the model

Download one of the model files from this repository and place it in the `models` directory of whisper.cpp (see the download sketch after the list below):

- **Recommended:** `ggml-model-q5_0.bin` (514 MB)
- **Smallest:** `ggml-model-q4_0.bin` (424 MB)
- **Highest quality:** `ggml-model-q8_0.bin` (785 MB)
- **Original:** `ggml-model.bin` (1.4 GB)

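One convenient way to fetch a variant is the `huggingface-cli download` command (a sketch, assuming the `huggingface_hub` CLI is installed and this repository is published as `sujithatz/whisper-medium-ml-ggml`):

```bash
# Install the Hugging Face CLI (if needed) and pull the recommended Q5_0 variant into whisper.cpp's models directory
pip install -U "huggingface_hub[cli]"
huggingface-cli download sujithatz/whisper-medium-ml-ggml ggml-model-q5_0.bin --local-dir models
```
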
### Run inference

```bash
# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml
```

Where:
- `-m` specifies the model file
- `-f` specifies the input audio file (must be a 16 kHz, 16-bit WAV file)
- `-l ml` sets the language to Malayalam

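If your recording is in another format or sample rate, ffmpeg (assuming it is installed) can convert it to the expected 16 kHz mono 16-bit WAV:

```bash
# Convert arbitrary input audio to 16 kHz, mono, 16-bit PCM WAV for whisper.cpp
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
```
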
### Additional options

```bash
# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text
```

## Conversion Details

This model was converted from the HuggingFace transformers format to GGML using the `convert-h5-to-ggml.py` script from whisper.cpp.

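For reference, the conversion roughly follows the sketch below; the exact script path and argument order may differ between whisper.cpp versions, and the local directory names are placeholders:

```bash
# Fetch the fine-tuned model and the original Whisper repo (needed for mel filter / tokenizer assets)
git clone https://huggingface.co/thennal/whisper-medium-ml
git clone https://github.com/openai/whisper

# Convert the Hugging Face checkpoint to a GGML F16 model (writes ggml-model.bin into ./ggml-output)
python whisper.cpp/models/convert-h5-to-ggml.py ./whisper-medium-ml ./whisper ./ggml-output
```
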
### Quantization Details

The quantized models were created using whisper.cpp's quantization tool:

- **Q8_0**: 8-bit quantization, retains ~99% of original quality
- **Q5_0**: 5-bit quantization, excellent quality/size balance (~73% size reduction)
- **Q4_0**: 4-bit quantization, maximum compression (~85% size reduction)

All quantized models maintain the full model architecture and can be used as drop-in replacements.

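As an illustration, producing the quantized variants from the F16 model with whisper.cpp's `quantize` tool looks roughly like this (the binary may live at `./quantize` or `./build/bin/quantize` depending on how whisper.cpp was built):

```bash
# Quantize the F16 GGML model to Q5_0 and Q4_0
./build/bin/quantize ggml-model.bin ggml-model-q5_0.bin q5_0
./build/bin/quantize ggml-model.bin ggml-model-q4_0.bin q4_0
```
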
## Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:
- [Mozilla Common Voice 11.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) (ml)
- [Google FLEURS](https://huggingface.co/datasets/google/fleurs)
- [IMaSC](https://huggingface.co/datasets/thennal/IMaSC)
- [ULCA Malayalam](https://huggingface.co/datasets/thennal/ulca_ml)
- [MSC](https://huggingface.co/datasets/thennal/msc)
- [Indic TTS Malayalam](https://huggingface.co/datasets/thennal/indic_tts_ml)

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-medium-ml-ggml,
  author = {Thennal D K},
  title = {Whisper Medium Malayalam - GGML Format},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```

## License

Apache 2.0 - Same as the original Whisper model and fine-tuned version.

## Acknowledgments

This model builds upon the work of many contributors:

### Original Model & Framework
- **OpenAI Whisper Team** - For the groundbreaking Whisper ASR model ([paper](https://arxiv.org/abs/2212.04356), [code](https://github.com/openai/whisper))
- **Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever** - Whisper authors

### Malayalam Fine-tuning
- **[Thennal D K](https://huggingface.co/thennal)** - For fine-tuning Whisper Medium on Malayalam datasets and making it available on HuggingFace
- **Original model:** [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml)
- Training resources: [Fine-tuning Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/fine_tune_whisper.ipynb)

### Datasets
- **Mozilla Foundation** - Common Voice 11.0 Malayalam dataset
- **Google** - FLEURS multilingual dataset
- **Community contributors** - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

### GGML Implementation
- **[whisper.cpp team](https://github.com/ggml-org/whisper.cpp)** - For the efficient C/C++ implementation and GGML format
- **[ggml-org](https://github.com/ggml-org)** - For the GGML machine learning library

### Tools & Frameworks
- **HuggingFace Transformers** - Model training and inference framework
- **PyTorch** - Deep learning framework

---

**Special Thanks:** This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.
UPLOAD_INSTRUCTIONS.md ADDED
# Upload Instructions for HuggingFace

## Model Files Ready

Your converted GGML models are ready at:

```
/Users/sujith/work/2025/expiriments/whisper-ml/ggml-output/
```

Files to upload:
- `ggml-model.bin` (1.4 GB) - Original F16 conversion
- `ggml-model-q8_0.bin` (785 MB) - High quality quantized
- `ggml-model-q5_0.bin` (514 MB) - Recommended balanced version
- `ggml-model-q4_0.bin` (424 MB) - Smallest quantized version
- `README.md` - Complete documentation

## Option 1: Manual Upload via Web Interface

1. **Create Repository:**
   - Visit https://huggingface.co/new
   - Repository name: `whisper-medium-ml-ggml`
   - Repository type: Model
   - License: Same as source (Apache 2.0 or appropriate)
   - Click "Create repository"

2. **Upload Files:**
   - Click "Add file" → "Upload files"
   - Upload all files:
     - `README.md` (documentation)
     - `ggml-model-q4_0.bin` (smallest, 424 MB)
     - `ggml-model-q5_0.bin` (recommended, 514 MB)
     - `ggml-model-q8_0.bin` (high quality, 785 MB)
     - `ggml-model.bin` (original, 1.4 GB) - Optional, since quantized versions are available

## Option 2: Upload via Git (Recommended for large files)

1. **Create Repository on HuggingFace:**
   - Visit https://huggingface.co/new and create `whisper-medium-ml-ggml`

2. **Initialize and Upload:**
   ```bash
   cd /Users/sujith/work/2025/expiriments/whisper-ml/ggml-output

   # Initialize git
   git init
   git lfs install

   # Track large files with LFS
   git lfs track "*.bin"

   # Add remote
   git remote add origin https://huggingface.co/sujithatz/whisper-medium-ml-ggml

   # Add files
   git add .
   git commit -m "Initial commit: GGML converted Whisper Malayalam model"

   # Push
   git push -u origin main
   ```

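For reference, `git lfs track "*.bin"` writes a `.gitattributes` file (listed in the upload checklist further below); its contents should look roughly like:

```
*.bin filter=lfs diff=lfs merge=lfs -text
```
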
## Option 3: Upload via HuggingFace Hub (requires write token)

1. **Get Write Token:**
   - Visit https://huggingface.co/settings/tokens
   - Create new token with "write" access
   - Copy the token

2. **Login:**
   ```bash
   huggingface-cli login
   # Paste your write token when prompted
   ```

3. **Create and Upload:**
   ```bash
   huggingface-cli repo create whisper-medium-ml-ggml --type model
   cd /Users/sujith/work/2025/expiriments/whisper-ml/ggml-output
   huggingface-cli upload sujithatz/whisper-medium-ml-ggml . .
   ```

## Using the Model Locally (No upload needed)

If you want to test the model first:

```bash
# Build whisper.cpp
cd /Users/sujith/work/2025/expiriments/whisper-ml/whisper.cpp
make

# Copy model
cp ../ggml-output/ggml-model.bin models/ggml-medium-ml.bin

# Run inference
./main -m models/ggml-medium-ml.bin -f /path/to/malayalam-audio.wav -l ml
```

## Files Ready to Upload

- `ggml-model.bin` (1.4 GB) - Original F16 conversion
- `ggml-model-q8_0.bin` (785 MB) - High quality quantized
- `ggml-model-q5_0.bin` (514 MB) - **Recommended** - Best balance
- `ggml-model-q4_0.bin` (424 MB) - Smallest, fastest inference
- `README.md` - Complete documentation with all variants
- `.gitattributes` - (will be created) LFS configuration

**Total size if uploading all variants:** ~3.1 GB
**Recommended minimum:** Upload at least the Q5_0 and Q4_0 variants (938 MB total)

## Next Steps

Choose one of the upload options above based on your preference. The model is fully converted and ready to use!