Parakeet TDT Japanese CoreML Models
CoreML conversion of nvidia/parakeet-tdt_ctc-0.6b-ja for on-device Japanese speech recognition on Apple platforms (iOS/macOS).
Model Description
This is a CoreML conversion of NVIDIA's Parakeet TDT-CTC 0.6B Japanese model, an automatic speech recognition (ASR) model built on the hybrid FastConformer architecture with TDT (Token-and-Duration Transducer) and CTC decoders.
Key Features
- Language: Japanese (ja)
- Architecture: Hybrid FastConformer-TDT-CTC
- Vocabulary Size: 3,072 tokens (SentencePiece BPE)
- Sample Rate: 16 kHz
- Fixed Audio Window: 15 seconds (240,000 samples)
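Because the audio window is fixed, every clip has to be brought to exactly 240,000 samples before it reaches the preprocessor. A minimal sketch of that step follows; the helper name `fitToWindow` is hypothetical, and right-padding with zeros is an assumption, since this card does not say how padding should be applied.

```swift
import Foundation

/// Pads with trailing zeros or truncates so the result has exactly `length` samples.
/// `fitToWindow` is a hypothetical helper, not part of the model package.
func fitToWindow(_ samples: [Float], length: Int = 240_000) -> [Float] {
    if samples.count >= length {
        // Longer clips are cut down to the first `length` samples (15 s at 16 kHz).
        return Array(samples.prefix(length))
    }
    // Shorter clips are right-padded with silence (assumed padding strategy).
    return samples + [Float](repeating: 0, count: length - samples.count)
}

let shortClip = [Float](repeating: 0.1, count: 16_000)  // 1 second at 16 kHz
let window = fitToWindow(shortClip)
print(window.count)  // 240000
```

Audio longer than 15 seconds would need to be split into windows and transcribed window by window; how the partial transcripts are stitched together is left to the caller.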
Model Components
| Component | File | Input Shape | Output Shape |
|---|---|---|---|
| Preprocessor | preprocessor.mlpackage | [1, 240000] | [1, 80, T] |
| Encoder | encoder.mlpackage | [1, 80, 1501] | [1, 1024, 188] |
| Decoder | decoder.mlpackage | [1, 1] + LSTM states | [1, 640, 2] + states |
| Joint | joint.mlpackage | encoder + decoder outputs | [1, 188, 1, 3078] |
| Mel+Encoder (Fused) | mel_encoder.mlpackage | [1, 240000] | [1, 1024, 188] |
| Vocabulary | vocab_ja.json | - | 3,072 tokens |
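The encoder output shape [1, 1024, 188] puts the feature axis before the time axis, so the values for one time step are not contiguous in memory. The sketch below shows one way to pull out a single frame, assuming the output has already been copied into a flat, row-major `[Float]`; the helper name `frame(at:in:features:frames:)` is hypothetical, and a real `MLMultiArray` may use different strides that should be checked first.

```swift
/// Copies the feature vector for time step `t` out of a flat, row-major
/// buffer laid out as [1, features, frames].
/// Hypothetical helper; assumes contiguous row-major storage.
func frame(at t: Int, in buffer: [Float], features: Int, frames: Int) -> [Float] {
    precondition(buffer.count == features * frames, "unexpected buffer size")
    precondition(t >= 0 && t < frames, "time index out of range")
    // Element (0, f, t) lives at offset f * frames + t.
    return (0..<features).map { f in buffer[f * frames + t] }
}

// Tiny stand-in for the real [1, 1024, 188] output: 3 features, 4 frames.
let toy: [Float] = [ 0,  1,  2,  3,
                    10, 11, 12, 13,
                    20, 21, 22, 23]
print(frame(at: 2, in: toy, features: 3, frames: 4))  // [2.0, 12.0, 22.0]
```

For the real model, the same call would use `features: 1024` and `frames: 188`.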
Usage
Swift Example
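```swift
import CoreML

// encoderURL, decoderURL, jointURL, and melEncoderURL are placeholders for your own file URLs.
// MLModel(contentsOf:) expects a compiled model (.mlmodelc); if you start from a raw
// .mlpackage, compile it first with MLModel.compileModel(at:).

// Load the separate encoder, decoder, and joint models
let encoder = try MLModel(contentsOf: encoderURL)
let decoder = try MLModel(contentsOf: decoderURL)
let joint = try MLModel(contentsOf: jointURL)

// Or use the fused mel_encoder for a simpler pipeline
let melEncoder = try MLModel(contentsOf: melEncoderURL)
```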
Important Notes
- Fixed Input Shapes: Models use fixed shapes for stability. Audio must be padded/trimmed to 15 seconds (240,000 samples at 16 kHz).
- Encoder Output Format: The Japanese model outputs (B, features, time) = (1, 1024, T), unlike the English v3 models, which output (B, T, features).
- Greedy Decoding: Use standard TDT greedy decoding with the decoder and joint networks.
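The greedy decoding note can be illustrated with a small, model-free sketch. Several things here are assumptions rather than facts from this card: the joint output is taken to be vocabulary logits, then one blank logit, then the duration logits (which would make 3078 = 3072 + 1 + 5); the duration set is taken to be [0, 1, 2, 3, 4]; and the `step` closure is a hypothetical stand-in for "run the decoder on the last emitted token, then run the joint against encoder frame t". A real implementation would also carry the LSTM states between calls and update them only after a non-blank token.

```swift
/// One joint evaluation, split into its two heads.
struct JointOutput {
    var tokenLogits: [Float]     // vocabulary entries followed by blank (assumed order)
    var durationLogits: [Float]  // one logit per entry in `durations`
}

/// Splits a flat joint vector, assuming the duration logits come last.
func splitJointLogits(_ logits: [Float], durationCount: Int) -> JointOutput {
    let cut = logits.count - durationCount
    return JointOutput(tokenLogits: Array(logits[..<cut]),
                       durationLogits: Array(logits[cut...]))
}

func argmax(_ xs: [Float]) -> Int {
    var best = 0
    for i in xs.indices where xs[i] > xs[best] { best = i }
    return best
}

/// Greedy TDT decoding over `frameCount` encoder frames.
/// `step(t, lastToken)` is a hypothetical closure wrapping decoder + joint.
func greedyTDT(frameCount: Int,
               blankID: Int,
               durations: [Int] = [0, 1, 2, 3, 4],
               maxSymbolsPerFrame: Int = 10,
               step: (Int, Int?) -> JointOutput) -> [Int] {
    var tokens: [Int] = []
    var t = 0
    var emittedHere = 0  // tokens emitted since time last advanced
    while t < frameCount {
        let out = step(t, tokens.last)
        let token = argmax(out.tokenLogits)
        var skip = durations[argmax(out.durationLogits)]
        if token == blankID {
            // A blank must always move time forward.
            skip = max(skip, 1)
        } else {
            tokens.append(token)
            emittedHere += 1
            // Guard against looping forever on a zero-duration prediction.
            if skip == 0 && emittedHere >= maxSymbolsPerFrame { skip = 1 }
        }
        if skip > 0 { emittedHere = 0 }
        t += skip
    }
    return tokens
}

// Toy run: 3 real tokens (IDs 0...2), blank ID 3, durations [0, 1, 2].
func oneHot(_ i: Int, _ n: Int) -> [Float] {
    var v = [Float](repeating: 0, count: n)
    v[i] = 1
    return v
}

let toyStep: (Int, Int?) -> JointOutput = { t, last in
    if t == 0 && last == nil {   // emit token 1, stay on frame 0
        return splitJointLogits(oneHot(1, 4) + oneHot(0, 3), durationCount: 3)
    }
    if t == 1 {                  // emit token 2, jump two frames
        return splitJointLogits(oneHot(2, 4) + oneHot(2, 3), durationCount: 3)
    }
    // Otherwise predict blank with duration 0, which is forced to 1.
    return splitJointLogits(oneHot(3, 4) + oneHot(0, 3), durationCount: 3)
}

let decoded = greedyTDT(frameCount: 4, blankID: 3, durations: [0, 1, 2], step: toyStep)
print(decoded)  // [1, 2]
```

With the real models, `frameCount` would be 188 and, under the layout assumed above, `blankID` would be 3072. The resulting token IDs still have to be mapped to text through vocab_ja.json.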
Technical Specifications
```json
{
  "model_id": "nvidia/parakeet-tdt_ctc-0.6b-ja",
  "vocab_size": 3072,
  "hidden_size": 640,
  "encoder_features": 1024,
  "num_decoder_layers": 2,
  "sample_rate": 16000,
  "fixed_audio_window_sec": 15.0,
  "fixed_mel_frames": 1501,
  "fixed_encoder_frames": 188
}
```
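The fixed frame counts above follow from the window length once a hop size and a subsampling factor are chosen. The arithmetic below is a sketch under two assumptions that this card does not state: a 10 ms hop (160 samples) with centered framing, and 8x temporal subsampling in the encoder. Both reproduce the listed values of 1501 and 188.

```swift
let sampleRate = 16_000
let windowSeconds = 15
let samples = sampleRate * windowSeconds              // 240000

// Assumption: 10 ms hop with centered framing, giving samples / hop + 1 frames.
let hop = 160
let melFrames = samples / hop + 1                     // 1501

// Assumption: 8x temporal subsampling in the encoder, rounded up.
let subsampling = 8
let encoderFrames = (melFrames + subsampling - 1) / subsampling  // 188

// Under the same assumptions, each encoder frame spans 80 ms of audio.
let secondsPerEncoderFrame = Double(hop * subsampling) / Double(sampleRate)

print(samples, melFrames, encoderFrames, secondsPerEncoderFrame)
```

If these assumptions hold, an encoder frame index t corresponds to roughly t × 0.08 seconds into the window, which is useful for rough timestamps during decoding.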
Conversion Details
- Conversion Tool: coremltools 9.0b1
- Source Framework: PyTorch 2.7.0 / NeMo 2.x
- Conversion Date: 2025-12-01
- Conversion Method: Fixed-shape tracing (mobius approach)
License
This model conversion follows the license of the original model: CC-BY-4.0
Citation
If you use this model, please cite the original NVIDIA Parakeet model:
```bibtex
@misc{nvidia_parakeet_tdt_ja,
  title={Parakeet TDT CTC 0.6B Japanese},
  author={NVIDIA},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/nvidia/parakeet-tdt_ctc-0.6b-ja}
}
```
Acknowledgments
- Original model by NVIDIA NeMo Team
- Conversion approach inspired by FluidInference/mobius