Parakeet TDT Japanese CoreML Models

CoreML conversion of nvidia/parakeet-tdt_ctc-0.6b-ja for on-device Japanese speech recognition on Apple platforms (iOS/macOS).

Model Description

This is a CoreML conversion of NVIDIA's Parakeet TDT-CTC 0.6B Japanese model, an automatic speech recognition (ASR) model built on the hybrid FastConformer TDT-CTC architecture, where TDT stands for Token-and-Duration Transducer.

Key Features

  • Language: Japanese (ja)
  • Architecture: Hybrid FastConformer-TDT-CTC
  • Vocabulary Size: 3,072 tokens (SentencePiece BPE)
  • Sample Rate: 16 kHz
  • Fixed Audio Window: 15 seconds (240,000 samples)
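Because the audio window is fixed, every clip has to contain exactly 240,000 samples before it reaches the models. A minimal sketch of that step follows. It assumes mono Float samples already resampled to 16 kHz, and it assumes that padding with trailing zeros is acceptable, which the model card does not specify. `fitToWindow` is a hypothetical helper name, not part of this repository.

```swift
// Fixed window from the model card: 15 s at 16 kHz.
let sampleRate = 16_000
let windowSeconds = 15
let windowSamples = sampleRate * windowSeconds

/// Trims or zero-pads `samples` to exactly `length` samples.
/// Assumption: padding with trailing silence (zeros) is acceptable.
func fitToWindow(_ samples: [Float], length: Int = 240_000) -> [Float] {
    if samples.count >= length {
        // Too long: keep only the first `length` samples.
        return Array(samples.prefix(length))
    }
    // Too short: append zeros up to the fixed length.
    return samples + [Float](repeating: 0, count: length - samples.count)
}

// Usage: a 1-second clip is padded up to the full window.
let shortClip = [Float](repeating: 0.5, count: 16_000)
let fitted = fitToWindow(shortClip)
print(fitted.count)
```

Clips longer than 15 seconds are simply cut off here; transcribing longer audio would need some chunking strategy, which is outside what the card describes.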

Model Components

Component           | File                   | Input Shape               | Output Shape
Preprocessor        | preprocessor.mlpackage | [1, 240000]               | [1, 80, T]
Encoder             | encoder.mlpackage      | [1, 80, 1501]             | [1, 1024, 188]
Decoder             | decoder.mlpackage      | [1, 1] + LSTM states      | [1, 640, 2] + states
Joint               | joint.mlpackage        | encoder + decoder outputs | [1, 188, 1, 3078]
Mel+Encoder (Fused) | mel_encoder.mlpackage  | [1, 240000]               | [1, 1024, 188]
Vocabulary          | vocab_ja.json          | -                         | 3,072 tokens
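The fixed shapes in the table can be cross-checked with a little arithmetic. The sketch below is one plausible reading, not something the model card states: it assumes a 10 ms hop (160 samples) in the preprocessor, 8x time downsampling in the encoder, and a joint output made up of the 3,072 vocabulary tokens, one blank symbol, and five duration bins.

```swift
let inputSamples = 240_000   // 15 s at 16 kHz (from the table)

// Assumption: 10 ms hop between mel frames, i.e. 160 samples at 16 kHz.
let hopLength = 160
// Assumption: the encoder downsamples time by a factor of 8.
let subsampling = 8

// Mel frames for a centered STFT: one frame per hop, plus one.
let melFrames = inputSamples / hopLength + 1

// Encoder frames: ceiling division by the subsampling factor.
let encoderFrames = (melFrames + subsampling - 1) / subsampling

// Assumption: joint logits = vocabulary + blank + duration bins.
let vocabSize = 3_072
let blankCount = 1
let durationBins = 5
let jointOutputs = vocabSize + blankCount + durationBins

print(melFrames, encoderFrames, jointOutputs)  // prints "1501 188 3078"
```

Under these assumptions the numbers match the table: 1501 mel frames into the encoder, 188 frames out, and 3078 values on the last axis of the joint output.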

Usage

Swift Example

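import CoreML

// encoderURL, decoderURL, jointURL, and melEncoderURL are placeholders for
// the locations of the compiled models (.mlmodelc). A downloaded .mlpackage
// must be compiled first, e.g. with MLModel.compileModel(at:).

// Load the separate encoder, decoder, and joint models
let encoder = try MLModel(contentsOf: encoderURL)
let decoder = try MLModel(contentsOf: decoderURL)
let joint = try MLModel(contentsOf: jointURL)

// Or use the fused mel_encoder for a simpler pipeline
// (raw audio in, encoder features out)
let melEncoder = try MLModel(contentsOf: melEncoderURL)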

Important Notes

  1. Fixed Input Shapes: All models are traced with fixed shapes for stability, so audio must be padded or trimmed to exactly 15 seconds (240,000 samples at 16 kHz).

  2. Encoder Output Format: The Japanese encoder outputs channel-first features, (B, features, time) = (1, 1024, T), unlike the English v3 models, which output (B, T, features).

  3. Greedy Decoding: Use standard TDT greedy decoding with the decoder and joint networks.
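Notes 2 and 3 can be sketched in plain Swift without touching CoreML. The first function reads one time step out of a flat, row-major (features, time) buffer, which is how the (1, 1024, T) layout looks once the batch dimension is dropped; a real MLMultiArray may use different strides, so check them before relying on this indexing. The second function is a simplified greedy TDT loop. Its `step` closure is a hypothetical stand-in for running the decoder and joint models and taking the argmax over token and duration logits. The blank index, the meaning of the duration values, and the per-frame symbol limit are all assumptions here, not details confirmed by the model card.

```swift
/// Feature vector of one time step from a flat, row-major (features, time)
/// buffer. Element [f, t] lives at index f * frameCount + t.
func frameVector(from flat: [Float], featureCount: Int,
                 frameCount: Int, frame: Int) -> [Float] {
    precondition(flat.count == featureCount * frameCount, "unexpected buffer size")
    precondition((0..<frameCount).contains(frame), "frame out of range")
    return (0..<featureCount).map { flat[$0 * frameCount + frame] }
}

/// Simplified greedy TDT decoding.
/// `step` (hypothetical) returns the best token and the predicted number of
/// frames to skip, given the current frame index and the last emitted token.
func tdtGreedyDecode(
    frameCount: Int,
    blankID: Int,
    maxSymbolsPerFrame: Int = 10,   // assumption: guard against endless loops
    step: (_ frame: Int, _ lastToken: Int) -> (token: Int, duration: Int)
) -> [Int] {
    var tokens: [Int] = []
    var t = 0
    var lastToken = blankID         // assumption: decoding starts from blank
    var emittedAtFrame = 0

    while t < frameCount {
        let (token, duration) = step(t, lastToken)
        if token != blankID {
            tokens.append(token)
            lastToken = token
            emittedAtFrame += 1
        }
        var advance = duration
        // Force progress on "blank with duration 0" or too many symbols.
        if advance == 0 && (token == blankID || emittedAtFrame >= maxSymbolsPerFrame) {
            advance = 1
        }
        if advance > 0 { emittedAtFrame = 0 }
        t += advance
    }
    return tokens
}

// Toy usage with scripted predictions instead of real models.
let toy: [Float] = (0..<12).map { Float($0) }       // 3 features x 4 frames
let frame1 = frameVector(from: toy, featureCount: 3, frameCount: 4, frame: 1)

let blank = 99
let decoded = tdtGreedyDecode(frameCount: 6, blankID: blank) { frame, lastToken in
    if frame == 0 && lastToken == blank { return (token: 5, duration: 0) }
    if frame == 0 { return (token: blank, duration: 2) }
    if frame == 2 { return (token: 7, duration: 3) }
    return (token: blank, duration: 0)
}
print(frame1, decoded)
```

In a real pipeline the closure would also carry the decoder's LSTM states and commit them only when a non-blank token is emitted; that bookkeeping is left out to keep the control flow visible.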

Technical Specifications

{
  "model_id": "nvidia/parakeet-tdt_ctc-0.6b-ja",
  "vocab_size": 3072,
  "hidden_size": 640,
  "encoder_features": 1024,
  "num_decoder_layers": 2,
  "sample_rate": 16000,
  "fixed_audio_window_sec": 15.0,
  "fixed_mel_frames": 1501,
  "fixed_encoder_frames": 188
}
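If these specifications are shipped alongside the models, they can be decoded into a typed value rather than hard-coding the numbers. The sketch below assumes the JSON is available as a string; the card does not name a file for it, and `ModelSpec` and `parseSpec` are hypothetical names.

```swift
import Foundation

/// Typed view of the specification JSON. Property names rely on
/// JSONDecoder's snake_case to camelCase key conversion.
struct ModelSpec: Codable {
    let modelId: String
    let vocabSize: Int
    let hiddenSize: Int
    let encoderFeatures: Int
    let numDecoderLayers: Int
    let sampleRate: Int
    let fixedAudioWindowSec: Double
    let fixedMelFrames: Int
    let fixedEncoderFrames: Int

    /// Number of samples in the fixed audio window.
    var fixedAudioSamples: Int { Int(fixedAudioWindowSec * Double(sampleRate)) }
}

func parseSpec(_ json: String) throws -> ModelSpec {
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    return try decoder.decode(ModelSpec.self, from: Data(json.utf8))
}

let specJSON = """
{
  "model_id": "nvidia/parakeet-tdt_ctc-0.6b-ja",
  "vocab_size": 3072,
  "hidden_size": 640,
  "encoder_features": 1024,
  "num_decoder_layers": 2,
  "sample_rate": 16000,
  "fixed_audio_window_sec": 15.0,
  "fixed_mel_frames": 1501,
  "fixed_encoder_frames": 188
}
"""

// `try?` keeps the example usable in contexts that cannot throw.
let spec = try? parseSpec(specJSON)
print(spec?.fixedAudioSamples ?? -1)
```

Deriving the sample count from `fixed_audio_window_sec` and `sample_rate` keeps the padding length in sync with the specification if either value ever changes.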

Conversion Details

  • Conversion Tool: coremltools 9.0b1
  • Source Framework: PyTorch 2.7.0 / NeMo 2.x
  • Conversion Date: 2025-12-01
  • Conversion Method: Fixed-shape tracing (mobius approach)

License

This model conversion follows the license of the original model: CC-BY-4.0

Citation

If you use this model, please cite the original NVIDIA Parakeet model:

@misc{nvidia_parakeet_tdt_ja,
  title={Parakeet TDT CTC 0.6B Japanese},
  author={NVIDIA},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/nvidia/parakeet-tdt_ctc-0.6b-ja}
}

Acknowledgments
