# DeepSeek-3B-MoE-Decoder
This is the decoder component of DeepSeek-OCR: a 3B-parameter Mixture-of-Experts (MoE) language model that generates text from the vision embeddings produced by the encoder (DeepEncoder).
## Architecture
- Model: DeepSeek 3B MoE
- Active Parameters: ~570M per token
- Total Parameters: ~3B
- Architecture: Mixture-of-Experts with token-level expert routing; only a small subset of experts runs for each token, which is why the active parameter count (~570M) is far below the total (~3B). A routing sketch follows this list.
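To make the active-vs-total split concrete, here is a minimal, hedged sketch of top-k expert routing in PyTorch. The expert count, hidden sizes, and `top_k` value are illustrative assumptions and do not reflect the actual DeepSeek-3B-MoE configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal MoE feed-forward layer with top-k routing (illustrative only)."""

    def __init__(self, d_model=1024, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-token routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                       # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)               # (tokens, num_experts)
        weights, idx = gates.topk(self.top_k, dim=-1)           # keep the k best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize the kept gates
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                        # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the experts selected by the router contribute to a token's forward pass, so the parameters touched per token stay small even though all experts exist in memory.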
## Usage
This decoder is meant to be used with vision embeddings produced by the encoder component (DeepEncoder).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load decoder
model = AutoModelForCausalLM.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")
tokenizer = AutoTokenizer.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")

# Use with vision embeddings from encoder
# vision_embeddings = ... (from DeepEncoder)
# outputs = model(inputs_embeds=vision_embeddings, ...)
```
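As a hedged illustration of the `inputs_embeds` path above: assuming the encoder yields a float tensor of shape `(batch, num_vision_tokens, hidden_size)` that matches the decoder's embedding width, generation could look roughly as follows. The random placeholder tensor, the shapes, and the generation settings are assumptions for illustration, not the published DeepSeek-OCR pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")
tokenizer = AutoTokenizer.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")

# Stand-in for DeepEncoder output: (batch, num_vision_tokens, hidden_size).
# In the real pipeline these embeddings come from an image; the token count
# used here is an assumption for illustration only.
hidden_size = model.config.hidden_size
vision_embeddings = torch.randn(1, 256, hidden_size, dtype=model.dtype)

with torch.no_grad():
    generated = model.generate(
        inputs_embeds=vision_embeddings,
        max_new_tokens=128,
        do_sample=False,
    )

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

When only `inputs_embeds` (and no `input_ids`) is passed, recent versions of `transformers` return just the newly generated token ids, which are decoded directly above.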
## Source
Extracted from [`deepseek-ai/DeepSeek-OCR`](https://huggingface.co/deepseek-ai/DeepSeek-OCR).