
DeepSeek-3B-MoE-Decoder

This is the decoder component of DeepSeek-OCR: a 3B-parameter Mixture-of-Experts (MoE) language model that generates text from the vision embeddings produced by the DeepEncoder.

Architecture

  • Model: DeepSeek 3B MoE
  • Active Parameters: ~570M per token
  • Total Parameters: ~3B
  • Architecture: Mixture-of-Experts with learned routing; only a small subset of experts is activated for each token (see the sketch below)
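
The expert configuration of this checkpoint is not documented here, so the following is only a minimal, hypothetical sketch of top-k expert routing in PyTorch. The expert count, hidden size, and top_k value are illustrative assumptions, not the model's real hyperparameters.

import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy MoE layer: a router picks top_k experts per token (sizes are illustrative)."""
    def __init__(self, hidden_size=512, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)  # per-expert routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size), nn.GELU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (batch, seq, hidden)
        scores = self.router(x)                              # (batch, seq, num_experts)
        weights, idx = scores.softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)     # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):                           # only selected experts run per token
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = TinyMoELayer()
y = layer(torch.randn(2, 16, 512))   # each token passes through only 2 of the 8 experts

This is why only ~570M of the ~3B parameters are active per token: all expert weights exist in the checkpoint, but each token's hidden state is processed by only the few experts its router selects.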

Usage

This decoder is intended to be used with the vision embeddings produced by the encoder component (DeepEncoder) of DeepSeek-OCR.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the decoder and its tokenizer
# (trust_remote_code=True may be needed if the repo ships custom DeepSeek MoE code)
model = AutoModelForCausalLM.from_pretrained(
    "junkim100/DeepSeek-3B-MoE-decoder", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "junkim100/DeepSeek-3B-MoE-decoder", trust_remote_code=True
)

# Condition generation on vision embeddings from the encoder
# vision_embeddings = ... (from DeepEncoder)
# outputs = model(inputs_embeds=vision_embeddings, ...)
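
The exact wiring between encoder and decoder (projection layer, prompt format) is defined by the full DeepSeek-OCR model, so the following is only a minimal sketch. It assumes vision_embeddings has already been projected to the decoder's hidden size; the placeholder tensor, prompt text, and prepend-then-generate layout are illustrative assumptions, and the full repository's processing code should be preferred in practice.

import torch

# Placeholder standing in for real DeepEncoder output:
# (batch, num_vision_tokens, hidden_size), already projected to the decoder's embedding dim.
vision_embeddings = torch.randn(1, 256, model.config.hidden_size, dtype=model.dtype)

# Embed a text prompt and prepend the vision embeddings to it.
prompt_ids = tokenizer("Transcribe the document.", return_tensors="pt").input_ids
prompt_embeds = model.get_input_embeddings()(prompt_ids)
inputs_embeds = torch.cat([vision_embeddings, prompt_embeds], dim=1)

# Generate text conditioned on the concatenated embeddings.
output_ids = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))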

Source

Extracted from deepseek-ai/DeepSeek-OCR
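
The exact extraction procedure is not documented here. A rough sketch of the general approach, filtering the decoder's parameters out of the full checkpoint's state dict, might look like the following; the "language_model." prefix is an assumption, not the checkpoint's confirmed module name.

import torch
from transformers import AutoModel

# Hypothetical sketch: the module/prefix names inside DeepSeek-OCR are assumptions.
full = AutoModel.from_pretrained("deepseek-ai/DeepSeek-OCR", trust_remote_code=True)
decoder_prefix = "language_model."  # assumed prefix of the decoder submodule

decoder_state = {
    k[len(decoder_prefix):]: v
    for k, v in full.state_dict().items()
    if k.startswith(decoder_prefix)
}
torch.save(decoder_state, "deepseek_3b_moe_decoder.pt")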
