# DeepSeek-3B-MoE-Decoder
This repository contains the decoder component of DeepSeek-OCR: a 3B-parameter Mixture-of-Experts (MoE) language model.
## Architecture
- **Model**: DeepSeek 3B MoE
- **Active Parameters**: ~570M per token
- **Total Parameters**: ~3B
- **Architecture**: Mixture-of-Experts with learned routing; only a small subset of experts runs for each token (see the sketch below)
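
The gap between total (~3B) and active (~570M) parameters comes from sparse routing: a small router scores every expert for each token, and only the top-k experts actually execute. A minimal sketch of that idea, for illustration only (the expert count, hidden size, and top-k below are made up and are not DeepSeek's actual configuration):

```python
import torch
import torch.nn.functional as F

# Toy top-k expert routing (illustrative; not the exact DeepSeek router).
num_experts, top_k, d_model = 8, 2, 16
experts = [torch.nn.Linear(d_model, d_model) for _ in range(num_experts)]
router = torch.nn.Linear(d_model, num_experts)

x = torch.randn(4, d_model)                # 4 tokens
scores = router(x)                         # (4, num_experts) routing logits
weights, idx = scores.topk(top_k, dim=-1)  # pick top-k experts per token
weights = F.softmax(weights, dim=-1)       # normalize the gate weights

out = torch.zeros_like(x)
for t in range(x.size(0)):                 # each token runs only its k experts,
    for w, e in zip(weights[t], idx[t]):   # so only a fraction of the total
        out[t] += w * experts[e](x[t])     # parameters is active per token
```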
## Usage
This decoder should be used with vision embeddings from the encoder component.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the decoder (trust_remote_code=True may be needed if the checkpoint
# ships custom modeling code)
model = AutoModelForCausalLM.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")
tokenizer = AutoTokenizer.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")

# Drive the decoder with vision embeddings from the encoder via inputs_embeds
# vision_embeddings = ...  (produced by DeepEncoder)
# outputs = model(inputs_embeds=vision_embeddings, ...)
```
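
As a quick shape check, a random tensor can stand in for real encoder output. This is a hypothetical smoke test, not the actual OCR pipeline, and it assumes the config exposes `hidden_size` in the usual transformers way:

```python
import torch

# Hypothetical smoke test: random values stand in for DeepEncoder features.
hidden = model.config.hidden_size              # decoder embedding width
dummy_embeddings = torch.randn(1, 32, hidden)  # (batch, num_vision_tokens, hidden)

with torch.no_grad():
    outputs = model(inputs_embeds=dummy_embeddings)

print(outputs.logits.shape)  # expected: (1, 32, vocab_size)
```

In real use, `dummy_embeddings` would be replaced by actual features from the DeepEncoder.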
## Source
Extracted from [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).