# DeepSeek-3B-MoE-Decoder
This repository contains the decoder component of DeepSeek-OCR: a 3B-parameter Mixture-of-Experts (MoE) language model that decodes vision embeddings from the paired encoder into text.
## Architecture
- **Model**: DeepSeek 3B MoE
- **Active Parameters**: ~570M per token
- **Total Parameters**: ~3B
- **Architecture**: Mixture-of-Experts with learned routing; only the experts selected for each token actually run, which is why the active parameter count sits far below the total (see the sketch below)
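
The gap between total and active parameters comes from the routing step: a gate scores every expert for each token, but only the top-k experts execute, so most of the 3B parameters are idle on any given token. The following is a minimal sketch of top-k routing in PyTorch; the expert count, k, and hidden size are illustrative placeholders, not this model's actual configuration.

```python
import torch
import torch.nn as nn

class TopKRouter(nn.Module):
    """Toy top-k gate: each token activates only k of the experts."""

    def __init__(self, hidden_size: int, num_experts: int, k: int):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.k = k

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, hidden_size)
        scores = torch.softmax(self.gate(x), dim=-1)          # (num_tokens, num_experts)
        weights, expert_ids = scores.topk(self.k, dim=-1)     # keep the k best experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True) # renormalize the kept scores
        return weights, expert_ids

# Illustrative sizes only; the real model's expert count, k, and width differ.
router = TopKRouter(hidden_size=1024, num_experts=8, k=2)
tokens = torch.randn(4, 1024)
weights, expert_ids = router(tokens)
print(expert_ids.shape)  # (4, 2): each token is handled by just 2 of the 8 experts
```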
## Usage
This decoder is intended to be driven by vision embeddings produced by the encoder component (DeepEncoder).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the decoder; trust_remote_code is needed for the custom MoE architecture
model = AutoModelForCausalLM.from_pretrained(
    "junkim100/DeepSeek-3B-MoE-decoder",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "junkim100/DeepSeek-3B-MoE-decoder",
    trust_remote_code=True,
)

# Drive the decoder with vision embeddings produced by the encoder
# vision_embeddings = ...  # (from DeepEncoder)
# outputs = model(inputs_embeds=vision_embeddings)
```
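
For a rough end-to-end sketch, the snippet below feeds placeholder embeddings to `generate`. It assumes the loaded config exposes `hidden_size` and that the remote-code model supports generation from `inputs_embeds`; in practice `vision_embeddings` would come from DeepEncoder, and the sequence length here (256) is an arbitrary stand-in.

```python
import torch

# Stand-in for real encoder output: (batch, num_vision_tokens, hidden_size).
# 256 tokens is a placeholder; use the encoder's actual output instead.
hidden_size = model.config.hidden_size
vision_embeddings = torch.randn(1, 256, hidden_size, dtype=model.dtype)

output_ids = model.generate(inputs_embeds=vision_embeddings, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```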
## Source
Extracted from [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR).