junkim100 committed
Commit 27b0c6a · verified · 1 Parent(s): a2cedf4

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +30 -0
  2. config.json +5 -0
  3. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,30 @@
+ # DeepSeek-3B-MoE-Decoder
+
+ This is the decoder component of DeepSeek-OCR: a 3B-parameter Mixture-of-Experts (MoE) language model.
+
+ ## Architecture
+
+ - **Model**: DeepSeek 3B MoE
+ - **Active Parameters**: ~570M per token
+ - **Total Parameters**: ~3B
+ - **Architecture**: Mixture-of-Experts with expert routing
+
+ ## Usage
+
+ This decoder should be used with vision embeddings produced by the encoder component, DeepEncoder (see the expanded sketch after this file diff).
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the decoder
+ model = AutoModelForCausalLM.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")
+ tokenizer = AutoTokenizer.from_pretrained("junkim100/DeepSeek-3B-MoE-decoder")
+
+ # Use with vision embeddings from the encoder
+ # vision_embeddings = ... (from DeepEncoder)
+ # outputs = model(inputs_embeds=vision_embeddings, ...)
+ ```
+
+ ## Source
+
+ Extracted from [deepseek-ai/DeepSeek-OCR](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
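
The Usage block in the README above leaves the encoder hand-off as commented placeholders. Here is a minimal sketch of how the decoder might consume encoder output through `inputs_embeds`. The `encoder` object, the embedding shape, the prompt text, and the use of `trust_remote_code=True` (suggested by the custom `DeepseekOCRForCausalLM` architecture in config.json) are assumptions, not details confirmed by this commit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "junkim100/DeepSeek-3B-MoE-decoder"

# Assumption: a custom architecture such as DeepseekOCRForCausalLM typically
# needs trust_remote_code=True to load through the Auto* classes.
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

# Hypothetical encoder output: one embedding per vision token, with the last
# dimension matching the decoder's hidden size.
# vision_embeddings = encoder(pixel_values)  # (1, n_vision_tokens, hidden_size)

# Embed a text prompt with the decoder's own embedding table, append it after
# the vision tokens, and generate from inputs_embeds.
prompt_ids = tokenizer("Transcribe the document:", return_tensors="pt").input_ids
prompt_embeds = model.get_input_embeddings()(prompt_ids)
# inputs_embeds = torch.cat([vision_embeddings, prompt_embeds], dim=1)
# output_ids = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=256)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

The concatenation order and prompt wording are illustrative only; the real DeepSeek-OCR pipeline defines its own prompt format and image-token placement.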
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "architectures": [
+     "DeepseekOCRForCausalLM"
+   ]
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f85521f2e6c344c36ffa997e4d55b889dcb59284a22ecf0748cc5e32ac283e8e
+ size 5869729208
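
The model.safetensors entry above is a Git LFS pointer rather than the weights themselves: `oid` is the SHA-256 of the actual file and `size` is its byte count. A small sketch for checking a downloaded copy against this pointer (the local path is a placeholder):

```python
import hashlib
from pathlib import Path

path = Path("model.safetensors")  # placeholder: wherever the weights were downloaded
expected_oid = "f85521f2e6c344c36ffa997e4d55b889dcb59284a22ecf0748cc5e32ac283e8e"
expected_size = 5869729208

# Hash the file in 1 MiB chunks to avoid loading ~5.9 GB into memory at once.
sha256 = hashlib.sha256()
with path.open("rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha256.update(chunk)

assert path.stat().st_size == expected_size, "size mismatch with LFS pointer"
assert sha256.hexdigest() == expected_oid, "sha256 mismatch with LFS pointer"
print("model.safetensors matches the LFS pointer")
```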