Changes from the Original DeepSeek-V3.2-Exp

  • Dequantized the indexer weights to bfloat16 (see the dtype check below)
  • Compatible with the transformers library (load with trust_remote_code=True)
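
To confirm that the indexer weights are stored in bfloat16 without downloading the full checkpoint, you can inspect the safetensors metadata with huggingface_hub. A minimal sketch, assuming that tensor names containing "indexer" correspond to the indexer weights (verify against the actual checkpoint):

from huggingface_hub import get_safetensors_metadata

meta = get_safetensors_metadata("kishizaki-sci/DeepSeek-V3.2-Exp-FP8")
# Collect the storage dtype of every tensor whose name mentions "indexer"
# (the name filter is an assumption; check the real weight names)
indexer_dtypes = {
    info.dtype
    for file_meta in meta.files_metadata.values()
    for name, info in file_meta.tensors.items()
    if "indexer" in name
}
print(indexer_dtypes)  # expected: {"BF16"} after dequantization

This reads only the safetensors headers of each shard, so it is cheap even for a 672B-parameter repository.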

Test code

import time

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the model ships custom modeling code
model = AutoModelForCausalLM.from_pretrained(
    "kishizaki-sci/DeepSeek-V3.2-Exp-FP8",
    trust_remote_code=True,
    dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("kishizaki-sci/DeepSeek-V3.2-Exp-FP8")

# Copied from https://huggingface.co/docs/transformers/model_doc/deepseek_v3
chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Time a short generation as a rough throughput check
start = time.time()
outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.batch_decode(outputs))
print(time.time() - start)
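
To watch tokens appear as they are generated instead of waiting for the full output, transformers' TextStreamer can be passed to generate. A minimal sketch reusing the model, tokenizer, and inputs from above:

from transformers import TextStreamer

# Prints decoded tokens to stdout as they are produced; skip_prompt=True
# suppresses echoing the input prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(inputs, max_new_tokens=50, streamer=streamer)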