Tokenizer set to Mistral?
vllm serve \
  /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit \
  --served-model-name MiniMax-M2.1-AWQ \
  --max-num-seqs 10 \
  --max-model-len 128000 \
  --gpu-memory-utilization 0.95 \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 1 \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
(APIServer pid=6372) The tokenizer you are loading from '/models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit' has an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the fix_mistral_regex=True flag when loading this tokenizer to fix this issue.
Thank you for your interest in the model. This warning usually occurs when the model is loaded from a local directory path, e.g., /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit, rather than through the repo id, e.g., cyankiwi/MiniMax-M2.1-AWQ-4bit.
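If you want to double-check that tokenization is sane, you can load the tokenizer directly with the flag the warning suggests. A minimal sketch, assuming your installed transformers version accepts the fix_mistral_regex kwarg named in the warning:

```python
# Minimal sketch: load the tokenizer from the local directory with the
# flag suggested by the warning, then spot-check tokenization.
# Assumes the installed transformers version accepts fix_mistral_regex.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit",
    fix_mistral_regex=True,  # flag named in the warning message
    trust_remote_code=True,
)
print(tok.tokenize("Hello, world!"))
```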
Does the model work in your environment?
Yes, the model works, thank you. Also, what do you think about this Reddit comment? https://www.reddit.com/r/LocalLLaMA/s/w7OWQX7P7b
Thank you for bringing this comment to my attention. I am well aware of the recent llm-compressor bugs; I used a version of llm-compressor from more than a month ago to quantize this model.
In addition, the model definition was modified at runtime so that all tokens were routed to all experts during calibration :)
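For anyone curious, here is a minimal sketch of that trick (the config field names num_local_experts and num_experts_per_tok follow Mixtral-style MoE configs, and the checkpoint path is a placeholder; the actual MiniMax-M2.1 definition may use different names):

```python
# Minimal sketch: force the MoE router to send every token to every expert
# so all experts receive activation statistics during calibration.
# Assumptions: Mixtral-style config keys (num_local_experts,
# num_experts_per_tok) and a placeholder checkpoint path.
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "path/to/MiniMax-M2.1"  # placeholder

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
# Top-k routing with k equal to the expert count: every token hits every expert.
config.num_experts_per_tok = config.num_local_experts

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    trust_remote_code=True,
    torch_dtype="auto",
)
# ... run the llm-compressor AWQ calibration pass on `model` as usual,
# then restore the original top-k before saving/serving.
```

The point of the override is that AWQ needs activation statistics for every expert; with normal top-k routing, rarely selected experts would see too few calibration tokens to produce reliable scales.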