Tokenizer set to Mistral?

#1
by zmarty - opened

```shell
vllm serve \
  /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit \
  --served-model-name MiniMax-M2.1-AWQ \
  --max-num-seqs 10 \
  --max-model-len 128000 \
  --gpu-memory-utilization 0.95 \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 1 \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```

```
(APIServer pid=6372) The tokenizer you are loading from '/models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit' has an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the fix_mistral_regex=True flag when loading this tokenizer to fix this issue.
```
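For reference, a minimal sketch of what the warning suggests, assuming the `transformers` library is installed and that `AutoTokenizer.from_pretrained` forwards the `fix_mistral_regex` keyword in your version (this is the flag named in the warning, not a documented guarantee):

```python
import os

# Local path from the vllm serve command above
MODEL_PATH = "/models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit"

if os.path.isdir(MODEL_PATH):
    from transformers import AutoTokenizer

    # fix_mistral_regex=True is the flag named in the warning; whether it is
    # accepted depends on your transformers version.
    tok = AutoTokenizer.from_pretrained(MODEL_PATH, fix_mistral_regex=True)
    print(type(tok).__name__)
else:
    print("model directory not found; skipping")
```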

cyankiwi org

Thank you for your interest in the model. This warning usually occurs when loading the model from a local directory path, e.g. /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit, rather than through the repo id, e.g. cyankiwi/MiniMax-M2.1-AWQ-4bit.
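If that distinction matters in your setup, a sketch of the repo-id variant of the same command would look like this (the repo id is the one quoted above; the remaining flags are elided, not part of this suggestion):

```shell
# Serve by repo id so the tokenizer config is resolved from the Hub,
# rather than from a local directory path
vllm serve cyankiwi/MiniMax-M2.1-AWQ-4bit \
  --served-model-name MiniMax-M2.1-AWQ
```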

Does the model work in your environment?

Yes, the model works, thank you. Also what do you think about this Reddit comment? https://www.reddit.com/r/LocalLLaMA/s/w7OWQX7P7b

cyankiwi org

Thank you for bringing this comment to my attention. I am well aware of the recent llm-compressor bugs; I used a version of llm-compressor from more than a month ago to quantize this model.

In addition, the model definition was modified at runtime so that all tokens were routed to all experts during calibration :)

Did you have to set the fix_mistral_regex=True flag, or does it run without it? @zmarty
