Tokenizer set to Mistral?
vllm serve \
  /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit \
  --served-model-name MiniMax-M2.1-AWQ \
  --max-num-seqs 10 \
  --max-model-len 128000 \
  --gpu-memory-utilization 0.95 \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 1 \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
(APIServer pid=6372) The tokenizer you are loading from '/models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit' has an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the fix_mistral_regex=True flag when loading this tokenizer to fix this issue.
Thank you for your interest in the model. This warning usually occurs when the model is loaded from a local directory path, e.g., /models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit, rather than through the repo id, e.g., cyankiwi/MiniMax-M2.1-AWQ-4bit.
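If you want to double-check that tokenization is sane, you can load the tokenizer directly with the flag the warning suggests. A minimal sketch, assuming your installed transformers version accepts the fix_mistral_regex kwarg named in the warning:

```python
# Minimal sketch: load the tokenizer from the local directory with the
# flag suggested by the warning, then spot-check tokenization.
# Assumes the installed transformers version accepts fix_mistral_regex.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/models/awq/cyankiwi-MiniMax-M2.1-AWQ-4bit",
    fix_mistral_regex=True,  # flag named in the warning message
    trust_remote_code=True,
)
print(tok.tokenize("Hello, world!"))
```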
Does the model work in your environment?
Yes, the model works, thank you. Also, what do you think about this Reddit comment? https://www.reddit.com/r/LocalLLaMA/s/w7OWQX7P7b
Thank you for bringing this comment to my attention. I am well aware of the recent llm-compressor bugs; I used a version of llm-compressor from more than a month ago to quantize this model.
In addition, the model definition was modified at runtime so that all tokens were routed to all experts during calibration :)
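For anyone curious, here is a minimal sketch of that trick (the config field names num_local_experts and num_experts_per_tok follow Mixtral-style MoE configs, and the checkpoint path is a placeholder; the actual MiniMax-M2.1 definition may use different names):

```python
# Minimal sketch: force the MoE router to send every token to every expert
# so all experts receive activation statistics during calibration.
# Assumptions: Mixtral-style config keys (num_local_experts,
# num_experts_per_tok) and a placeholder checkpoint path.
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "path/to/MiniMax-M2.1"  # placeholder

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
# Top-k routing with k equal to the expert count: every token hits every expert.
config.num_experts_per_tok = config.num_local_experts

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    trust_remote_code=True,
    torch_dtype="auto",
)
# ... run the llm-compressor AWQ calibration pass on `model` as usual,
# then restore the original top-k before saving/serving.
```

The point of the override is that AWQ needs activation statistics for every expert; with normal top-k routing, rarely selected experts would see too few calibration tokens to produce reliable scales.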