PPL is not good

#1 by snomile

Baseline is the original Qwen3-30B-A3B-Instruct-2507.

| Dataset | Baseline | Qwen3-30B-A3B-Instruct-2507-W4A16-GPTQ |
| --- | --- | --- |
| English (General) | 12.6666 | 13.95 (+10.1%) ❌ |
| Chinese (General) | 8.7130 | 9.36 (+7.4%) |
| Code (Python Logic) | 2.4904 | 2.85 (+14.5%) ❌ |
| Math (Reasoning) | 2.9013 | 2.88 (-0.7%) ✅ |
| German (Translation) | 3.0212 | 3.13 (+3.4%) |
| French (Euro Lang) | 2.6002 | 2.76 (+6.2%) |
| Japanese (Asian Lang) | 4.5437 | 4.77 (+4.9%) |
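
For reference, the post doesn't show how these numbers were produced. Below is a minimal perplexity sketch using Hugging Face transformers with non-overlapping chunks; the model ID, chunk length, and evaluation corpus are assumptions, not the actual harness behind the table.

```python
# Minimal PPL sketch: chunked negative log-likelihood with transformers.
# Assumptions: the eval text is loaded into `text`, the model fits on the
# available GPU(s), and 2048-token non-overlapping chunks are used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B-Instruct-2507"  # or the W4A16-GPTQ repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
model.eval()

text = "..."  # evaluation corpus for one of the datasets above
encodings = tokenizer(text, return_tensors="pt")

max_len = 2048  # context length per chunk (assumption)
stride = 2048   # non-overlapping chunks for simplicity

nlls, n_tokens = [], 0
for begin in range(0, encodings.input_ids.size(1), stride):
    end = min(begin + max_len, encodings.input_ids.size(1))
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    with torch.no_grad():
        # Labels are shifted internally; loss is the mean NLL over the chunk.
        out = model(input_ids, labels=input_ids.clone())
    num = input_ids.size(1) - 1
    nlls.append(out.loss.float() * num)
    n_tokens += num

# Perplexity = exp(total NLL / total predicted tokens)
ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"PPL: {ppl.item():.4f}")
```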

Thanks for sharing. Could this be caused by the 16-bit → 4-bit weight quantization?
Even for that, it seems too high.
By any chance, have you tested any other W4A16 quant of the same model?
For example: ramblingpolymath/Qwen3-30B-A3B-Instruct-2507-W4A16?
