Article What’s MXFP4? The 4-Bit Secret Powering OpenAI’s GPT‑OSS Models on Modest Hardware Aug 8 • 29
Article 🐺🐦⬛ LLM Comparison/Test: DeepSeek-V3, QVQ-72B-Preview, Falcon3 10B, Llama 3.3 70B, Nemotron 70B in my updated MMLU-Pro CS benchmark Jan 2 • 41
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 10 • 149
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 39
Recent highlights Collection Some recent models worth checking out • 18 items • Updated Nov 1, 2024 • 54
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated Jul 21 • 211
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16, 2024 • 38