BuiDoan

AI & ML interests

None yet

Recent Activity

reacted to Kseniase's post with 👀 1 day ago

15 Outstanding Research Papers from NeurIPS 2025

NeurIPS 2025, as a premier annual event in machine learning and computational neuroscience, tackles major topics like the future of AI, current research, and the most difficult challenges. While we’re not attending this year, we’re closely following the updates and today we pull together a quick, easy-to-digest roundup of a few standout papers so you can jump in without getting overwhelmed.

Here is a list of 15 papers from NeurIPS 2025, including 8 top research papers that received awards, along with 7 others that caught our attention:

1. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks → https://neurips.cc/virtual/2025/loc/san-diego/test-of-time/128328
Test of Time Award winner. Introduces the RPN, a small convnet that predicts objectness and boxes on shared features, enabling Faster R-CNN to share computation and run around 5 fps on a GPU

2. Artificial Hivemind: The Open-Ended Homogeneity of LMs (and Beyond) → https://neurips.cc/virtual/2025/loc/san-diego/poster/121421
Releases a huge open-ended prompt dataset, showing that LLMs often fall into an “artificial hivemind” – generate surprisingly similar answers – and measuring diversity collapse

3. Optimal Mistake Bounds for Transductive Online Learning → https://neurips.cc/virtual/2025/loc/san-diego/poster/119098
Settles a 30-year-old question by showing how much unlabeled data helps in online learning – it gives a precise quadratic advantage with tight matching bounds

4. Gated Attention for LLMs: Non-linearity, Sparsity, and Attention-Sink-Free → https://neurips.cc/virtual/2025/loc/san-diego/poster/120216
Demonstrates how gating actually affects attention: a simple sigmoid gate after Scaled Dot-Product Attention (SDPA) boosts performance, stability, and long-context behavior by adding useful nonlinearity and sparse modulation

Read further below ⬇️

Also, subscribe to the Turing Post: https://www.turingpost.com/subscribe

reacted to prithivMLmods's post with ❤️ 1 day ago

One speech model with seven voices, streamlined with multimodal capabilities for vision tasks. Performs vision (image-text) to audio inference with Qwen2.5-VL + VibeVoice-Realtime-0.5B. Vision to VibeVoice (EN): the demo is live. 🗣️🔥

🤗 Vision-to-VibeVoice-en [Demo]: https://huggingface.co/spaces/prithivMLmods/Vision-to-VibeVoice-en
✨ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Speech [VibeVoice-Realtime-0.5B]: https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B
✨ Vision [Qwen2.5-VL]: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct

To learn more, visit the app page or the respective model page!

liked a model 6 days ago

apple/CLaRa-7B-Instruct

Organizations

upvoted a collection 6 days ago

📙 LLM Engineer's Handbook

Collection

Models and datasets from my book. All the code is freely available at https://github.com/PacktPublishing/LLM-Engineers-Handbook • 6 items • Updated Apr 7 • 13

upvoted a collection 5 months ago

Kimi-K2

Collection

Moonshot's MoE LLMs with 1 trillion parameters, exceptional on agentic intelligence • 5 items • Updated 25 days ago • 156

upvoted an article 7 months ago

Article

The 4 Things Qwen-3’s Chat Template Teaches Us

Apr 30

upvoted 3 papers 7 months ago

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published May 12 • 82

Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 186

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 188

upvoted an article 7 months ago

Article

Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)

Jun 16, 2023

upvoted 2 papers 7 months ago

100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models

Paper • 2505.00551 • Published May 1 • 36

ReasonIR: Training Retrievers for Reasoning Tasks

Paper • 2504.20595 • Published Apr 29 • 53

upvoted an article 7 months ago

Article

What is MoE 2.0? Update Your Knowledge about Mixture-of-experts

Apr 27

upvoted a paper 7 months ago

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning

Paper • 2505.02835 • Published May 5 • 29

upvoted 2 articles 7 months ago

Article

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

Mar 17 • 344

Article

Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial

Jan 31

upvoted 3 papers 7 months ago

Phi-4-reasoning Technical Report

Paper • 2504.21318 • Published Apr 30 • 53

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

Paper • 2504.18415 • Published Apr 25 • 47

Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30 • 49

upvoted 2 articles 7 months ago

Article

Open R1: Update #3

Mar 11 • 296

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28 • 887

upvoted 2 papers 7 months ago

Trillion 7B Technical Report

Paper • 2504.15431 • Published Apr 21 • 37

Tina: Tiny Reasoning Models via LoRA

Paper • 2504.15777 • Published Apr 22 • 56
