Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models Paper • 2510.03561 • Published Oct 3 • 24
MultiCaRe Collection MultiCaRe: Open-Source Clinical Case Dataset • 4 items • Updated Sep 25 • 16
FastVLM Collection Efficient Vision Encoding for Vision Language Models • 9 items • Updated Sep 2 • 104
SARD: Synthetic Arabic Recognition Dataset Collection A large-scale synthetic Arabic OCR dataset comprising 843,622 book-style document images across 10 fonts, designed to advance VLM for Arabic Texts • 2 items • Updated May 19 • 5
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer Paper • 2501.18427 • Published Jan 30 • 23
Reasoning Datasets Collection Reasoning datasets that are trending 🔥 • 10 items • Updated Jan 3 • 25
FalconMamba 7B Collection This collection features the FalconMamba 7B base model, the instruction-tuned version, their 4-bit and GGUF variants, and the demo. • 15 items • Updated about 1 month ago • 34
view article Article Welcome Falcon Mamba: The first strong attention-free 7B model +4 Aug 12, 2024 • 113
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? Paper • 2407.16607 • Published Jul 23, 2024 • 23
Jamba: A Hybrid Transformer-Mamba Language Model Paper • 2403.19887 • Published Mar 28, 2024 • 111
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 189
💫 StarCoder2 Collection StarCoder2 models and datasets! • 8 items • Updated Mar 1, 2024 • 89
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 626
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 134
Mixtures of Experts Unlock Parameter Scaling for Deep RL Paper • 2402.08609 • Published Feb 13, 2024 • 36