Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel Paper • 2508.18224 • Published Aug 25 • 1
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published 25 days ago • 68
Article Training and Finetuning Reranker Models with Sentence Transformers v4 Mar 26 • 175
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20, 2024 • 16
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 38
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free Paper • 2410.10814 • Published Oct 14, 2024 • 51
Gemma-APS Release Collection Gemma models for text-to-propositions segmentation. The models are distilled from a fine-tuned Gemini Pro model applied to multi-domain synthetic data. • 3 items • Updated Jul 10 • 22
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy Sep 18, 2024 • 272
Article dstack: Your LLM Launchpad - From Fine-Tuning to Serving, Simplified Aug 22, 2024 • 13
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 261
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 82
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18, 2024 • 33
Aligning to Thousands of Preferences via System Message Generalization Paper • 2405.17977 • Published May 28, 2024 • 7