Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Abstract
Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose Dynamic Large Concept Models (DLCM), a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first compression-aware scaling law, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a decoupled μP parametrization that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting (R=4, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a +2.69% average improvement across 12 zero-shot benchmarks under matched inference FLOPs.
Community
Dynamic Large Concept Models (DLCM) introduce an end-to-end trained concept-level language modeling architecture that breaks the token-uniform computation paradigm in modern LLMs. Inspired by hierarchical models such as H-Net, DLCM learns semantic boundaries directly from latent representations, dynamically compresses token sequences into variable-length concepts, performs deep reasoning in the concept space, and projects the results back to tokens via causal cross-attention.
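For readers who want the gist of that pipeline in code, here is a minimal, heavily simplified sketch. The module names, the mean-pooling rule, and the boundary threshold are illustrative assumptions rather than the paper's exact design, and the causal cross-attention that projects concepts back to tokens is omitted for brevity.

```python
# Illustrative DLCM-style compression step (hypothetical names and sizes;
# the paper's exact boundary rule, pooling, and layer stacks may differ).
import torch
import torch.nn as nn

class ConceptCompressor(nn.Module):
    """Predict boundaries and mean-pool each variable-length token span into one concept."""
    def __init__(self, d_token: int, d_concept: int):
        super().__init__()
        self.boundary_head = nn.Linear(d_token, 1)     # boundary score per token
        self.up_proj = nn.Linear(d_token, d_concept)   # token width -> concept width

    def forward(self, h_tok: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
        # h_tok: (seq_len, d_token) hidden states of one sequence.
        p = torch.sigmoid(self.boundary_head(h_tok)).squeeze(-1)
        is_boundary = p > threshold
        is_boundary[-1] = True                          # always close the final span
        ends = torch.nonzero(is_boundary).squeeze(-1).tolist()
        spans, start = [], 0
        for end in ends:                                # pool each variable-length span
            spans.append(h_tok[start:end + 1].mean(dim=0))
            start = end + 1
        return self.up_proj(torch.stack(spans))         # (num_concepts, d_concept)

# Toy end-to-end shape check.
d_token, d_concept, seq_len = 256, 512, 32
token_encoder = nn.Linear(d_token, d_token)             # stand-in for shallow token layers
compressor = ConceptCompressor(d_token, d_concept)
concept_backbone = nn.Linear(d_concept, d_concept)      # stand-in for the deep concept stack

h_tok = token_encoder(torch.randn(seq_len, d_token))
h_con = concept_backbone(compressor(h_tok))             # reasoning happens in concept space
print(h_tok.shape, h_con.shape)
```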
Compared to standard dense Transformers trained with next-token prediction, DLCM achieves a ~34% reduction in inference FLOPs under apples-to-apples settings, while consistently improving performance on reasoning-dominant benchmarks. Notably, the relative FLOPs savings grow with model scale, indicating favorable scaling behavior beyond parameter efficiency alone. At similar loss levels, DLCM reallocates computation toward boundary and planning tokens, yielding stronger downstream accuracy even though less compute is spent on redundant tokens.
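As a back-of-envelope sanity check on the headline ~34% number: the layer split and the linear-cost assumption below are illustrative assumptions, not figures from the paper.

```python
# Rough compute accounting for a hierarchical model at compression ratio R = 4.
# Assumption: a fraction f of transformer layers runs on the compressed concept
# sequence (length N / R) and the rest on the full token sequence (length N),
# with per-layer cost proportional to sequence length (the quadratic attention
# term and the small parser/cross-attention modules are ignored).
R = 4.0

def relative_flops(f_concept_layers: float, r: float = R) -> float:
    """FLOPs relative to a dense baseline that runs every layer at token length."""
    return (1.0 - f_concept_layers) + f_concept_layers / r

for f in (0.25, 0.45, 0.75):
    print(f"concept-layer fraction {f:.2f} -> {1.0 - relative_flops(f):.0%} FLOPs saved")
# With roughly 45% of layers operating in concept space, about a third of
# inference compute is freed up -- headroom that can be reinvested in a wider,
# higher-capacity reasoning backbone at matched total FLOPs.
```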
Technically, the paper contributes:
(1) a FlashAttention-VarLen–based implementation for efficient concept-token cross-attention (see the first sketch after this list);
(2) a decoupled μP formulation tailored to heterogeneous token- and concept-width modules, enabling zero-shot hyperparameter transfer across scales (see the second sketch after this list);
(3) a Global Parser that enforces stable, content-adaptive compression at the batch level and delivers solid empirical gains.
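For contribution (1), here is a rough sketch of one way variable-length concept-token cross-attention can be expressed with FlashAttention's varlen interface. Only `flash_attn_varlen_func` is the library's real API; the packing scheme, tensor layouts, and the decision to drop the causal mask are simplifying assumptions rather than the paper's implementation.

```python
# Hypothetical packing for concept-token cross-attention with FlashAttention's
# variable-length kernel: each sequence contributes its tokens as queries and
# its (variable number of) concepts as keys/values.
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func  # real kernel; the packing below is illustrative

def concept_token_cross_attn(q_tok, kv_con, tok_lens, con_lens):
    """q_tok: (total_tokens, n_heads, head_dim); kv_con: (total_concepts, 2, n_heads, head_dim).
    tok_lens / con_lens: per-sequence token and concept counts for the packed batch."""
    cu_q = F.pad(torch.cumsum(tok_lens, 0), (1, 0)).to(torch.int32).to(q_tok.device)
    cu_k = F.pad(torch.cumsum(con_lens, 0), (1, 0)).to(torch.int32).to(q_tok.device)
    k, v = kv_con[:, 0].contiguous(), kv_con[:, 1].contiguous()
    return flash_attn_varlen_func(
        q_tok, k, v,
        cu_seqlens_q=cu_q, cu_seqlens_k=cu_k,
        max_seqlen_q=int(tok_lens.max()), max_seqlen_k=int(con_lens.max()),
        causal=False,  # the paper's token->concept causal masking is not reproduced here
    )

# Example: a packed 2-sequence batch where 32+48 tokens attend to 8+12 concepts.
if torch.cuda.is_available():
    n_heads, head_dim = 8, 64
    tok_lens, con_lens = torch.tensor([32, 48]), torch.tensor([8, 12])
    q = torch.randn(80, n_heads, head_dim, dtype=torch.bfloat16, device="cuda")
    kv = torch.randn(20, 2, n_heads, head_dim, dtype=torch.bfloat16, device="cuda")
    print(concept_token_cross_attn(q, kv, tok_lens, con_lens).shape)  # (80, 8, 64)
```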
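For contribution (2), a minimal sketch of what decoupled μP-style learning-rate scaling can look like when token-width and concept-width module groups are scaled independently. The 1/width rule for hidden-matrix learning rates follows standard μP practice with Adam; the grouping, widths, and names are assumptions for illustration, not the paper's published recipe.

```python
# Hypothetical per-group learning rates: hidden-matrix LRs shrink as each
# module group's width grows relative to the small proxy model the
# hyperparameters were tuned on, so tuning transfers zero-shot across widths.
from dataclasses import dataclass

@dataclass
class WidthGroup:
    name: str
    base_width: int   # width of the proxy model the HPs were tuned on
    width: int        # width of the target model

def scaled_lrs(base_lr: float, groups: list[WidthGroup]) -> dict[str, float]:
    """muP rule of thumb for Adam: hidden-weight LR ~ base_lr * base_width / width."""
    return {g.name: base_lr * g.base_width / g.width for g in groups}

# Token-level layers and the concept backbone are scaled independently
# ("decoupled"), so each keeps its own base width and compression regime.
groups = [
    WidthGroup("token_layers",   base_width=256, width=1024),
    WidthGroup("concept_layers", base_width=512, width=4096),
]
print(scaled_lrs(base_lr=1e-2, groups=groups))
# {'token_layers': 0.0025, 'concept_layers': 0.00125}
```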
Overall, DLCM can be viewed as a principled special case of layer-wise local compression combined with sparse attention, offering a scalable path toward more compute-efficient and reasoning-centric language models.
arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/dynamic-large-concept-models-latent-reasoning-in-an-adaptive-semantic-space-887-4dc7c253
- Executive Summary
- Detailed Breakdown
- Practical Applications
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- A Unified Sparse Attention via Multi-Granularity Compression (2025)
- Training-free Context-adaptive Attention for Efficient Long Context Modeling (2025)
- Delta-LLaVA: Base-then-Specialize Alignment for Token-Efficient Vision-Language Models (2025)
- AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees (2025)
- BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models (2025)
- Parallel Vision Token Scheduling for Fast and Accurate Multimodal LMMs Inference (2025)
- JEPA-Reasoner: Decoupling Latent Reasoning from Token Generation (2025)
arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/dynamic-large-concept-models-latent-reasoning-in-an-adaptive-semantic-space