Collections

Collections including paper arxiv:2408.03910
Research
Collection by May 7
1
  • DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

    Paper • 2309.14509 • Published Sep 25, 2023 • 19
  • LLM Augmented LLMs: Expanding Capabilities through Composition

    Paper • 2401.02412 • Published Jan 4, 2024 • 38
  • DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

    Paper • 2401.06066 • Published Jan 11, 2024 • 58
  • Tuning Language Models by Proxy

    Paper • 2401.08565 • Published Jan 16, 2024 • 22