swiss-ai/Apertus-8B-Instruct-2509

Hugging Face · Text generation
Model size: 8B · Quantization: FP8 · Context length: 32k · Published: Aug 13, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

Apertus-8B-Instruct-2509 by swiss-ai is an 8-billion-parameter, decoder-only transformer model designed for massive multilingual support, natively handling 1811 languages. Pretrained on 15T tokens with a staged curriculum of web, code, and math data, it features a 32,768-token context length and uses the xIELU activation function and the AdEMAMix optimizer. The model emphasizes full transparency with open weights, data, and training details, and respects data owners' opt-out consent, making it suitable for globally focused, privacy-conscious applications that require broad language coverage and tool use.

Apertus-8B-Instruct-2509: A Massively Multilingual and Open LLM

Apertus-8B-Instruct-2509, developed by swiss-ai, is an 8-billion-parameter, decoder-only transformer model engineered to push the boundaries of fully open, multilingual, and transparent language models. It natively supports 1811 languages and offers a long context window: 32,768 tokens in this deployment, with the model itself supporting up to 65,536 tokens. The model is pretrained on 15 trillion tokens using a staged curriculum of web, code, and math data, and incorporates the novel xIELU activation function and the AdEMAMix optimizer.
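
For a quick start, the sketch below loads the model with the Hugging Face transformers library and runs one multilingual chat turn. This is standard transformers usage rather than anything documented by swiss-ai, so treat the dtype, device placement, and generation settings as illustrative assumptions.

```python
# Minimal sketch: loading Apertus-8B-Instruct-2509 via transformers.
# Assumes the repository ships a tokenizer with a chat template;
# generation settings here are illustrative, not documented defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-Instruct-2509"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on a single large GPU
    device_map="auto",
)

# A non-English prompt; any of the 1811 supported languages follows the same path.
messages = [{"role": "user", "content": "Explique la photosynthèse en deux phrases."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```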

Key Capabilities

  • Massively Multilingual: Natively supports 1811 languages, making it highly versatile for global applications.
  • Fully Open and Compliant: Features open weights, open training data, and complete training details, including data reconstruction scripts and intermediate checkpoints. It respects opt-out consent of data owners and avoids memorization of training data.
  • Long Context Processing: Capable of handling context lengths up to 65,536 tokens.
  • Tool Use Support: Designed to support agentic usage with tool integration; see the sketch after this list.
  • Strong General Language Understanding: Achieves competitive performance on general language understanding tasks, scoring 65.8% on average across ARC, HellaSwag, WinoGrande, XNLI, XCOPA, and PIQA benchmarks, comparable to other open-weight models in its class.
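
As a hedged illustration of the tool-use path: transformers' chat-template API accepts a tools argument and converts an annotated Python function into a tool schema. Whether Apertus's chat template emits tool calls in this format is an assumption here (consult the model card for the exact schema), and get_weather is a purely hypothetical example tool.

```python
# Hypothetical tool-use sketch; assumes the model's chat template defines a
# tool-calling format compatible with transformers' `tools=` argument.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-Instruct-2509"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def get_weather(city: str) -> str:
    """
    Get the current weather for a city. (Hypothetical example tool.)

    Args:
        city: Name of the city to look up.
    """
    return "sunny, 22 °C"

messages = [{"role": "user", "content": "What's the weather in Bern right now?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],  # transformers derives a JSON schema from the signature/docstring
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# If the template supports tools, the reply should contain a structured tool call;
# parsing and executing it is left to the surrounding agent loop.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```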

Good for

  • Applications requiring extensive multilingual support across a vast number of languages.
  • Use cases where transparency, open data, and compliance with data privacy (e.g., opt-out consent) are critical.
  • Tasks benefiting from long context understanding and generation.
  • Developing agentic systems that leverage tool use.
  • Researchers and developers seeking a fully auditable and reproducible LLM with detailed training insights.

For more in-depth information, refer to the Apertus Technical Report.