swiss-ai/Apertus-8B-Instruct-2509
Apertus-8B-Instruct-2509 by swiss-ai is an 8-billion-parameter, decoder-only transformer built for massive multilingual support, natively handling 1811 languages. Pretrained on 15T tokens with a staged curriculum of web, code, and math data, it offers a context length of up to 65,536 tokens and uses the new xIELU activation function and the AdEMAMix optimizer. The model emphasizes full transparency with open weights, data, and training details, and respects data owners' opt-out consent, making it suitable for globally focused, privacy-conscious applications that need broad language coverage and tool use.
Apertus-8B-Instruct-2509: A Massively Multilingual and Open LLM
Apertus-8B-Instruct-2509, developed by swiss-ai, is an 8-billion-parameter, decoder-only transformer engineered to push the boundaries of fully open, multilingual, and transparent language models. It natively supports 1811 languages and handles long contexts of up to 65,536 tokens. The model is pretrained on 15 trillion tokens using a staged curriculum of web, code, and math data, and incorporates the novel xIELU activation function and the AdEMAMix optimizer.
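The model can be used with the standard Hugging Face `transformers` chat workflow. The sketch below is illustrative, not the model card's official quickstart: the generation settings and the `build_chat` helper are assumptions, and only the model ID (`swiss-ai/Apertus-8B-Instruct-2509`) comes from this page.

```python
MODEL_ID = "swiss-ai/Apertus-8B-Instruct-2509"  # model ID from this card

def build_chat(user_prompt, system_prompt=None):
    """Assemble a message list in the chat format expected by apply_chat_template."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages

def generate(user_prompt, max_new_tokens=256):
    """Minimal generation sketch. Downloads ~16 GB of weights on first call;
    a GPU is needed for practical speed."""
    # Imports kept local so the pure helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_chat(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example (not run here): generate("Translate 'good morning' into Romansh.")
```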
Key Capabilities
- Massively Multilingual: Natively supports 1811 languages, making it highly versatile for global applications.
- Fully Open and Compliant: Offers open weights, open training data, and complete training details, including data-reconstruction scripts and intermediate checkpoints. It respects data owners' opt-out consent and avoids memorization of training data.
- Long Context Processing: Capable of handling context lengths up to 65,536 tokens.
- Tool Use Support: Designed to support agentic usage with tool integration.
- Strong General Language Understanding: Achieves competitive performance on general language understanding tasks, scoring 65.8% on average across ARC, HellaSwag, WinoGrande, XNLI, XCOPA, and PIQA benchmarks, comparable to other open-weight models in its class.
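For the tool-use capability, `transformers` chat templates accept an OpenAI-style `tools` list and render it according to the model's own template. The sketch below is a hedged illustration: the `get_weather` tool, its schema, and the helper function are hypothetical examples, not part of this model card, and the exact tool-call syntax the model emits is defined by its chat template.

```python
def get_weather(city: str) -> str:
    """Stub tool implementation for illustration only."""
    return f"Sunny in {city}"

# Hypothetical OpenAI-style function schema describing the tool to the model.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_prompt(tokenizer, user_prompt):
    """Render a prompt that advertises the tool via the model's chat template."""
    return tokenizer.apply_chat_template(
        [{"role": "user", "content": user_prompt}],
        tools=[weather_tool],          # template decides how tools are presented
        add_generation_prompt=True,
        tokenize=False,                # return the formatted string, not token IDs
    )
```

The model's tool-call output would then be parsed, the matching Python function executed, and the result appended as a tool message for a follow-up generation turn.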
Good for
- Applications requiring extensive multilingual support across a vast number of languages.
- Use cases where transparency, open data, and compliance with data privacy (e.g., opt-out consent) are critical.
- Tasks benefiting from long context understanding and generation.
- Developing agentic systems that leverage tool use.
- Researchers and developers seeking a fully auditable and reproducible LLM with detailed training insights.
For more in-depth information, refer to the Apertus Technical Report.