Apertus-8B-Instruct-2509: A Fully Open Multilingual LLM

Apertus-8B-Instruct-2509 is an 8 billion parameter language model from swiss-ai, part of the Apertus family, which also includes a 70B variant. This model is distinguished by its commitment to full transparency, offering open weights, open training data, and complete training details. It is a decoder-only transformer, pretrained on an extensive 15 trillion tokens using a staged curriculum that incorporates web, code, and math data. The model utilizes a novel xIELU activation function and was trained from scratch with the AdEMAMix optimizer, followed by supervised fine-tuning and alignment via QRPO.

Key Capabilities

Massively Multilingual: Natively supports 1811 languages, making it highly versatile for global applications.
Long Context: Features a substantial context length of up to 65,536 tokens, enabling processing of extensive inputs.
Tool Use: Supports agentic usage with tool integration capabilities.
Data Compliance: Trained with a strong emphasis on respecting opt-out consent for data owners and avoiding memorization of training data.
Openness: Provides full transparency regarding its training process, data, and architecture, including a technical report and open training data reconstruction scripts.

Good For

Applications requiring broad multilingual support.
Tasks benefiting from long context understanding and generation.
Developers prioritizing fully transparent and auditable AI models.
Use cases where data privacy and consent are critical considerations.

Overview

Apertus-8B-Instruct-2509: A Fully Open Multilingual LLM

Key Capabilities

Good For

Full Model Card (README)