unsloth/Apertus-8B-Instruct-2509

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Oct 2, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights

Apertus-8B-Instruct-2509 is an 8-billion-parameter decoder-only transformer model developed by swiss-ai, designed for fully open, multilingual, and transparent language processing. Pretrained on 15T tokens with a staged curriculum of web, code, and math data, it natively supports 1811 languages and has a 65,536-token context length. The model emphasizes open weights, open data, and full training details, aiming for performance comparable to closed-source models while respecting data privacy and consent.


Apertus-8B-Instruct-2509: A Fully Open Multilingual LLM

Apertus-8B-Instruct-2509 is an 8-billion-parameter language model from swiss-ai and part of the Apertus family, which also includes a 70B variant. It is distinguished by its commitment to full transparency: open weights, open training data, and complete training details. It is a decoder-only transformer pretrained on 15 trillion tokens using a staged curriculum that incorporates web, code, and math data. The model uses a novel xIELU activation function and was pretrained from scratch with the AdEMAMix optimizer, then post-trained with supervised fine-tuning and alignment via QRPO (Quantile Reward Policy Optimization).
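This card names xIELU and AdEMAMix without defining them. The sketch below is a scalar, pure-Python illustration of both as we understand them from their original descriptions; it is not the Apertus training code. Assumptions: xIELU uses a quadratic-plus-linear branch for positive inputs and an ELU-style branch with a linear correction for non-positive inputs, and its coefficients (alpha_p, alpha_n, beta, eps) are meant to be trainable, so the values below are illustrative only. AdEMAMix is sketched as Adam with a second, slow and non-bias-corrected gradient EMA mixed in with weight alpha; the hyperparameter defaults are illustrative, and the alpha/beta3 warmup schedules used with AdEMAMix are omitted. The helper names `xielu`, `AdEMAMixState`, and `ademamix_step` are hypothetical.

```python
import math
from dataclasses import dataclass


def xielu(x: float, alpha_p: float = 0.8, alpha_n: float = 0.8,
          beta: float = 0.5, eps: float = -1e-6) -> float:
    """Scalar xIELU-style activation (assumed form, illustrative coefficients).

    x > 0:  alpha_p * x^2 + beta * x  (quadratic growth, similar to squared ReLU)
    x <= 0: alpha_n * (expm1(min(x, eps)) - x) + beta * x
            (ELU-like curve plus a linear term, so negative inputs keep a gradient)
    Both branches meet near 0 with value ~0 and slope ~beta.
    """
    if x > 0:
        return alpha_p * x * x + beta * x
    return alpha_n * (math.expm1(min(x, eps)) - x) + beta * x


@dataclass
class AdEMAMixState:
    """Per-parameter optimizer state for a flat list of scalar parameters."""
    m1: list  # fast EMA of gradients (bias-corrected when used)
    m2: list  # slow EMA of gradients (not bias-corrected)
    nu: list  # EMA of squared gradients, as in Adam
    t: int = 0

    @classmethod
    def zeros(cls, n: int) -> "AdEMAMixState":
        return cls([0.0] * n, [0.0] * n, [0.0] * n)


def ademamix_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
                  beta3=0.9999, alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix-style update; returns new params and mutates state in place.

    Sketch only: omits the alpha / beta3 warmup schedules.
    """
    state.t += 1
    bc1 = 1 - beta1 ** state.t  # bias correction for the fast EMA
    bc2 = 1 - beta2 ** state.t  # bias correction for the second moment
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state.m1[i] = beta1 * state.m1[i] + (1 - beta1) * g
        state.m2[i] = beta3 * state.m2[i] + (1 - beta3) * g
        state.nu[i] = beta2 * state.nu[i] + (1 - beta2) * g * g
        m1_hat = state.m1[i] / bc1
        nu_hat = state.nu[i] / bc2
        # Mix fast and slow momentum, normalize by RMS, add decoupled weight decay.
        update = (m1_hat + alpha * state.m2[i]) / (math.sqrt(nu_hat) + eps)
        new_params.append(p - lr * (update + weight_decay * p))
    return new_params


print(xielu(1.0), xielu(-1.0))
state = AdEMAMixState.zeros(1)
print(ademamix_step([0.0], [1.0], state))
```

The slow EMA (beta3 close to 1) lets very old gradients keep influencing the update, which is the idea AdEMAMix adds on top of Adam; the fast, bias-corrected EMA still drives short-term progress.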

Key Capabilities

  • Massively Multilingual: Natively supports 1811 languages, making it highly versatile for global applications.
  • Long Context: Features a substantial context length of up to 65,536 tokens, enabling processing of extensive inputs.
  • Tool Use: Supports agentic use, including integration with external tools.
  • Data Compliance: Trained with a strong emphasis on respecting data owners' opt-out consent and on mitigating memorization of training data.
  • Openness: Provides full transparency regarding its training process, data, and architecture, including a technical report and open training data reconstruction scripts.
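The long-context capability above is a model limit, while the listing metadata shows a 32k context length, which may reflect how this FP8 deployment is served. The sketch below is a hypothetical helper, not part of any library, that checks whether a prompt plus a generation budget fits a given limit and otherwise splits token IDs into overlapping windows. It assumes token counts come from the model's own tokenizer and keeps the limit as a parameter so either figure can be used.

```python
def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    ctx_limit: int = 65_536) -> bool:
    """True if the prompt plus the generation budget fits in ctx_limit tokens."""
    return prompt_tokens + max_new_tokens <= ctx_limit


def chunk_tokens(token_ids: list, max_new_tokens: int,
                 ctx_limit: int = 65_536, overlap: int = 0) -> list:
    """Split token_ids into windows that each leave room for generation.

    Each window holds at most ctx_limit - max_new_tokens tokens; consecutive
    windows share `overlap` tokens so context is not cut off abruptly.
    """
    window = ctx_limit - max_new_tokens
    if window <= 0:
        raise ValueError("max_new_tokens must be smaller than ctx_limit")
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    stride = window - overlap
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # last window reached the end of the input
        start += stride
    return chunks


print(fits_in_context(60_000, 4_096))
print(fits_in_context(30_000, 4_096, ctx_limit=32_768))
print(chunk_tokens(list(range(10)), max_new_tokens=2, ctx_limit=6, overlap=1))
```

Chunking is a fallback: each window is processed independently, so information that spans distant parts of the input is lost unless the results are combined afterwards. Staying within the model's native limit avoids that trade-off.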

Good For

  • Applications requiring broad multilingual support.
  • Tasks that benefit from long-context understanding and generation.
  • Developers prioritizing fully transparent and auditable AI models.
  • Use cases where data privacy and consent are critical considerations.