swiss-ai/Apertus-8B-2509

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Sep 2, 2025 | License: apache-2.0 | Architecture: Transformer | 0.2K | Open Weights | Cold

Apertus-8B-2509 by swiss-ai is an 8 billion parameter decoder-only transformer model, pretrained on 15T tokens with a staged curriculum of web, code, and math data. It is designed to be a fully open, massively multilingual model, supporting over 1800 languages with a 32,768-token context length. Training focuses on compliance: it respects the opt-out consent of data owners and avoids memorization, while aiming for performance comparable to closed-source models.


Apertus-8B-2509: A Fully Open, Multilingual LLM

Apertus-8B-2509 is an 8 billion parameter language model developed by swiss-ai. It belongs to the Apertus family, which also includes a 70B variant. The model is fully open: the weights, the training data, and the complete training details, including recipes, are all published. It is a decoder-only transformer, pretrained on 15 trillion tokens using a staged curriculum of web, code, and math data.
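Because the weights are open, the model can in principle be loaded with standard tooling. The following is a minimal sketch, not an official recipe. It assumes the weights are published on the Hugging Face Hub under the name in this page's title, that the installed transformers release is recent enough to include the Apertus architecture, and that torch and accelerate are available. The helper name complete is hypothetical.

```python
MODEL_ID = "swiss-ai/Apertus-8B-2509"  # repository name taken from the page title


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` as plain text completion (no chat template is applied)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: hardware with bfloat16 support
        device_map="auto",           # assumption: `accelerate` is installed
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling complete("Die Hauptstadt der Schweiz ist") would download the weights on first use and return the model's continuation. The model and tokenizer are reloaded on every call here for brevity; a real application would load them once and reuse them.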

Key Capabilities

  • Massively Multilingual: Natively supports 1811 languages, making it suitable for global applications.
  • Long Context: Listed here with a context length of 32,768 tokens; the model is described as able to process up to 65,536 tokens.
  • Compliant Training: Trained with strict adherence to data privacy, respecting opt-out consent of data owners and actively avoiding memorization of training data.
  • Agentic Usage: Supports tool use, enabling more complex and interactive applications.
  • Performance: Competitive on general language understanding tasks. Apertus-8B averages 65.8% across the ARC, HellaSwag, WinoGrande, XNLI, XCOPA, and PIQA benchmarks.
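The context-length figures above translate into a simple token budget: prompt tokens plus generated tokens have to fit inside the window. The sketch below illustrates that arithmetic under stated assumptions. The two limits are the numbers quoted in the list; the helper names and the 1024-token generation reserve are hypothetical choices for illustration, not part of any Apertus tooling.

```python
CONTEXT_LIMIT = 32_768   # context length listed on this page
EXTENDED_LIMIT = 65_536  # upper bound mentioned in the capability list


def max_new_tokens_for(prompt_tokens: int, limit: int = CONTEXT_LIMIT) -> int:
    """Return how many tokens can still be generated after the prompt."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(limit - prompt_tokens, 0)


def split_into_windows(token_ids, limit: int = CONTEXT_LIMIT, reserve: int = 1024):
    """Split a long token sequence into consecutive chunks.

    Each chunk holds at most `limit - reserve` tokens, leaving `reserve`
    tokens of headroom for generation (hypothetical default).
    """
    window = limit - reserve
    if window <= 0:
        raise ValueError("reserve must be smaller than limit")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]
```

For example, a 30,000-token prompt leaves room for 2,768 generated tokens under the 32,768-token limit, while a 40,000-token prompt only fits if the larger 65,536-token bound applies. Chunking without overlap is the simplest option; overlapping windows or summarizing earlier chunks are common alternatives when continuity across chunks matters.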

Good For

  • Applications requiring broad multilingual support and understanding.
  • Use cases demanding long context processing for complex documents or conversations.
  • Developers prioritizing transparent and ethically sourced AI models.
  • Integration into systems that benefit from tool-use capabilities for agentic workflows.