swiss-ai/Apertus-70B-Instruct-2509

Hugging Face
TEXT GENERATION · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 32k · Published: Sep 1, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Apertus-70B-Instruct-2509 is a 70 billion parameter decoder-only transformer model developed by swiss-ai, pretrained on 15 trillion tokens with a staged curriculum of web, code, and math data. It is designed for fully open, massively multilingual applications, natively supporting over 1,800 languages with a 65,536-token context length. The model was trained only on compliant data that respects opt-out consent, and it achieves performance comparable to closed-source models.


Apertus-70B-Instruct-2509: A Massively Multilingual and Open LLM

Apertus-70B-Instruct-2509 is a 70 billion parameter language model from swiss-ai, engineered to advance fully open, multilingual, and transparent AI. It was pretrained on an extensive 15 trillion tokens using a staged curriculum of web, code, and math data. A key differentiator is its commitment to full openness: open weights, open data, and complete training details and recipes, alongside the use of only compliant training data that respects opt-out consent.

Key Capabilities

  • Massively Multilingual: Natively supports 1811 languages, making it suitable for global applications.
  • Long Context: Features a default context length of up to 65,536 tokens, enabling processing of extensive inputs.
  • Open and Compliant: Trained with a focus on data privacy and transparency, designed to limit memorization of training data and to respect data owners' opt-out consent.
  • Advanced Architecture: Employs a decoder-only transformer with the xIELU activation function and the AdEMAMix optimizer, with alignment via QRPO.
  • Tool Use: Supports agentic usage with tool integration capabilities.
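As a sketch of how an instruct checkpoint like this is typically served, assuming the standard Hugging Face `transformers` chat-template API (this is not an official snippet from the model card, and the prompts are illustrative):

```python
# Hypothetical inference sketch for Apertus-70B-Instruct-2509 using the
# standard transformers chat API. Only the model ID comes from the card;
# everything else is a common-usage assumption.
from typing import Dict, List

MODEL_ID = "swiss-ai/Apertus-70B-Instruct-2509"


def build_chat(system: str, user: str) -> List[Dict[str, str]]:
    """Assemble messages in the widely used role/content chat format."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


def generate(user_prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a reply.

    Note: this downloads ~70B parameters; call it only on hardware
    with enough GPU memory (the card lists an FP8 deployment).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    messages = build_chat(
        "You are a helpful multilingual assistant.", user_prompt
    )
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

Because the model natively supports a long context, the same pattern applies unchanged to multi-turn conversations and long documents; only the `messages` list grows.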

When to Use This Model

  • Multilingual Applications: Ideal for tasks requiring understanding and generation across a vast array of languages.
  • Research and Transparency: Suitable for researchers and developers who require fully open models with detailed training insights and data compliance.
  • Long-form Content Processing: Well suited to use cases that must process or generate very long texts, thanks to its extended context window.
  • Ethical AI Development: A strong choice for projects prioritizing data privacy, consent, and transparent model development.
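For the agentic tool-use capability mentioned above, tools are commonly declared in the JSON function-schema style that `transformers` chat templates accept via `apply_chat_template(tools=...)`. The `get_weather` tool below is invented purely for illustration; it is not part of the Apertus model card:

```python
# Hypothetical tool declaration in the common JSON-schema convention.
# The function name, description, and parameters are illustrative only.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# With a tool-aware chat template, this schema would be passed as, e.g.:
#   tokenizer.apply_chat_template(
#       messages, tools=[get_weather_tool], add_generation_prompt=True
#   )
# and the model would emit a structured tool call for the runtime to execute.
```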