swiss-ai/Apertus-70B-Instruct-2509
Apertus-70B-Instruct-2509 is a 70 billion parameter decoder-only transformer model developed by swiss-ai, pretrained on 15 trillion tokens with a staged curriculum of web, code, and math data. It is designed for fully-open, massively multilingual applications, natively supporting over 1800 languages with a 65,536 token context length. The model emphasizes compliant training data, respecting opt-out consent, and achieves performance comparable to closed-source models.
Loading preview...
Apertus-70B-Instruct-2509: A Massively Multilingual and Open LLM
Apertus-70B-Instruct-2509 is a 70 billion parameter language model from swiss-ai, engineered to advance fully-open, multilingual, and transparent AI. Pretrained on an extensive 15 trillion tokens, it utilizes a staged curriculum including web, code, and math data. A key differentiator is its commitment to open weights, open data, and full training details, including data and recipes, alongside its use of only fully compliant training data that respects opt-out consent.
Key Capabilities
- Massively Multilingual: Natively supports an impressive 1811 languages, making it suitable for global applications.
- Long Context: Features a default context length of up to 65,536 tokens, enabling processing of extensive inputs.
- Open and Compliant: Trained with a focus on data privacy and transparency, avoiding memorization and respecting data owner consent.
- Advanced Architecture: Employs a decoder-only transformer architecture, a new xIELU activation function, and the AdEMAMix optimizer, with alignment via QRPO.
- Tool Use: Supports agentic usage with tool integration capabilities.
When to Use This Model
- Multilingual Applications: Ideal for tasks requiring understanding and generation across a vast array of languages.
- Research and Transparency: Suitable for researchers and developers who require fully open models with detailed training insights and data compliance.
- Long-form Content Processing: Benefits use cases needing to process or generate very long texts due to its extended context window.
- Ethical AI Development: A strong choice for projects prioritizing data privacy, consent, and transparent model development.