swiss-ai/Apertus-70B-2509
Apertus-70B-2509 by swiss-ai is a 70 billion parameter decoder-only transformer model, pretrained on 15T tokens with a staged curriculum of web, code, and math data. It is designed for massive multilingual support, natively handling over 1800 languages, and features a 32K token context length. This model emphasizes full transparency with open weights, data, and training details, while ensuring compliance by respecting data owners' opt-out consent.
Loading preview...
Apertus-70B-2509: A Transparent, Multilingual LLM
Apertus-70B-2509, developed by swiss-ai, is a 70 billion parameter decoder-only transformer model pretrained on an extensive 15 trillion tokens. It stands out for its commitment to full transparency, offering open weights, data, and complete training details, including data reconstruction scripts and intermediate checkpoints. The model was trained from scratch using a staged curriculum of web, code, and math data, incorporating a novel xIELU activation function and the AdEMAMix optimizer, followed by supervised fine-tuning and alignment via QRPO.
Key Capabilities
- Massively Multilingual: Natively supports over 1800 languages, making it suitable for global applications.
- Long Context: Features a default context length of 32,768 tokens, extendable up to 65,536 tokens.
- Ethically Compliant: Trained exclusively on fully compliant and open data, respecting opt-out consent and avoiding memorization.
- Tool Use: Supports agentic usage with tool integration capabilities.
- Performance: Achieves competitive performance on general language understanding tasks, comparable to models with closed training methodologies.
Good for
- Applications requiring extensive multilingual support across a vast number of languages.
- Use cases demanding transparency in model development, data, and training processes.
- Tasks benefiting from long context understanding and generation.
- Developers seeking a powerful, openly documented model for research and deployment, with support for tool use and various inference frameworks like Transformers, vLLM, and MLX.