swiss-ai/Apertus-8B-2509
Apertus-8B-2509 by swiss-ai is an 8-billion-parameter decoder-only transformer model, pretrained on 15T tokens with a staged curriculum of web, code, and math data. It is designed to be a fully open, massively multilingual model, supporting over 1,800 languages with a 32,768-token context length. The model focuses on compliant training, respecting opt-out consent and avoiding memorization, while achieving performance comparable to closed-source models.
Apertus-8B-2509: A Fully Open, Multilingual LLM
Apertus-8B-2509 is an 8-billion-parameter language model developed by swiss-ai, part of the larger Apertus family, which also includes a 70B variant. The model is fully open: the weights, the training data, and the complete training details, including recipes, are all released. It is a decoder-only transformer, pretrained on 15 trillion tokens using a staged curriculum of web, code, and math data.
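Because the weights are open, the model can in principle be loaded like any other decoder-only checkpoint. The following is a minimal sketch, assuming the weights are published on the Hugging Face Hub under the repository id shown in the page title and that the installed `transformers` version supports the architecture. The helper names `new_token_ids` and `complete` are illustrative, not part of any library.

```python
MODEL_ID = "swiss-ai/Apertus-8B-2509"  # repository id taken from the page title


def new_token_ids(output_ids, prompt_length):
    """Drop the echoed prompt tokens and keep only the generated continuation."""
    return output_ids[prompt_length:]


def complete(prompt, max_new_tokens=64):
    """Generate a plain-text continuation of `prompt`.

    Downloads the weights on first use. Precision and device placement are
    left at library defaults here; a real deployment would likely set them.
    """
    # Imported lazily so the helper above is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # For decoder-only models, generate() returns the prompt followed by the continuation.
    prompt_length = inputs["input_ids"].shape[1]
    continuation = new_token_ids(output_ids[0], prompt_length)
    return tokenizer.decode(continuation, skip_special_tokens=True)
```

Calling `complete("The capital of Switzerland is")` would download the checkpoint and return the generated continuation as a string. An 8B model at default precision needs substantial memory, so this is best tried on a machine with a suitable GPU.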
Key Capabilities
- Massively Multilingual: Natively supports 1811 languages, making it suitable for global applications.
- Long Context: Supports a context length of 32,768 tokens and can process inputs of up to 65,536 tokens.
- Compliant Training: Trained with strict adherence to data privacy, respecting opt-out consent of data owners and actively avoiding memorization of training data.
- Agentic Usage: Supports tool use, enabling more complex and interactive applications.
- Performance: Achieves competitive performance on general language understanding tasks. Apertus-8B scores an average of 65.8% across the ARC, HellaSwag, WinoGrande, XNLI, XCOPA, and PIQA benchmarks.
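The context limits listed above translate into a simple token budget: the prompt plus the tokens to be generated must fit inside the window. The sketch below illustrates that check and one common way to split an over-long document into overlapping chunks. The constants come from the figures on this page; the function names and the chunking strategy are assumptions for illustration, not something the model prescribes.

```python
CONTEXT_WINDOW = 32_768   # supported context length stated on this page
EXTENDED_LIMIT = 65_536   # upper bound this page says can be processed


def fits_context(prompt_tokens, max_new_tokens, limit=CONTEXT_WINDOW):
    """Return True if prompt and generation together fit in the window."""
    return prompt_tokens + max_new_tokens <= limit


def chunk_token_ids(token_ids, chunk_size, overlap=0):
    """Split a token-id list into chunks of at most `chunk_size` tokens.

    Consecutive chunks share `overlap` tokens so that text cut at a chunk
    boundary still appears with some surrounding context.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + chunk_size])
        if start + chunk_size >= len(token_ids):
            break  # the last chunk already reaches the end of the input
        start += step
    return chunks
```

For example, a 30,000-token prompt with 2,000 tokens of generation fits the 32,768-token window, while a 40,000-token prompt would need either the extended limit or chunking.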
Good For
- Applications requiring broad multilingual support and understanding.
- Use cases demanding long context processing for complex documents or conversations.
- Developers prioritizing transparent and ethically sourced AI models.
- Integration into systems that benefit from tool-use capabilities for agentic workflows.
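In an agentic workflow, the application typically advertises a set of tools to the model, parses the tool call the model emits, runs the matching function, and feeds the result back. The sketch below shows only the application-side dispatch step. The JSON call format (`name` plus `arguments`), the `get_weather` stub, and the schema layout are all assumptions for illustration; this page does not specify the format Apertus uses for tool calls.

```python
import json


def get_weather(city):
    """Hypothetical tool: a stub that returns a canned result."""
    return {"city": city, "forecast": "unknown"}


# Registry mapping tool names to the Python functions that implement them.
TOOLS = {"get_weather": get_weather}

# JSON-Schema-style description that an application might show to the model.
TOOL_SCHEMAS = [
    {
        "name": "get_weather",
        "description": "Look up the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]


def run_tool_call(raw_call):
    """Parse a JSON tool call, run the named tool, and return a JSON result."""
    call = json.loads(raw_call)
    name = call.get("name")
    if name not in TOOLS:
        # Report unknown tools back to the model instead of raising.
        return json.dumps({"error": f"unknown tool: {name}"})
    result = TOOLS[name](**call.get("arguments", {}))
    return json.dumps(result)
```

Given the assumed call `{"name": "get_weather", "arguments": {"city": "Bern"}}`, `run_tool_call` returns the stub's result serialized as JSON, which the application would then append to the conversation for the model's next turn.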