GPT-Sw3 6.7B is a 7.1 billion parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It was pretrained on 320 billion tokens spanning Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model is designed to generate coherent text in the Nordic languages and English and to handle a range of text generation tasks.
Overview
GPT-Sw3 6.7B belongs to the GPT-Sw3 collection, a family of large decoder-only pretrained transformer language models focused on the Nordic languages. The model was pretrained with a causal language modeling objective using the NeMo Megatron GPT implementation.
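As a minimal sketch, the model can be loaded and prompted with standard causal language modeling tooling such as the Hugging Face transformers library. The repository id AI-Sweden-Models/gpt-sw3-6.7b, the Swedish prompt, and the sampling settings below are assumptions made for illustration, not details stated in this card.

```python
# Minimal generation sketch for a decoder-only causal LM.
# The Hub repository id below is an assumption for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b"  # assumed repository id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
model.eval()

prompt = "Träd är fina för att"  # Swedish: "Trees are nice because"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```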
Key Capabilities
- Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
- Code Generation: Supports text generation in four programming languages.
- Task Adaptability: Can perform various text tasks by framing them as text generation problems, even if not explicitly trained for them (see the sketch after this list).
- Extensive Training Data: Trained on a diverse dataset of 320 billion tokens, including a significant portion of Nordic language content and programming code.
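As a hedged illustration of task adaptability, the sketch below frames a Swedish summarization task as plain text generation. The prompt template is an assumption made for this example, not an officially recommended format, and it reuses the assumed repository id from the previous sketch.

```python
# Task adaptability sketch: a summarization task expressed as plain text
# generation. The prompt template is illustrative only.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="AI-Sweden-Models/gpt-sw3-6.7b",  # assumed repository id
)

article = (
    "Stockholm är Sveriges huvudstad och landets största stad. "
    "Staden är byggd på fjorton öar där Mälaren möter Östersjön."
)
prompt = f"Text: {article}\nSammanfattning:"  # "Sammanfattning" = "Summary"

result = generator(prompt, max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])
```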
Intended Use Cases
This model is primarily intended for research and evaluation of Large Language Models, particularly for Nordic languages. It serves as a foundational model for the Nordic NLP ecosystem, allowing organizations and individuals to validate and test its capabilities and provide feedback.
Limitations
Like other large language models, GPT-Sw3 6.7B has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users should be aware of these potential issues and implement appropriate safeguards.
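One minimal example of such a safeguard is a decode-time blocklist; real deployments typically need far more, such as content classifiers and human review. The sketch below shows how the transformers generate API can be instructed to avoid specific token sequences; the blocked terms are placeholders and the repository id is assumed.

```python
# Toy safeguard sketch: block a small list of unwanted words at decoding time
# using the bad_words_ids argument of generate(). Production systems need far
# broader filtering; the blocked terms here are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b"  # assumed repository id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

blocked_words = ["exempelord"]  # hypothetical placeholder term(s)
bad_words_ids = [
    tokenizer(word, add_special_tokens=False).input_ids for word in blocked_words
]

input_ids = tokenizer(
    "Skriv en kort text om vädret:", return_tensors="pt"
).input_ids.to(device)
output_ids = model.generate(input_ids, max_new_tokens=40, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```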
Performance Benchmarks
Evaluations on the Open LLM Leaderboard report the following scores:
- Avg.: 33.18
- ARC (25-shot): 36.35
- HellaSwag (10-shot): 60.75
- MMLU (5-shot): 26.00
- TruthfulQA (0-shot): 39.04
- Winogrande (5-shot): 60.69
- GSM8K (5-shot): 0.53
- DROP (3-shot): 8.92
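These scores come from the Open LLM Leaderboard, which runs the EleutherAI lm-evaluation-harness. For the multiple-choice tasks (ARC, HellaSwag, MMLU), the underlying idea is to score each answer choice by its log-likelihood under the model given the question and select the highest-scoring choice. The sketch below is a simplified illustration of that scoring, not the harness's actual implementation; the example question is made up and the repository id is assumed.

```python
# Simplified illustration of log-likelihood scoring for multiple-choice
# benchmarks: pick the answer choice with the highest summed log-probability
# given the question. Not the actual lm-evaluation-harness implementation.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

question = "Question: Which gas do plants absorb from the air?\nAnswer:"
choices = [" carbon dioxide", " oxygen", " nitrogen", " helium"]

def choice_logprob(context: str, choice: str) -> float:
    """Summed log-probability of the choice tokens, conditioned on the context."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1 of the input sequence.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, ctx_len:]
    positions = range(ctx_len - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, tok].item() for pos, tok in zip(positions, targets))

scores = {c: choice_logprob(question, c) for c in choices}
print(max(scores, key=scores.get))  # prints the highest-scoring choice
```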