GPT-Sw3 6.7B is a 7.1 billion parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It was pretrained on 320 billion tokens spanning Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model is designed to generate coherent text in the Nordic languages and English and to handle a range of text generation tasks.
Overview
GPT-Sw3 6.7B belongs to the GPT-Sw3 collection, a family of large decoder-only pretrained transformer language models focused on the Nordic languages. The model was pretrained with a causal language modeling objective using the NeMo Megatron GPT implementation.
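As a minimal sketch, the model can be loaded and prompted with standard causal language modeling tooling such as the Hugging Face transformers library. The repository id AI-Sweden-Models/gpt-sw3-6.7b, the Swedish prompt, and the sampling settings below are assumptions made for illustration, not details stated in this card.

```python
# Minimal generation sketch for a decoder-only causal LM.
# The Hub repository id below is an assumption for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b"  # assumed repository id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
model.eval()

prompt = "Träd är fina för att"  # Swedish: "Trees are nice because"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```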
Key Capabilities
- Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
- Code Generation: Supports text generation in four programming languages.
- Task Adaptability: Can perform various text tasks by framing them as text generation problems, even if not explicitly trained for them (see the sketch after this list).
- Extensive Training Data: Trained on a diverse dataset of 320 billion tokens, including a significant portion of Nordic language content and programming code.
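As a hedged illustration of task adaptability, the sketch below frames a Swedish summarization task as plain text generation. The prompt template is an assumption made for this example, not an officially recommended format, and it reuses the assumed repository id from the previous sketch.

```python
# Task adaptability sketch: a summarization task expressed as plain text
# generation. The prompt template is illustrative only.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="AI-Sweden-Models/gpt-sw3-6.7b",  # assumed repository id
)

article = (
    "Stockholm är Sveriges huvudstad och landets största stad. "
    "Staden är byggd på fjorton öar där Mälaren möter Östersjön."
)
prompt = f"Text: {article}\nSammanfattning:"  # "Sammanfattning" = "Summary"

result = generator(prompt, max_new_tokens=40, do_sample=False)
print(result[0]["generated_text"])
```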
Intended Use Cases
This model is primarily intended for research and evaluation of Large Language Models, particularly for Nordic languages. It serves as a foundational model for the Nordic NLP ecosystem, allowing organizations and individuals to validate and test its capabilities and provide feedback.
Limitations
Like other large language models, GPT-Sw3 6.7B has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users should be aware of these potential issues and implement appropriate safeguards.
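One minimal example of such a safeguard is a decode-time blocklist; real deployments typically need far more, such as content classifiers and human review. The sketch below shows how the transformers generate API can be instructed to avoid specific token sequences; the blocked terms are placeholders and the repository id is assumed.

```python
# Toy safeguard sketch: block a small list of unwanted words at decoding time
# using the bad_words_ids argument of generate(). Production systems need far
# broader filtering; the blocked terms here are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b"  # assumed repository id
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

blocked_words = ["exempelord"]  # hypothetical placeholder term(s)
bad_words_ids = [
    tokenizer(word, add_special_tokens=False).input_ids for word in blocked_words
]

input_ids = tokenizer(
    "Skriv en kort text om vädret:", return_tensors="pt"
).input_ids.to(device)
output_ids = model.generate(input_ids, max_new_tokens=40, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```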
Performance Benchmarks
Evaluations on the Open LLM Leaderboard report the following scores:
- Avg.: 33.18
- ARC (25-shot): 36.35
- HellaSwag (10-shot): 60.75
- MMLU (5-shot): 26.00
- TruthfulQA (0-shot): 39.04
- Winogrande (5-shot): 60.69
- GSM8K (5-shot): 0.53
- DROP (3-shot): 8.92
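These scores come from the Open LLM Leaderboard, which runs the EleutherAI lm-evaluation-harness. For the multiple-choice tasks (ARC, HellaSwag, MMLU), the underlying idea is to score each answer choice by its log-likelihood under the model given the question and select the highest-scoring choice. The sketch below is a simplified illustration of that scoring, not the harness's actual implementation; the example question is made up and the repository id is assumed.

```python
# Simplified illustration of log-likelihood scoring for multiple-choice
# benchmarks: pick the answer choice with the highest summed log-probability
# given the question. Not the actual lm-evaluation-harness implementation.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "AI-Sweden-Models/gpt-sw3-6.7b"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

question = "Question: Which gas do plants absorb from the air?\nAnswer:"
choices = [" carbon dioxide", " oxygen", " nitrogen", " helium"]

def choice_logprob(context: str, choice: str) -> float:
    """Summed log-probability of the choice tokens, conditioned on the context."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i + 1 of the input sequence.
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, ctx_len:]
    positions = range(ctx_len - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, tok].item() for pos, tok in zip(positions, targets))

scores = {c: choice_logprob(question, c) for c in choices}
print(max(scores, key=scores.get))  # prints the highest-scoring choice
```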