Overview

GPT-SW3 6.7B v2 Instruct is a 7.1 billion parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It is the instruction-tuned version of the 6.7B v2 base model in the GPT-SW3 collection. The model was pretrained with the NeMo Megatron GPT implementation on a 320 billion token dataset covering Swedish, Norwegian, Danish, Icelandic, English, and programming code.

Key Capabilities

Multilingual Text Generation: Generates coherent text in five languages (Swedish, Norwegian, Danish, Icelandic, English).
Code Generation: Generates code in four programming languages.
Instruction Following: Fine-tuned on instruction data, so it can carry out a range of text tasks when prompted, including tasks it was not explicitly trained on.
Research and Evaluation: Primarily intended for research and evaluation of LLM capabilities, particularly for Nordic languages.
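
As a rough illustration of instruction prompting, the sketch below builds a chat-style prompt and passes it to the model through Hugging Face Transformers. Several things here are assumptions rather than facts from this summary: the repository id in MODEL_ID, the turn layout with `<|endoftext|>` and `<s>` markers (recalled from the upstream model card and worth checking there), the helper names build_prompt and generate, and the sampling values, which are purely illustrative. The tokenizer may also require the sentencepiece package, and access to the weights may require accepting the license on the hub.

```python
from typing import List, Tuple

# Assumed Hugging Face repository id; verify it against the upstream model card.
MODEL_ID = "AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct"


def build_prompt(turns: List[Tuple[str, str]]) -> str:
    """Format (speaker, text) turns into the assumed chat layout.

    The layout (an <|endoftext|> prefix, then <s>-separated turns) is an
    assumption; the prompt ends with an open "Bot:" turn for the model to fill.
    """
    parts = ["<|endoftext|>"]
    for speaker, text in turns:
        parts.append(f"<s>\n{speaker}:\n{text}\n")
    parts.append("<s>\nBot:\n")
    return "".join(parts)


def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Hypothetical helper: load the model and sample a continuation."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to(device).eval()

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.6,  # illustrative value, not a recommendation
            top_p=1.0,
        )
    # Decode only the tokens produced after the prompt.
    return tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)


if __name__ == "__main__":
    prompt = build_prompt([("User", "Översätt till engelska: Jag tycker träd är fina.")])
    print(generate(prompt))
```

Keeping the prompt construction separate from model loading makes the formatting easy to inspect and adjust if the upstream card specifies a different layout.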

Performance

Evaluated on the Open LLM Leaderboard, the model achieved an average score of 39.57. Notable benchmark results include:

ARC (25-shot): 40.78
HellaSwag (10-shot): 67.77
MMLU (5-shot): 31.57
TruthfulQA (0-shot): 40.32
Winogrande (5-shot): 63.54
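
A quick arithmetic check helps when reading these figures: the mean of the five listed results is higher than the reported average of 39.57, so the leaderboard average presumably includes further benchmarks that this summary does not list. The sketch below uses only the numbers given above; the variable names are illustrative.

```python
# Scores exactly as listed in this section.
listed_scores = {
    "ARC (25-shot)": 40.78,
    "HellaSwag (10-shot)": 67.77,
    "MMLU (5-shot)": 31.57,
    "TruthfulQA (0-shot)": 40.32,
    "Winogrande (5-shot)": 63.54,
}
reported_average = 39.57

# Mean over the five listed benchmarks only.
listed_mean = sum(listed_scores.values()) / len(listed_scores)

print(f"Mean of listed results: {listed_mean:.2f}")  # 48.80
print(f"Reported leaderboard average: {reported_average:.2f}")
print(f"Gap: {listed_mean - reported_average:.2f}")
```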

Limitations

Like other large language models, GPT-SW3 has limitations regarding bias, safety, and hallucination. It may overrepresent certain viewpoints, reproduce stereotypes, and generate inappropriate or incorrect content. Users should keep these limitations in mind and review the model's modified RAIL license before use.
