timpal0l/gpt-sw3-20b-instruct

Text Generation · Concurrency Cost: 1 · Model Size: 20.9B · Quant: FP8 · Ctx Length: 2k · Published: Apr 20, 2026 · License: other · Architecture: Transformer

timpal0l/gpt-sw3-20b-instruct is a 20.9-billion-parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA. It is an instruction-tuned variant of the GPT-SW3 base model, which was trained on 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model generates coherent text in the Nordic languages and English, and can perform a range of text tasks through instruction following.


Overview

This model, gpt-sw3-20b-instruct, is a 20.9-billion-parameter instruction-tuned decoder-only transformer developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is part of the GPT-SW3 collection, which focuses on Nordic languages. The base model was pretrained on 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, English, and programming-code data using the NeMo Megatron GPT implementation. The instruct variant was then fine-tuned on instruction data, in both chat and raw-text formats, to improve its ability to follow commands and perform diverse text-generation tasks.
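Because the instruct variant is fine-tuned on chat-format data, prompts are usually wrapped in a turn-based template before generation. A minimal sketch of single-turn prompt construction is below; the exact `User:`/`Bot:` layout and `<s>` delimiters shown here are an assumption based on the format commonly used for GPT-SW3 instruct checkpoints, so verify against the model card before relying on it.

```python
def build_prompt(user_message: str) -> str:
    """Build a single-turn chat prompt for a GPT-SW3 instruct model.

    Assumption: turns are written as `User:` / `Bot:` blocks separated by
    `<s>` tokens, with the document opened by `<|endoftext|>`. Check the
    official model card for the authoritative template.
    """
    return (
        "<|endoftext|><s>\n"
        f"User:\n{user_message}\n"
        "<s>\nBot:\n"
    )

# Example: a Swedish prompt ("Trees are nice because"); the model is
# expected to continue the text after the final `Bot:` line.
prompt = build_prompt("Träd är fina för att")
```

The resulting string can then be passed to any standard text-generation API (e.g. a Hugging Face `transformers` pipeline), with the model's completion read from the text following the final `Bot:` marker.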

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Trained on programming code; supports generation in four programming languages.
  • Instruction Following: Can perform various text tasks by interpreting instructions, even for tasks it wasn't explicitly trained for.
  • Research and Evaluation: Primarily intended for research and evaluation of large language models, particularly for Nordic languages.

Limitations

Like other large language models, GPT-SW3 has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, or generate inappropriate content. Users should be aware of these potential issues and refer to the modified RAIL license for detailed usage guidelines.

Benchmarks

Evaluated on the Open LLM Leaderboard, the model achieved an average score of 38.19. Reported per-task scores include:

  • ARC (25-shot): 43.17
  • HellaSwag (10-shot): 71.09
  • MMLU (5-shot): 31.32
  • TruthfulQA (0-shot): 41.02

Note that the leaderboard average also reflects tasks not listed here, which is why it is lower than the mean of the four scores above.