timpal0l/gpt-sw3-20b-instruct

Text Generation · Concurrency Cost: 1 · Model Size: 20.9B · Quant: FP8 · Ctx Length: 2k · Published: Apr 20, 2026 · License: other · Architecture: Transformer

timpal0l/gpt-sw3-20b-instruct is a 20.9-billion-parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA. It is an instruction-tuned variant of the GPT-SW3 base model, which was trained on 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model generates coherent text in the Nordic languages and English, and can perform a range of text tasks through instruction following.


Overview

This model, gpt-sw3-20b-instruct, is a 20.9-billion-parameter instruction-tuned decoder-only transformer developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is part of the GPT-SW3 collection, which focuses on Nordic languages. The base model was pretrained on 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, English, and programming-code data using the NeMo Megatron GPT implementation. The instruct variant was then fine-tuned on instruction data, in both chat and raw-text formats, to improve its ability to follow commands and perform diverse text-generation tasks.
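Because the instruct variant is fine-tuned on chat-format data, prompts are usually wrapped in a turn-based template before generation. A minimal sketch of single-turn prompt construction is below; the exact `User:`/`Bot:` layout and `<s>` delimiters shown here are an assumption based on the format commonly used for GPT-SW3 instruct checkpoints, so verify against the model card before relying on it.

```python
def build_prompt(user_message: str) -> str:
    """Build a single-turn chat prompt for a GPT-SW3 instruct model.

    Assumption: turns are written as `User:` / `Bot:` blocks separated by
    `<s>` tokens, with the document opened by `<|endoftext|>`. Check the
    official model card for the authoritative template.
    """
    return (
        "<|endoftext|><s>\n"
        f"User:\n{user_message}\n"
        "<s>\nBot:\n"
    )

# Example: a Swedish prompt ("Trees are nice because"); the model is
# expected to continue the text after the final `Bot:` line.
prompt = build_prompt("Träd är fina för att")
```

The resulting string can then be passed to any standard text-generation API (e.g. a Hugging Face `transformers` pipeline), with the model's completion read from the text following the final `Bot:` marker.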

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Trained on programming code; supports generation in four programming languages.
  • Instruction Following: Can perform various text tasks by interpreting instructions, even for tasks it wasn't explicitly trained for.
  • Research and Evaluation: Primarily intended for research and evaluation of large language models, particularly for Nordic languages.

Limitations

Like other large language models, GPT-SW3 has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, or generate inappropriate content. Users should be aware of these potential issues and refer to the modified RAIL license for detailed usage guidelines.

Benchmarks

Evaluated on the Open LLM Leaderboard, the model achieved an average score of 38.19. Reported per-task scores include:

  • ARC (25-shot): 43.17
  • HellaSwag (10-shot): 71.09
  • MMLU (5-shot): 31.32
  • TruthfulQA (0-shot): 41.02

Note that the leaderboard average also reflects tasks not listed here, which is why it is lower than the mean of the four scores above.