AI-Sweden-Models/gpt-sw3-20b-instruct

20.9B parameters · FP8 · 2048 context length · License: other
Overview

GPT-Sw3 is a family of large decoder-only transformer language models developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. This particular model, gpt-sw3-20b-instruct, is a 20.9-billion-parameter variant that has been fine-tuned on instruction data in both chat and raw-text formats.

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Its pretraining data includes four programming languages, so the model can also generate code.
  • Instruction Following: Can be instructed to perform various text tasks, even those not explicitly trained for, by framing them as text generation tasks.
  • Extensive Training Data: Pretrained on a diverse dataset of 320 billion tokens, including a significant portion of Nordic languages and programming code.

Intended Use Cases

  • Research and Evaluation: Primarily intended for research and evaluation of Large Language Models, especially concerning their capabilities in Nordic languages.
  • Text Generation: Suitable for generating human-like text across its supported languages.
  • Instruction-based Tasks: Can be used for tasks requiring the model to follow specific instructions, such as question answering, summarization, or creative writing.
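As a sketch of how these instruction-based tasks might look in practice, the snippet below loads the model with the Hugging Face transformers library and wraps a user message in a chat-style prompt. The `User:`/`Bot:` template in `build_prompt` and the sampling parameters are illustrative assumptions, not values prescribed by this card; check the model card's own examples before relying on them.

```python
# Sketch: instruction-style generation with gpt-sw3-20b-instruct via the
# Hugging Face transformers library. The chat template in build_prompt is
# an assumption about the instruct models' format; verify it against the
# model card before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "AI-Sweden-Models/gpt-sw3-20b-instruct"


def build_prompt(user_message: str) -> str:
    """Wrap a user message in an assumed chat-style instruct format."""
    return f"<|endoftext|><s>\nUser:\n{user_message}\n<s>\nBot:\n"


def generate(user_message: str, max_new_tokens: int = 64) -> str:
    """Generate a reply; downloads the 20B-parameter weights on first call."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.bfloat16
    ).to(device).eval()

    input_ids = tokenizer(
        build_prompt(user_message), return_tensors="pt"
    ).input_ids.to(device)
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
        )
    # Decode only the newly generated tokens (the reply after the prompt).
    return tokenizer.decode(
        output[0, input_ids.shape[1]:], skip_special_tokens=True
    )
```

For example, `generate("Vad är huvudstaden i Sverige?")` ("What is the capital of Sweden?") would prompt the model for a Swedish-language answer; because sampling is enabled, outputs vary between calls.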

Limitations

Like other large language models, GPT-Sw3 has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, or generate inappropriate content. Users should be aware of these potential issues and implement appropriate safeguards.