AI-Sweden-Models/gpt-sw3-20b

20.9B parameters · FP8 · 2048-token context · License: other
Overview

AI Sweden's GPT-Sw3 20B is a 20.9-billion-parameter, decoder-only transformer language model and part of the GPT-Sw3 series. Developed in collaboration with RISE and the WASP WARA for Media and Language, it was pretrained with a causal language modeling (CLM) objective using the NeMo Megatron GPT implementation.
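
As a quick illustration (not part of the official card), here is a minimal sketch of loading the checkpoint from the Hugging Face Hub with the `transformers` library and sampling a Swedish completion. The repo id comes from the title above; the dtype, sampling parameters, and prompt are illustrative assumptions:

```python
# Minimal sketch: load GPT-Sw3 20B and sample a Swedish completion.
# Assumes the `transformers` and `torch` packages and enough GPU memory
# for a 20.9B-parameter model (device_map="auto" lets accelerate shard
# the weights across available devices).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-Sweden-Models/gpt-sw3-20b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",
)

prompt = "Träd är fina för att"  # Swedish: "Trees are nice because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The model uses a 2048-token context window, so keep
# prompt + generated tokens within that limit.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```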

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Supports text generation in four programming languages.
  • Instruction Following: Can be instructed to perform text tasks it has not been explicitly trained for by casting them as text generation tasks (see the sketch after this list).
  • Extensive Training Data: Trained on a diverse dataset of 320 billion tokens, including books, articles, code, conversational data, and web crawls, with a focus on Nordic languages.
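
To illustrate the task-casting point above, here is a hypothetical few-shot prompt that frames English-to-Swedish translation as plain text generation, reusing the `tokenizer` and `model` loaded in the earlier sketch. The prompt template is an assumption for illustration, not an official GPT-Sw3 format:

```python
# Hypothetical task casting: no fine-tuning, just a prompt that
# demonstrates the desired input -> output pattern. The template
# below is illustrative, not an official GPT-Sw3 prompt format.
prompt = (
    "Engelska: The weather is nice today.\n"
    "Svenska: Vädret är fint idag.\n"
    "Engelska: I would like a cup of coffee.\n"
    "Svenska:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```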

Intended Use

This model is released primarily for research into and evaluation of large language models, particularly for the Nordic languages. The release aims to foster knowledge building, model validation, and community feedback on LLM performance in this linguistic context.

Limitations

Like other large language models, GPT-Sw3 20B has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users should be aware of these risks, as detailed in its modified RAIL license, and implement appropriate safeguards and disclaimers.