GPT-SW3 20B: A Multilingual and Multicode Base Model

GPT-SW3 20B is a 20-billion-parameter, decoder-only transformer language model developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It belongs to the GPT-SW3 model family, which ranges from 126M to 40B parameters and includes instruct-tuned and quantized variants.

Key Capabilities

Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
Multicode Generation: Can generate code in four programming languages.
Instruction Following: As a base model, it can be prompted to perform text-based tasks it was not explicitly trained for, provided the task is framed as a text generation problem.
Extensive Training Data: Pretrained on a dataset of 320 billion tokens covering both natural-language text and code.
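
Framing a task as text generation usually means writing a prompt whose most natural continuation is the answer. The sketch below builds a few-shot prompt for Swedish-to-English translation. The function name, the "Input:"/"Output:" labels, and the example sentences are illustrative assumptions, not a format prescribed for this model.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Frame a task as text continuation (hypothetical helper).

    The prompt ends with an open "Output:" line, so the most likely
    continuation from a base model is the answer for `query`.
    """
    parts = [instruction.strip(), ""]
    for source, target in examples:
        parts.append(f"Input: {source}")
        parts.append(f"Output: {target}")
        parts.append("")  # blank line between solved examples
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    "Translate from Swedish to English.",
    [("Hej!", "Hello!"), ("Tack så mycket.", "Thank you very much.")],
    "God morgon.",
)
print(prompt)
```

The resulting string is what would be passed to the model. Generation is typically cut off at the next blank line or "Input:" marker, since a base model tends to keep producing further examples.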

Intended Use Cases

General Text Generation: Suitable for creating diverse textual content across its supported languages.
Code-Related Tasks: Can assist with generating or understanding code snippets in supported programming languages.
Research and Development: Provides a robust base model for further fine-tuning or experimentation in multilingual and multicode NLP applications.
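
For experimentation, a minimal loading-and-sampling sketch with the Hugging Face `transformers` library might look like the following. The Hub identifier `AI-Sweden-Models/gpt-sw3-20b`, the bfloat16 precision, and the sampling defaults are assumptions that do not come from this page. Access to the weights may also require accepting the model's terms on the Hub, and a 20B model needs roughly 40 GB of memory in bfloat16.

```python
from typing import Any, Dict

# Assumed Hugging Face Hub identifier; not stated in this document.
MODEL_ID = "AI-Sweden-Models/gpt-sw3-20b"


def sampling_kwargs(
    max_new_tokens: int = 64,
    temperature: float = 0.7,
    top_p: float = 0.9,
) -> Dict[str, Any]:
    """Collect sampling settings for `generate`; defaults are illustrative."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }


def generate(prompt: str, model_id: str = MODEL_ID, **overrides: Any) -> str:
    """Load the model and return only the continuation of `prompt`."""
    # Imported lazily so sampling_kwargs() works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            **sampling_kwargs(**overrides),
        )
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Träd är fina för att", max_new_tokens=32)` would download the weights on first use. For quick experiments, a smaller member of the same family can be substituted through `model_id`.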

Limitations

Like other large language models, GPT-SW3 20B has limitations, including potential biases, safety concerns, limited generation diversity, and hallucination. The model may overrepresent some viewpoints, reproduce stereotypes, or generate inappropriate content. On the Open LLM Leaderboard it has an average score of 35.83, including 28.47 on MMLU (5-shot) and 68.75 on HellaSwag (10-shot).
