AI-Sweden-Models/gpt-sw3-6.7b-v2

7.1B parameters · FP8 · 2048-token context · Published Apr 28, 2023 · License: other · Available on Hugging Face

Model Overview

AI Sweden's GPT-SW3 6.7B v2 is a 7.1-billion-parameter decoder-only transformer language model, part of the GPT-SW3 collection. Developed in collaboration with RISE and WASP WARA for Media and Language, this version is an update to the original 6.7B model: it uses the same tokenizer, but was trained on a different data distribution containing significantly more English text and programming code, and for a longer duration.
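
The model can be loaded through the standard Hugging Face transformers causal-language-model classes. The sketch below is minimal and illustrative: the half-precision dtype and the automatic device placement (which needs the accelerate package installed) are assumptions chosen so a ~7B-parameter model fits on a single GPU, not requirements stated on this card.

```python
# Minimal loading sketch using Hugging Face transformers.
# The dtype and device placement below are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AI-Sweden-Models/gpt-sw3-6.7b-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so a ~7B model fits on one GPU
    device_map="auto",          # requires `accelerate`; places weights automatically
)
model.eval()
```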

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Supports text generation in four programming languages.
  • Task Adaptability: Can perform various text-based tasks by framing them as causal language modeling problems, even if not explicitly trained for them.
  • Autoregressive Model: Functions as an autoregressive language model, predicting the next token in a sequence (see the generation sketch after this list).
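
Because the model is a plain autoregressive language model, tasks are expressed as text prompts that it continues token by token. The sketch below reuses the `tokenizer` and `model` objects from the loading example above; the Swedish prompt and the sampling parameters are illustrative assumptions, not recommended settings.

```python
# Generation sketch (reuses `tokenizer` and `model` from the loading example).
# Prompt text and sampling settings are illustrative assumptions.
prompt = "Träd är fina för att"  # Swedish: "Trees are nice because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=50,   # short continuation for demonstration
    do_sample=True,      # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```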

Intended Use Cases

  • Research and Evaluation: Primarily intended for research and evaluation of Large Language Models, particularly for Nordic languages.
  • Text Generation: Suitable for applications requiring text generation in its supported languages and programming contexts.
  • NLP Ecosystem Development: Intended to support organizations and individuals in the Nordic NLP community with model validation and testing.

Limitations

Like other large language models, GPT-SW3 6.7B v2 has limitations, including potential biases, safety concerns, quality issues such as limited generation diversity, and hallucination. It may produce incorrect information, repetitive output, or content that is not appropriate for all settings. Its training data, which includes public Common Crawl, Reddit, Familjeliv, and Flashback, may contain offensive or sensitive content.