GPT-SW3 126M Instruct: Multilingual Instruction-Tuned LLM

This model is part of the GPT-SW3 collection, a series of decoder-only transformer language models developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. GPT-SW3 126M Instruct is the instruction-tuned variant of the 126M model, fine-tuned on instruction data in both chat and raw text formats.
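To make the difference between the two input formats concrete, here is a minimal sketch of how a prompt might be assembled for each. The turn markers (`<|endoftext|>`, `<s>`, `User:`, `Bot:`) and both helper functions are assumptions made for illustration; they are not confirmed by this summary, so check the full model card for the exact format before relying on them.

```python
# Assumed turn markers, for illustration only. The exact chat format
# is defined in the full model card, not in this summary.
BOS = "<|endoftext|>"
TURN_SEP = "<s>"


def build_chat_prompt(turns, bos=BOS, sep=TURN_SEP):
    """Format (speaker, text) turns as a chat prompt ending in an open Bot turn.

    Hypothetical helper: it only builds a string and does not call the model.
    """
    parts = [f"{bos}{sep}"]
    for speaker, text in turns:
        # Each turn is "Speaker:", the message, then a separator line.
        parts.append(f"{speaker}:\n{text.strip()}\n{sep}")
    # Leave the final Bot turn open so the model continues from here.
    parts.append("Bot:")
    return "\n".join(parts) + "\n"


def build_raw_prompt(instruction):
    """Raw text format: the instruction is passed through as plain text."""
    return instruction.strip()


chat_prompt = build_chat_prompt(
    [("User", "Översätt till engelska: Jag tycker träd är fina.")]
)
raw_prompt = build_raw_prompt("Översätt till engelska: Jag tycker träd är fina.")
print(chat_prompt)
print(raw_prompt)
```

Either string would then be passed to whatever text-generation library is used to run the model. Keeping prompt construction in a small function like this makes it easy to swap in the correct markers once they are verified.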

Key Capabilities

Multilingual Text Generation: Capable of generating coherent text in five languages: Swedish, Norwegian, Danish, Icelandic, and English.
Multilingual Code Generation: Generates code in four programming languages.
Instruction Following: Designed to perform various text tasks by interpreting instructions, even for tasks it wasn't explicitly trained for.
Nordic Language Focus: Pretrained on a dataset of 320 billion tokens that includes significant portions of Swedish, Norwegian, Danish, and Icelandic data, which makes it well suited to Nordic language applications.

Intended Use Cases

Research and Evaluation: Primarily released for research and evaluation within the Nordic NLP ecosystem to gather feedback and validate model capabilities.
Text Generation Tasks: Suitable for a wide range of text generation tasks where instruction following is beneficial.
Multilingual Applications: Suited to applications that need text generation in Swedish, Norwegian, Danish, Icelandic, or English.

Limitations

Like other large language models, GPT-SW3 126M Instruct has limitations regarding bias, safety, generation diversity, and hallucination. Users should be aware of potential issues such as overrepresentation of certain viewpoints, stereotypes, and the generation of inappropriate or incorrect content.
