timpal0l/gpt-sw3-356m-instruct

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 2k · Published: Apr 18, 2026 · License: other · Architecture: Transformer

The timpal0l/gpt-sw3-356m-instruct is a 356 million parameter instruction-tuned decoder-only transformer language model developed by AI Sweden. It is part of the GPT-SW3 collection, trained on a 320 billion token dataset encompassing Swedish, Norwegian, Danish, Icelandic, English, and programming code. This model is designed for generating coherent text and performing various text tasks across these five languages, making it suitable for multilingual applications, particularly within the Nordic region.


Overview

The timpal0l/gpt-sw3-356m-instruct is a 356 million parameter instruction-tuned model from the GPT-SW3 series, developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is a decoder-only transformer model, pretrained on a 320 billion token dataset that includes content in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model was trained for causal language modeling using the NeMo Megatron GPT implementation.

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in five languages: Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Supports text generation in four programming languages.
  • Instruction Following: Fine-tuned on instruction data, enabling it to perform text tasks it was not explicitly trained for by framing them as text-generation problems.
  • Nordic Language Focus: Specifically designed with a strong emphasis on Nordic languages, making it a valuable resource for NLP in this region.

Intended Use Cases

This model is primarily intended for research and evaluation of Large Language Models, particularly for Nordic languages. It serves as a foundation for generating text and handling diverse text-based tasks in a multilingual context. Users in the Nordic NLP ecosystem are encouraged to validate and test the model, providing feedback to the community. It is suitable for applications requiring text generation and instruction-based task completion across its supported languages.
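For users who want to try the model, a minimal sketch of loading it with the Hugging Face transformers library is shown below. Note that the chat-style prompt template (the `<|endoftext|><s>` / `User:` / `Bot:` wrapping) is an assumption based on the format commonly documented for GPT-SW3 instruct models, and should be verified against the official model card before use.

```python
# Minimal usage sketch. Assumes transformers and torch are installed, and that
# GPT-SW3 instruct models use the chat-style prompt format below (an assumption
# to verify against the model card).

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the assumed GPT-SW3 instruct chat format."""
    return f"<|endoftext|><s>\nUser:\n{user_message}\n<s>\nBot:\n"

if __name__ == "__main__":
    import torch
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="timpal0l/gpt-sw3-356m-instruct",
        torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    )
    # Swedish prompt: "Write a short poem about autumn."
    prompt = build_prompt("Skriv en kort dikt om hösten.")
    out = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.7)
    print(out[0]["generated_text"])
```

Because the model is small (356M parameters) it can run on CPU, though BF16 inference is faster on recent GPUs.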