WestCode1357/gpt-sw3-6.7b-v2-instruct

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 7.1B
  • Quantization: FP8
  • Context length: 2k
  • Published: May 19, 2026
  • License: other
  • Architecture: Transformer

GPT-Sw3 6.7B v2 Instruct by AI Sweden is a 7.1 billion parameter decoder-only transformer language model fine-tuned for instruction following. It was trained on a 320 billion token dataset comprising Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model generates coherent text in these five languages and performs a range of text tasks through instruction-based prompting, which suits it to multilingual applications, particularly in the Nordic languages.

Overview

GPT-Sw3 6.7B v2 Instruct is a 7.1 billion parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It belongs to the GPT-Sw3 collection and is the instruction-tuned version of the 6.7B v2 base model. The model was pretrained with the NeMo Megatron GPT implementation on a 320 billion token dataset spanning Swedish, Norwegian, Danish, Icelandic, English, and programming code.

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in five different languages (Swedish, Norwegian, Danish, Icelandic, English).
  • Code Generation: Supports text generation in four programming languages.
  • Instruction Following: Fine-tuned on instruction data, so it can perform a variety of text tasks when prompted, including tasks it was not explicitly trained on.
  • Research and Evaluation: Primarily intended for research and evaluation of LLM capabilities, particularly for Nordic languages.
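
The instruction-following capability above can be exercised with a short script. What follows is a minimal sketch, not the authors' reference code. It assumes that the repository named in this page's title loads through the Hugging Face `transformers` library (with `torch` and `sentencepiece` installed), that the model expects the `User:` / `Bot:` turn format used by the GPT-Sw3 instruct models, and that the "2k" context length means 2048 tokens. The helper names `build_prompt` and `generate`, and the sampling settings, are illustrative choices. Check the upstream model card before relying on any of these.

```python
MODEL_ID = "WestCode1357/gpt-sw3-6.7b-v2-instruct"  # name from this page; assumed loadable via transformers
CTX_LENGTH = 2048  # assumption: "2k" context length means 2048 tokens


def build_prompt(instruction: str) -> str:
    """Wrap one user instruction in the assumed GPT-Sw3 chat turn format."""
    return f"<|endoftext|><s>\nUser:\n{instruction.strip()}\n<s>\nBot:"


def generate(instruction: str, max_new_tokens: int = 100) -> str:
    """Generate a reply to a single instruction (hypothetical helper)."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # Keep prompt plus completion inside the context window.
    budget = min(max_new_tokens, CTX_LENGTH - prompt_len)
    if budget <= 0:
        raise ValueError("Prompt already fills the context window.")

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=budget,
            do_sample=True,      # sampling settings are illustrative, not tuned
            temperature=0.6,
        )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Översätt till engelska: Träd är fina."))
```

Building the prompt in a separate function keeps the format assumption in one place, so it is easy to correct if the upstream card specifies something different.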

Performance

Evaluated on the Open LLM Leaderboard, the model achieved an average score of 39.57. Notable benchmark results include:

  • ARC (25-shot): 40.78
  • HellaSwag (10-shot): 67.77
  • MMLU (5-shot): 31.57
  • TruthfulQA (0-shot): 40.32
  • Winogrande (5-shot): 63.54
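
A quick arithmetic check helps when reading these numbers: the five listed scores alone average to about 48.80, not 39.57. The reported average therefore presumably covers additional leaderboard benchmarks that are not listed here; this is an inference from the arithmetic, not something this page states. A short sketch of the check, using only the figures above:

```python
from statistics import mean

# Benchmark scores exactly as listed above.
scores = {
    "ARC (25-shot)": 40.78,
    "HellaSwag (10-shot)": 67.77,
    "MMLU (5-shot)": 31.57,
    "TruthfulQA (0-shot)": 40.32,
    "Winogrande (5-shot)": 63.54,
}
REPORTED_AVERAGE = 39.57  # leaderboard average quoted above

listed_mean = mean(scores.values())
print(f"mean of listed scores: {listed_mean:.2f}")
print(f"reported average:      {REPORTED_AVERAGE:.2f}")
```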

Limitations

Like other large language models, GPT-Sw3 has limitations regarding bias, safety, and potential for hallucination. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users are advised to be aware of these limitations and the model's modified RAIL license.