Model Overview

AI Sweden's GPT-Sw3 6.7B v2 Instruct is a 6.7-billion-parameter decoder-only transformer in the GPT-Sw3 model collection. Developed in collaboration with RISE and WASP WARA, it is instruction-tuned to follow user instructions and generate relevant responses.

Key Capabilities

Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
Code Generation: Can also generate code in four programming languages.
Instruction Following: Fine-tuned on instruction data, so it can perform a range of text tasks once they are phrased as text-generation prompts.
Autoregressive: Generates text one token at a time, predicting each next token from the tokens before it.
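The "Autoregressive" item can be illustrated with a minimal greedy decoding loop. This is a sketch under stated assumptions, not GPT-Sw3's actual inference code: the scoring table `TOY_SCORES` and the helpers `next_token_scores` and `greedy_generate` are hypothetical, and the toy scorer looks only at the last token, whereas a real decoder-only transformer conditions on the whole prefix. Real deployments also typically use sampling rather than pure greedy selection.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": unnormalized next-token scores keyed by the last
# token. A real transformer would score the next token from the full prefix.
TOY_SCORES: Dict[str, Dict[str, float]] = {
    "<s>": {"Hej": 2.0, "Hallå": 1.0},
    "Hej": {"världen": 3.0, "!": 1.0},
    "världen": {"!": 2.5, "<eos>": 0.5},
    "!": {"<eos>": 5.0},
}


def next_token_scores(prefix: List[str]) -> Dict[str, float]:
    """Return scores for the next token; unknown contexts just end the text."""
    return TOY_SCORES.get(prefix[-1], {"<eos>": 1.0})


def greedy_generate(
    score_fn: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    max_new_tokens: int = 10,
    eos: str = "<eos>",
) -> List[str]:
    """Append the highest-scoring token one step at a time until EOS or the limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = score_fn(tokens)            # the scorer receives the full prefix
        best = max(scores, key=scores.get)   # greedy: pick the top-scoring token
        if best == eos:
            break
        tokens.append(best)                  # the new token becomes part of the context
    return tokens


print(greedy_generate(next_token_scores, ["<s>"]))  # ['<s>', 'Hej', 'världen', '!']
```

Each generated token is fed back as input for the next step, which is what makes the process sequential.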

Training Details

The model was pretrained on 320 billion tokens covering the languages listed above and programming code. Pretraining used a causal language modeling (CLM) objective with the NeMo Megatron GPT implementation.
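The causal language modeling objective trains the model to predict each token from the tokens that precede it, scored as the mean next-token cross-entropy. The plain-Python sketch below shows that loss for a single sequence. It is an illustration of the objective only, not NeMo Megatron's implementation; the function names `log_softmax` and `causal_lm_loss` are hypothetical.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def causal_lm_loss(logits: List[List[float]], token_ids: List[int]) -> float:
    """Mean next-token cross-entropy for one sequence.

    logits[t] is the model output at position t and is scored against
    token_ids[t + 1]; the final position has no target and is skipped.
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one logit vector per input position")
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    losses = []
    for t in range(len(token_ids) - 1):
        log_probs = log_softmax(logits[t])
        losses.append(-log_probs[token_ids[t + 1]])  # negative log-likelihood of the true next token
    return sum(losses) / len(losses)


# Uniform logits over a 4-token vocabulary give a loss of ln(4) per prediction.
print(causal_lm_loss([[0.0] * 4] * 3, [0, 1, 2]))  # about 1.386
```

Because position t only ever predicts token t + 1, the model never sees the token it is asked to predict, which is the "causal" part of the objective.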

Performance Insights

On the Open LLM Leaderboard, the model has an average score of 39.57. Its strongest results are 67.77 on HellaSwag (10-shot) and 63.54 on Winogrande (5-shot), suggesting relative strength in commonsense reasoning. Its lower scores of 31.57 on MMLU (5-shot) and 6.37 on GSM8K (5-shot) point to weaker performance on complex reasoning and mathematical tasks.

Good For

Applications requiring text generation in Nordic languages and English.
Instruction-based text tasks and conversational AI.
Use cases involving code generation or understanding.
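For instruction-based tasks, a task such as sentiment classification is cast as generation by wrapping it in a chat-style prompt and letting the model write the answer. The sketch below assumes the turn layout shown on the published GPT-SW3 instruct model card as we understand it (`<|endoftext|>` followed by `<s>`-separated `User:` and `Bot:` turns); treat this format as an assumption and check it against the tokenizer's own chat template before relying on it. The helpers `build_prompt` and `sentiment_as_generation` are hypothetical.

```python
from typing import List, Tuple

# ASSUMED prompt layout (verify against the model card / tokenizer template):
# "<|endoftext|>" then, per turn, "<s>\n{Role}:\n{text}\n", ending with an
# open "<s>\nBot:" turn for the model to complete.


def build_prompt(turns: List[Tuple[str, str]]) -> str:
    """Format (role, text) turns, which should end with a User turn."""
    parts = ["<|endoftext|>"]
    for role, text in turns:
        parts.append(f"<s>\n{role}:\n{text}\n")
    parts.append("<s>\nBot:")  # the model continues from here
    return "".join(parts)


def sentiment_as_generation(review: str) -> str:
    """Cast sentiment classification as a text-generation prompt."""
    instruction = (
        "Is the sentiment of the following review positive or negative? "
        "Answer with one word.\n\n" + review
    )
    return build_prompt([("User", instruction)])


print(build_prompt([("User", "Jag tycker träd är fina")]))
```

The model's continuation after `Bot:` is then read as the answer, so no task-specific output head is needed.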

Limitations

Like other large language models, GPT-Sw3 has limitations including potential biases, safety concerns, and issues with generation diversity and hallucination. Users should be aware that the model may produce incorrect, irrelevant, or inappropriate content, and may overrepresent certain viewpoints.
