AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct

Text Generation | Concurrency Cost: 1 | Model Size: 7.1B | Quant: FP8 | Ctx Length: 2k | Published: Apr 28, 2023 | License: other | Architecture: Transformer | Gated | Cold

The AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct is a 6.7 billion parameter instruction-tuned decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA. It was pretrained on a 320 billion token dataset encompassing Swedish, Norwegian, Danish, Icelandic, English, and programming code. This model is designed for generating coherent text and performing various text tasks across five languages and four programming languages, making it suitable for multilingual and code-aware applications.

Model Overview

AI Sweden's GPT-SW3 6.7B v2 Instruct is a 6.7 billion parameter decoder-only transformer model and part of the GPT-SW3 collection. Developed in collaboration with RISE and WASP WARA, it is instruction-tuned to follow user instructions and generate relevant text.

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Generates code in four programming languages, drawing on the programming code included in its pretraining data.
  • Instruction Following: Fine-tuned on instruction data, enabling it to perform various text tasks by casting them as generation tasks.
  • Autoregressive: Generates text sequentially, predicting the next token based on previous ones.
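The autoregressive behaviour listed above can be illustrated with a minimal sketch. The bigram table, the `next_token_probs` helper, and the greedy selection rule are all hypothetical stand-ins chosen for illustration; they are not the model's actual vocabulary or decoding strategy.

```python
# Toy next-token table standing in for a real language model (hypothetical data).
BIGRAMS = {
    "<s>": {"hej": 0.6, "hello": 0.4},
    "hej": {"världen": 0.7, "</s>": 0.3},
    "hello": {"world": 0.9, "</s>": 0.1},
    "världen": {"</s>": 1.0},
    "world": {"</s>": 1.0},
}


def next_token_probs(context):
    """Return a distribution over the next token.

    A real decoder conditions on the whole context; this toy version
    only looks at the last token.
    """
    return BIGRAMS[context[-1]]


def generate(prompt, max_new_tokens=10):
    """Extend the prompt one token at a time (greedy decoding)."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # pick the most likely next token
        if token == "</s>":  # stop at the end-of-sequence marker
            break
        tokens.append(token)  # the new token becomes part of the context
    return tokens


print(generate(["<s>"]))  # ['<s>', 'hej', 'världen']
```

Each generated token is appended to the context before the next prediction is made, which is the defining property of autoregressive generation.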

Training Details

The model was pretrained on a substantial dataset of 320 billion tokens, covering a diverse range of languages and programming code. The pretraining utilized a causal language modeling (CLM) objective with the NeMo Megatron GPT implementation.
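As a rough illustration of the causal language modeling objective, the sketch below computes the mean negative log-likelihood of each token given only the tokens before it. The `clm_loss` and `uniform_model` names are hypothetical, and the uniform predictor is a placeholder; this is not the NeMo Megatron implementation.

```python
import math


def clm_loss(token_ids, predict):
    """Mean negative log-likelihood of each token given its prefix.

    `predict(prefix)` must return a list of probabilities indexed by token id.
    """
    total = 0.0
    for t in range(1, len(token_ids)):
        probs = predict(token_ids[:t])  # the model sees only earlier tokens
        total -= math.log(probs[token_ids[t]])
    return total / (len(token_ids) - 1)


def uniform_model(prefix, vocab_size=4):
    """Placeholder predictor: every token is equally likely."""
    return [1.0 / vocab_size] * vocab_size


loss = clm_loss([0, 2, 1, 3], uniform_model)
print(round(loss, 4))  # 1.3863, i.e. log(4) for a uniform guess over 4 tokens
```

Training lowers this quantity by shifting probability mass toward the token that actually follows each prefix.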

Performance Insights

On the Open LLM Leaderboard, the model has an average score of 39.57. Its strongest results are on commonsense reasoning benchmarks: 67.77 on HellaSwag (10-shot) and 63.54 on Winogrande (5-shot). Scores of 31.57 on MMLU (5-shot) and 6.37 on GSM8K (5-shot) are considerably lower, pointing to weaknesses in knowledge-intensive and mathematical reasoning tasks.

Good For

  • Applications requiring text generation in Nordic languages and English.
  • Instruction-based text tasks and conversational AI.
  • Use cases involving code generation or understanding.
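For the use cases above, a minimal loading sketch with the Hugging Face `transformers` library might look as follows. The chat layout in `build_prompt` and the sampling settings are assumptions that should be checked against the upstream model card; `build_prompt` and `generate_reply` are hypothetical helper names. Because the model is gated, downloading it requires accepting its license and authenticating with the Hugging Face Hub.

```python
MODEL_NAME = "AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct"


def build_prompt(user_message):
    """Wrap a user message in the chat layout.

    Assumed format; verify it against the upstream model card.
    """
    return f"<|endoftext|><s>\nUser:\n{user_message}\n<s>\nBot:"


def generate_reply(user_message, max_new_tokens=100):
    """Download the model and sample one reply (needs torch and transformers)."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    input_ids = tokenizer(build_prompt(user_message), return_tensors="pt")["input_ids"]
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.6,  # assumed sampling settings, not tuned
            top_p=1.0,
        )
    # Keep only the tokens produced after the prompt.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Vad är huvudstaden i Sverige?")` would then return the sampled reply as a string. With a 2k-token context window, the prompt and the generated tokens together need to stay within that budget.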

Limitations

Like other large language models, GPT-SW3 has limitations, including potential biases, safety concerns, limited generation diversity, and hallucination. Users should be aware that the model may produce incorrect, irrelevant, or inappropriate content and may overrepresent certain viewpoints.