AI-Sweden-Models/gpt-sw3-20b-instruct
AI-Sweden-Models/gpt-sw3-20b-instruct is a 20-billion-parameter, instruction-tuned, decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is fine-tuned on instruction data in both chat and raw-text formats. It can generate coherent text in Swedish, Norwegian, Danish, Icelandic, and English, as well as in four programming languages. It can also handle text tasks it was not explicitly trained for, by casting them as text generation problems.
Model Overview
AI-Sweden-Models/gpt-sw3-20b-instruct is a 20-billion-parameter instruction-tuned language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It belongs to the GPT-SW3 collection of decoder-only transformer models. These models were pretrained on a dataset of 320 billion tokens covering Swedish, Norwegian, Danish, Icelandic, English, and programming code. Pretraining used a causal language modeling (CLM) objective with the NeMo Megatron GPT implementation.
Key Capabilities
- Multilingual Text Generation: Capable of generating coherent text across five natural languages (Swedish, Norwegian, Danish, Icelandic, English) and four programming languages.
- Instruction Following: Fine-tuned on instruction data. Text tasks, including ones it was not explicitly trained for, can therefore be posed as prompts and solved by generating a response.
- Autoregressive Design: Generates text one token at a time, with each token conditioned on all preceding tokens. This suits it to conversational AI, content creation, and code assistance.
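The autoregressive loop described above can be sketched in a few lines. This is a minimal illustration, not the model's actual decoding code. It assumes greedy decoding, a word-level vocabulary, and a hypothetical `toy_model` lookup table standing in for the 20B network. The real model scores subword tokens and may sample rather than pick a single fixed continuation.

```python
from typing import Callable, List

# Hypothetical stand-in for the model: maps the tokens so far to the next token.
NextToken = Callable[[List[str]], str]


def generate(next_token: NextToken, prompt: List[str], max_new_tokens: int,
             eos: str = "<eos>") -> List[str]:
    """Greedy autoregressive loop: each step sees the full prefix generated so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # predict the next token from the whole prefix
        if tok == eos:            # stop once the end-of-sequence token is produced
            break
        tokens.append(tok)        # feed the prediction back in as input
    return tokens[len(prompt):]   # return only the newly generated tokens


# Toy "model": a fixed bigram table. It only looks at the last token,
# whereas a transformer conditions on every preceding token.
BIGRAMS = {"Hej": "och", "och": "välkommen", "välkommen": "<eos>"}


def toy_model(tokens: List[str]) -> str:
    return BIGRAMS.get(tokens[-1], "<eos>")


print(generate(toy_model, ["Hej"], max_new_tokens=10))  # ['och', 'välkommen']
```

The key property is that each prediction is appended to the input before the next step. This is why generation cost grows with output length.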
Performance & Limitations
On the Open LLM Leaderboard, the model scores an average of 38.19, including 43.17 on ARC (25-shot) and 71.09 on HellaSwag (10-shot). Like other large language models, GPT-SW3 has limitations in bias, safety, generation diversity, and hallucination. It may produce incorrect, irrelevant, or inappropriate content, and its output can reflect biases present in its training data.
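The "25-shot" and "10-shot" labels give the number of solved examples placed in the prompt before the question being evaluated. The sketch below shows what k-shot prompt construction means. The `Question:`/`Answer:` template and the `build_k_shot_prompt` name are hypothetical, not the leaderboard's actual format. Evaluation harnesses typically score multiple-choice benchmarks such as ARC and HellaSwag by comparing the likelihood of each candidate answer, not by free generation.

```python
from typing import List, Tuple


def build_k_shot_prompt(examples: List[Tuple[str, str]], query: str, k: int) -> str:
    """Prepend k solved (question, answer) pairs before the unsolved query."""
    if k > len(examples):
        raise ValueError("not enough examples for the requested k")
    # Hypothetical template: one block per solved example.
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    # The final block is left open; the model's continuation is its answer.
    blocks.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(blocks)


demo = [("2+2?", "4"), ("3+3?", "6")]
print(build_k_shot_prompt(demo, "5+5?", k=2))
```

With k=0 the prompt reduces to the bare question (zero-shot). Larger k gives the model more in-context examples of the expected answer format.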
Intended Use
This model is intended for applications that require text generation in its supported languages, and for tasks that can be framed as instruction-based text generation. It is suited to developers who need a multilingual LLM, particularly for projects focused on the Nordic languages.
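Framing a task as instruction-based generation means wrapping it in the chat format used during fine-tuning. The helper below is a minimal sketch. Its layout (an `<|endoftext|><s>` opener, `User:`/`Bot:` role lines, and `<s>` separators between turns) reflects our reading of the GPT-SW3 instruct model card and should be verified against it. The `build_chat_prompt` name is hypothetical. The resulting string would be tokenized and passed to the model, and the reply is whatever the model generates after the final `Bot:` line.

```python
from typing import List, Tuple

# Assumed chat layout; verify against the official GPT-SW3 instruct model card.
BOS = "<|endoftext|>"
SEP = "<s>"
ROLES = ("User", "Bot")


def build_chat_prompt(turns: List[Tuple[str, str]]) -> str:
    """Assemble (role, text) turns into one prompt ending with an open Bot turn."""
    lines = [BOS + SEP]
    for role, text in turns:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        lines += [f"{role}:", text, SEP]  # each turn is closed by a separator
    lines.append("Bot:")                  # the model continues from here
    return "\n".join(lines)


# A summarization task posed as a single user instruction.
print(build_chat_prompt([("User", "Sammanfatta texten i en mening: ...")]))
```

Multi-turn conversations are handled by appending earlier `Bot` replies as turns, so the model always sees the full dialogue history.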