AI-Sweden-Models/gpt-sw3-356m-instruct
AI-Sweden-Models/gpt-sw3-356m-instruct is a 356-million-parameter, instruction-tuned, decoder-only transformer model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is part of the GPT-SW3 series, pretrained on 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model is designed to generate coherent text in five natural languages and four programming languages, and it can perform a range of text tasks through instruction-based prompting.
Overview
AI-Sweden-Models/gpt-sw3-356m-instruct is an instruction-tuned variant of the GPT-SW3 356M base model, developed by AI Sweden, RISE, and WASP WARA for Media and Language. This decoder-only transformer model was pretrained on a substantial dataset of 320 billion tokens, encompassing Swedish, Norwegian, Danish, Icelandic, English, and programming code. The instruction models, including this 356M version, were fine-tuned using both chat and raw text instruction data, enabling them to follow prompts for diverse text generation tasks.
Key Capabilities
- Multilingual Text Generation: Capable of generating coherent text in five natural languages (Swedish, Norwegian, Danish, Icelandic, English).
- Code Generation: Can generate code in four programming languages.
- Instruction Following: Can perform various text tasks by interpreting instructions, including tasks it was not explicitly trained on, as long as they can be framed as text generation.
- Autoregressive Language Modeling: Generates text sequentially based on preceding tokens.
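As a rough illustration of instruction-based prompting, the sketch below wraps a single instruction in a chat-style prompt and passes it to the model through the Hugging Face `transformers` library. The prompt layout (`<|endoftext|><s>`, `User:`, `Bot:`) and the sampling settings are assumptions based on the upstream GPT-SW3 model card rather than anything stated on this page, so check them against the official card before relying on them. The helper names `build_prompt` and `generate` are hypothetical. Downloading the weights may also require accepting the model's license and supplying an access token.

```python
MODEL_NAME = "AI-Sweden-Models/gpt-sw3-356m-instruct"


def build_prompt(instruction: str) -> str:
    """Wrap one user instruction in the ASSUMED GPT-SW3 chat layout."""
    return f"<|endoftext|><s>\nUser:\n{instruction.strip()}\n<s>\nBot:\n".strip()


def generate(instruction: str, max_new_tokens: int = 100) -> str:
    """Generate a reply to a single instruction (downloads the model on first use)."""
    # Imported lazily so build_prompt works without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    input_ids = tokenizer(build_prompt(instruction), return_tensors="pt")["input_ids"]
    with torch.no_grad():
        output_ids = model.generate(
            inputs=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,    # sample instead of greedy decoding
            temperature=0.6,   # assumed value, tune as needed
            top_p=1.0,
        )
    # Drop the echoed prompt and decode only the newly generated tokens.
    new_tokens = output_ids[0, input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Översätt till engelska: Träd är fina."))
```

Because the model is autoregressive, the reply is produced one token at a time after the final `Bot:` marker, which is why the prompt ends there and the decoded output slices off the prompt tokens.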
Limitations
Like other large language models, GPT-SW3 models exhibit limitations such as potential biases, safety concerns, and quality issues including hallucination and lack of generation diversity. The model may overrepresent certain viewpoints, contain stereotypes, or generate inappropriate content. Users should be aware of these limitations, which are openly communicated through its modified RAIL license.