GPT-SW3 356M Instruct: Multilingual Instruction-Tuned LLM
This model is a 356 million parameter instruction-tuned variant from the GPT-SW3 series, developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It is a decoder-only transformer pretrained on a dataset of 320 billion tokens.
Key Capabilities
- Multilingual Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
- Code Generation: Supports text generation in four programming languages.
- Instruction Following: Fine-tuned on diverse instruction data (chat and raw text formats) to perform various text-based tasks.
- Autoregressive Text Generation: Designed for causal language modeling, generating text token by token from a given prompt (see the usage sketch after this list).
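The sketch below shows one way to run autoregressive generation with this model through the Hugging Face transformers library. The model identifier (`AI-Sweden-Models/gpt-sw3-356m-instruct`) and the sampling settings are assumptions for illustration, not values taken from this card.

```python
# Minimal generation sketch (assumes the model is available on the Hugging Face Hub
# under an identifier like "AI-Sweden-Models/gpt-sw3-356m-instruct").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-Sweden-Models/gpt-sw3-356m-instruct"  # assumed Hub identifier
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.eval()

prompt = "Träd är fina för att"  # Swedish: "Trees are nice because"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Autoregressive (causal) decoding: the model predicts one token at a time,
# conditioned on the prompt and all previously generated tokens.
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```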
Training and Data
The model was pretrained using the NeMo Megatron GPT implementation. Its training data includes a broad mix of sources such as books, articles, code (Code Parrot GitHub code), conversational data (Familjeliv, Flashback, Reddit), mathematical datasets, and extensive web crawls (Multilingual C4, OSCAR, Wikipedia). The instruction tuning used datasets such as Dolly, Open Assistant, OIG, and a Swedish pharmaceutical Q&A dataset (Fass).
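Because the instruction data mixes chat and raw-text formats, prompts to the instruct variant are typically wrapped in a simple turn structure. The exact chat template is not specified in this section, so the sketch below assumes a hypothetical User/Bot turn format and reuses the `tokenizer`, `model`, and `device` objects from the previous example.

```python
# Hypothetical chat-style prompt wrapper; the exact template used during
# instruction tuning is not given here, so treat this format as an assumption.
def build_chat_prompt(user_message: str) -> str:
    return (
        "<|endoftext|><s>\n"
        f"User:\n{user_message}\n"
        "<s>\nBot:\n"
    )

prompt = build_chat_prompt("Sammanfatta fördelarna med flerspråkiga språkmodeller.")
# Swedish: "Summarize the advantages of multilingual language models."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        do_sample=True,
        top_p=0.9,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```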
Intended Use Cases
This model is primarily intended for research and evaluation of Large Language Models, particularly for Nordic languages. It can be used by organizations and individuals in the Nordic NLP ecosystem to validate and test LLM capabilities and provide feedback.
Limitations
Like other large language models, GPT-SW3 356M Instruct has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users should be aware of these risks and implement appropriate safeguards.