GPT-Sw3 356M: A Multilingual Nordic LLM
GPT-Sw3 356M is a 356 million parameter decoder-only transformer language model developed by AI Sweden, in collaboration with RISE and WASP WARA for Media and Language. It is part of the broader GPT-Sw3 collection, which aims to advance large language models for the Nordic languages.
Key Capabilities
- Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
- Code Generation: Can generate code in four programming languages.
- Instruction Following: Can perform a variety of text tasks when they are rephrased as text generation prompts, even without explicit training for them.
- Nordic Language Focus: Trained on 320 billion tokens, with significant emphasis on Nordic languages alongside English and programming code.
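The task-rephrasing pattern above can be sketched in Python. The prompt template is illustrative, and the checkpoint name "AI-Sweden-Models/gpt-sw3-356m" is an assumption about where the weights are published on the Hugging Face Hub; the generation step uses the standard transformers causal-LM API and is gated behind a flag because it downloads the model.

```python
# Sketch: casting an arbitrary text task as a generation prompt for GPT-Sw3.
# Model id and prompt wording are assumptions, not official usage guidance.

def build_prompt(task: str, text: str) -> str:
    """Wrap a task description and input text as a plain generation prompt."""
    return f"{task}:\n{text}\n"

RUN_GENERATION = False  # set True to actually load the model (downloads weights)

if RUN_GENERATION:
    # Requires `pip install transformers torch`.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "AI-Sweden-Models/gpt-sw3-356m"  # assumed checkpoint name
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Example: a Swedish summarization task phrased as free-text generation.
    prompt = build_prompt("Sammanfatta följande text",
                          "Stockholm är Sveriges huvudstad.")
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=50,
                                do_sample=True, top_p=0.9)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the base model is not instruction-tuned, output quality depends heavily on how the task is phrased in the prompt.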
Intended Use and Limitations
This model is released for research and evaluation within the Nordic NLP ecosystem, to gather feedback on its performance and identify areas for improvement. Like other large language models, GPT-Sw3 356M has limitations, including potential biases, safety concerns, limited generation diversity, and hallucination. Users should be aware that the model may overrepresent certain viewpoints, reproduce stereotypes, or generate inappropriate content. It may also produce factual errors or irrelevant outputs. The model's training data includes public Common Crawl, Reddit, and the Swedish forums Familjeliv and Flashback, which may contain offensive or sensitive content.