AI-Sweden-Models/gpt-sw3-356m
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 2k · Published: Dec 14, 2022 · License: other · Architecture: Transformer

GPT-Sw3 356M is a 356 million parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. Trained on 320 billion tokens spanning Swedish, Norwegian, Danish, Icelandic, English, and programming code, it generates coherent text in five natural languages and four programming languages. It supports a context length of 2,048 tokens and is part of a collection focused on advancing large language models for Nordic languages.

GPT-Sw3 356M: A Multilingual Nordic LLM

GPT-Sw3 356M is a 356 million parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is part of the broader GPT-Sw3 collection, which aims to advance large language models for Nordic languages.
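
The model can be used through the Hugging Face transformers library. The following is a minimal sketch, assuming transformers and torch are installed; the Swedish prompt and the sampling settings are illustrative, not tuned recommendations.

```python
# Minimal sketch: load GPT-Sw3 356M and sample a Swedish continuation.
# Assumes `transformers` and `torch` are installed; the sampling
# settings below are illustrative, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-Sweden-Models/gpt-sw3-356m"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
model.eval()

# A Swedish prompt: "Trees are nice because"
prompt = "Träd är fina för att"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    generated = model.generate(
        input_ids,
        max_new_tokens=60,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

print(tokenizer.decode(generated[0], skip_special_tokens=True))
```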

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Supports text generation in four programming languages.
  • Instruction Following: Can perform various text tasks it was not explicitly trained for when they are rephrased as text generation prompts (see the sketch after this list).
  • Nordic Language Focus: Trained on a substantial dataset of 320 billion tokens, with significant emphasis on Nordic languages alongside English and programming code.
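
For instance, a translation task can be cast as a completion prompt. The sketch below is hypothetical: the base model is not instruction-tuned, output quality depends heavily on the exact phrasing, and the prompt format shown is an assumption rather than an official recipe.

```python
# Hypothetical sketch: casting Swedish-to-English translation as plain
# text generation. The prompt format is an assumption, not an official
# recipe; the base model is not instruction-tuned.
from transformers import pipeline

generator = pipeline("text-generation", model="AI-Sweden-Models/gpt-sw3-356m")

# Frame the translation as a completion: the model continues after "Engelska:".
prompt = "Svenska: Jag tycker om att läsa böcker.\nEngelska:"
result = generator(prompt, max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```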

Intended Use and Limitations

This model is released for research and evaluation within the Nordic NLP ecosystem, both to gather feedback on its performance and to identify areas for improvement. Like other large language models, GPT-Sw3 356M has limitations, including potential bias, safety concerns, limited generation diversity, and hallucination. Users should be aware that the model may overrepresent certain viewpoints, reproduce stereotypes, or generate inappropriate content, and that it may produce factual errors or irrelevant outputs. The training data includes public sources such as Common Crawl, Reddit, and the Swedish forums Familjeliv and Flashback, which may contain offensive or sensitive content.