AI-Sweden-Models/gpt-sw3-356m
AI-Sweden-Models/gpt-sw3-356m is a 356 million parameter decoder-only transformer language model developed by AI Sweden, RISE, and WASP WARA for Media and Language. It is part of the GPT-SW3 collection, pretrained on a 320 billion token dataset of Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model generates coherent text in five natural languages and four programming languages, and it can perform a range of text tasks when they are framed as text generation.
Overview
AI-Sweden-Models/gpt-sw3-356m is a 356 million parameter model from the GPT-SW3 family, developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is a decoder-only transformer pretrained with a causal language modeling (CLM) objective using the NeMo Megatron GPT implementation. The training dataset contains 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, and English text, together with programming code.
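To make the CLM objective concrete, the following is a minimal sketch in plain Python of the quantity such training minimizes: the average negative log-likelihood of each token given the tokens before it. It is an illustration only and not the NeMo Megatron implementation; the function name `causal_lm_loss` and the dictionary-based probability format are assumptions chosen for readability.

```python
import math


def causal_lm_loss(token_ids, next_token_probs):
    """Mean negative log-likelihood of each token given its prefix.

    token_ids: list of integer token ids for one sequence.
    next_token_probs: list of dicts; next_token_probs[i] maps a candidate
        token id to the probability predicted after seeing token_ids[:i + 1].
        (Hypothetical format, used here instead of real model logits.)
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")

    # Targets are the inputs shifted left by one position:
    # position i is trained to predict token i + 1.
    targets = token_ids[1:]

    total = 0.0
    for position, target in enumerate(targets):
        probability = next_token_probs[position].get(target, 0.0)
        # Clamp to avoid log(0) when the target received no probability mass.
        total += -math.log(max(probability, 1e-12))
    return total / len(targets)
```

A real training loop computes the same quantity from model logits over the full vocabulary, batched across many sequences; the shift between inputs and targets is the part that makes the objective causal.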
Key Capabilities
- Multilingual Text Generation: Capable of generating coherent text in five distinct languages: Swedish, Norwegian, Danish, Icelandic, and English.
- Code Generation: Generates code in four programming languages.
- Task Adaptability: Can perform a range of text-based tasks, including ones it was not explicitly trained on, when they are framed as text generation problems.
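As a rough illustration of framing a task as text generation, the sketch below builds a few-shot prompt and passes it to the model through the Hugging Face `transformers` library. The helper names `build_few_shot_prompt` and `generate`, the "Input:/Output:" prompt layout, and the decoding settings are assumptions made for this example, not a format the model was trained on. Loading the model also assumes that `transformers`, `torch`, and `sentencepiece` are installed and that you have access to the model repository.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Frame a task as text continuation (hypothetical prompt layout).

    examples: list of (input_text, output_text) pairs shown to the model.
    query: the new input the model should complete an output for.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


def generate(prompt, model_name="AI-Sweden-Models/gpt-sw3-356m", max_new_tokens=20):
    """Continue the prompt with the model and return only the new text."""
    # Imported lazily so the prompt builder works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        # Greedy decoding keeps the sketch deterministic; sampling is also possible.
        output_ids = model.generate(
            inputs=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=False,
        )
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Translate English to Swedish.",
        [("cat", "katt"), ("house", "hus")],
        "dog",
    )
    print(generate(prompt))
```

Because this is a base model rather than an instruction-tuned one, the continuation may run past the intended answer, so in practice the output is often truncated at the first newline.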
Intended Use and Limitations
GPT-SW3 models are shared in a controlled pre-release to facilitate validation and feedback from the Nordic NLP community. Like other large language models, GPT-SW3 has limitations, including potential biases, safety concerns, and issues with generation diversity and hallucination. It may overrepresent certain viewpoints, contain stereotypes, or generate inappropriate content. Users should be aware that the model can produce incorrect information or irrelevant outputs. The model is released under a modified RAIL license to promote transparency and further study of LLMs.