timpal0l/gpt-sw3-1.3b-instruct
The timpal0l/gpt-sw3-1.3b-instruct is a 1.4 billion parameter instruction-tuned decoder-only transformer language model developed by AI Sweden. It is part of the GPT-SW3 collection, trained on a 320 billion token dataset encompassing Swedish, Norwegian, Danish, Icelandic, English, and programming code. This model is designed for generating coherent text and performing instruction-based tasks across these five languages and four programming languages.
Model Overview
The timpal0l/gpt-sw3-1.3b-instruct is a 1.4 billion parameter instruction-tuned model from the GPT-SW3 series, developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is a decoder-only transformer pretrained on a 320 billion token dataset, notable for its extensive coverage of the Nordic languages (Swedish, Norwegian, Danish, Icelandic) alongside English and programming code.
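Assuming the model is published on the Hugging Face Hub under the ID above, a minimal loading sketch with the `transformers` library might look like the following. The generation settings (`temperature`, `top_p`, `max_new_tokens`) are illustrative defaults, not recommendations from this card:

```python
# Minimal sketch: load the model and generate text with Hugging Face transformers.
# Model ID comes from the card; sampling parameters are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "timpal0l/gpt-sw3-1.3b-instruct"


def generate(prompt: str, max_new_tokens: int = 100) -> str:
    """Generate a continuation for `prompt` with sampling."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the model weights on first run):
# print(generate("Översätt till engelska: Jag heter Anna."))
```

Because the model is instruction-tuned, plain continuations work, but prompts formatted as instructions generally give better results.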
Key Capabilities
- Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
- Code Generation: Supports text generation in four programming languages.
- Instruction Following: Fine-tuned on instruction data, so it can carry out a wide range of text tasks when prompted, including tasks it was not explicitly trained on, by framing them as text generation.
- Nordic Language Focus: Specifically designed to address the need for large language models in Nordic languages.
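Instruction-tuned GPT-SW3 variants expect prompts in a turn-based chat format. The helper below mirrors the User/Bot layout shown in AI Sweden's GPT-SW3 instruct examples; the exact template is an assumption here and should be verified against the tokenizer's special tokens before use:

```python
def build_prompt(user_message: str) -> str:
    """Format a single-turn instruction prompt.

    The <|endoftext|><s> ... <s> turn separators follow the layout used
    in AI Sweden's GPT-SW3 instruct examples (assumed, not verified here).
    """
    return (
        "<|endoftext|><s>\n"
        "User:\n"
        f"{user_message}\n"
        "<s>\n"
        "Bot:\n"
    ).strip()


# Example usage:
# prompt = build_prompt("Vad är huvudstaden i Sverige?")
# The model's reply is then generated as a continuation after "Bot:".
```

Keeping the prompt template in one helper makes it easy to swap in the correct separators if the tokenizer defines them differently.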
Training Details
The model was pretrained using a causal language modeling (CLM) objective with the NeMo Megatron GPT implementation. The instruction-tuned variants, like this one, were further fine-tuned using both chat and raw text instruction formats. The training data includes diverse sources such as books, articles, code (from GitHub), conversational data (e.g., Reddit, Familjeliv), mathematical datasets, and extensive web crawls (Common Crawl, Wikipedia).
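The causal language modeling objective trains the model to predict each token from all of the tokens before it. A toy sketch of how (context, target) pairs arise from a token sequence (real implementations do this in parallel by shifting the labels one position, not with an explicit loop):

```python
def clm_pairs(token_ids: list[int]) -> list[tuple[list[int], int]]:
    """Enumerate the next-token prediction problems in one sequence.

    For each position t, the model conditions on token_ids[:t] and is
    trained to predict token_ids[t]. Frameworks implement the same thing
    by feeding token_ids[:-1] as inputs and token_ids[1:] as labels.
    """
    return [(token_ids[:t], token_ids[t]) for t in range(1, len(token_ids))]


# Example: a 3-token sequence yields 2 prediction problems.
# clm_pairs([5, 8, 2]) -> [([5], 8), ([5, 8], 2)]
```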
Limitations
Like other large language models, GPT-SW3 has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users should be aware of these potential issues and implement appropriate safeguards.