AI-Sweden-Models/gpt-sw3-1.3b-instruct

Text generation | Concurrency cost: 1 | Model size: 1.4B | Quantization: BF16 | Context length: 2k | Published: Apr 28, 2023 | License: other | Architecture: Transformer | Gated

AI-Sweden-Models/gpt-sw3-1.3b-instruct is a 1.3-billion-parameter, instruction-tuned, decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA. It was pretrained on a 320-billion-token dataset of Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model can generate coherent text in these five natural languages and in four programming languages, and it can carry out a range of text tasks when they are phrased as instructions.


Overview

AI-Sweden-Models/gpt-sw3-1.3b-instruct is a 1.3-billion-parameter, instruction-tuned language model from the GPT-SW3 family, developed by AI Sweden, RISE, and WASP WARA. It is a decoder-only transformer pretrained with a causal language modeling objective using the NeMo Megatron GPT implementation. It was then fine-tuned on instruction data in both chat and raw-text formats to improve its ability to follow instructions.
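For reference, a causal language modeling objective in its standard textbook form is the average negative log-likelihood of each token given only the tokens before it. This is the generic formulation; the exact loss normalization and batching used to train GPT-SW3 are not described here.

```latex
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

Here $x_1, \dots, x_T$ is a training sequence and $p_\theta$ is the model's next-token distribution. In a decoder-only transformer this conditioning is enforced by a causal attention mask, so position $t$ cannot attend to any later position.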

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Multilingual Code Generation: Can generate code in four programming languages.
  • Instruction Following: Can be instructed to perform text tasks it was not explicitly trained for, by casting those tasks as text generation problems.
  • Broad Training Data: Pretrained on a 320-billion-token dataset covering Swedish, Norwegian, Danish, Icelandic, English, and programming code.
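Because instruction following works by casting a task as text generation, a chat-format prompt is simply a string that the model continues. The sketch below assumes the chat layout shown on the upstream GPT-SW3 model card (an `<|endoftext|><s>` prefix, `User:`/`Bot:` role lines, turns separated by `<s>`); treat that layout as an assumption and check it against the model's tokenizer before relying on it. `build_chat_prompt` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple


def build_chat_prompt(turns: List[Tuple[str, str]]) -> str:
    """Assemble a prompt in the (assumed) GPT-SW3 instruct chat layout.

    `turns` is a list of (role, text) pairs where role is "User" or "Bot".
    The result ends with an open "Bot:" line so the model writes the reply.
    """
    parts = ["<|endoftext|><s>"]  # assumed conversation prefix
    for role, text in turns:
        if role not in ("User", "Bot"):
            raise ValueError(f"unknown role: {role!r}")
        # Each completed turn is "Role:\n<text>" followed by a "<s>" separator.
        parts.append(f"{role}:\n{text.strip()}\n<s>")
    # Leave the final Bot turn open for the model to continue.
    parts.append("Bot:")
    return "\n".join(parts)
```

For example, `build_chat_prompt([("User", "Översätt till engelska: Träd är fina.")])` phrases a translation task as a single chat turn; whatever the model generates after `Bot:` is its answer.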

Limitations and Considerations

Like other large language models, GPT-SW3 has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, or generate inappropriate content. Users should be aware that the model can produce incorrect information or irrelevant outputs. The model is released under a modified RAIL license to encourage transparency and study of its characteristics.

Usage

This model is suitable for applications that need text generation and instruction following in a multilingual setting, particularly for the Nordic languages and English, as well as for code-related tasks. Because the repository is gated, access requires authenticating with a Hugging Face account and access token.
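A minimal loading-and-generation sketch with the Hugging Face `transformers` library follows. It assumes `torch` and `transformers` are installed (the GPT-SW3 tokenizer is SentencePiece-based, so `sentencepiece` is likely needed too), that your account has been granted access to the gated repository, and that a token is available through the `HF_TOKEN` environment variable or an earlier `huggingface-cli login`. The 2048-token limit is an assumption based on the "2k" context length in the listing, and the sampling settings are illustrative rather than recommended values. `max_new_tokens_for` and `generate` are hypothetical helper names.

```python
import os

MODEL_NAME = "AI-Sweden-Models/gpt-sw3-1.3b-instruct"
MAX_CONTEXT = 2048  # assumed from "Context length: 2k" in the listing


def max_new_tokens_for(prompt_len: int, requested: int,
                       max_context: int = MAX_CONTEXT) -> int:
    """Clamp the generation length so prompt + output fit in the context window."""
    available = max_context - prompt_len
    if available <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, available)


def generate(prompt: str, requested_new_tokens: int = 100) -> str:
    # Imported lazily so the helper above works without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # None falls back to a token cached by `huggingface-cli login`.
    token = os.environ.get("HF_TOKEN")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # BF16 on GPU (matching the listing); float32 on CPU for broad support.
    dtype = torch.bfloat16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=token)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=dtype, token=token
    )
    model.to(device).eval()

    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(device)
    new_tokens = max_new_tokens_for(input_ids.shape[1], requested_new_tokens)
    with torch.no_grad():
        output = model.generate(
            input_ids,
            max_new_tokens=new_tokens,
            do_sample=True,   # illustrative sampling settings
            temperature=0.6,
            top_p=1.0,
        )
    # Decode only the continuation, not the echoed prompt.
    return tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# print(generate("Träd är fina för att"))
```

Clamping `max_new_tokens` against the context window avoids asking for more tokens than a 2k-context model can attend to; if the real limit differs, only `MAX_CONTEXT` needs to change.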