AI-Sweden-Models/gpt-sw3-126m-instruct

Text generation | Concurrency cost: 1 | Model size: 0.2B | Quantization: BF16 | Context length: 2k | Published: Apr 28, 2023 | License: other | Architecture: Transformer | Gated

AI-Sweden-Models/gpt-sw3-126m-instruct is a 126 million parameter, instruction-tuned, decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It is part of the GPT-SW3 collection, which was pretrained on 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, and English text plus programming code. The model generates coherent text in these languages and handles a range of text tasks by following instructions, which makes it suitable for Nordic NLP applications.

Model Overview

AI Sweden, in collaboration with RISE and WASP WARA for Media and Language, developed the GPT-SW3 series, which includes this 126 million parameter instruction-tuned model. The GPT-SW3 models are decoder-only transformer language models pretrained on 320 billion tokens. The pretraining data covers five languages (Swedish, Norwegian, Danish, Icelandic, and English) as well as programming code.

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Can generate code in four programming languages.
  • Instruction Following: Fine-tuned on instruction data in both chat and raw text formats, so it can attempt text tasks it was not explicitly trained on when they are phrased as text generation tasks.
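
As an illustration of the chat-style instruction format, here is a minimal sketch that builds a prompt and passes it to the model through the Hugging Face transformers library. The turn layout ("<|endoftext|><s>", "User:", "Bot:") is an assumption based on the upstream model card rather than anything stated on this page, so check it against that card before relying on it. The helper names build_chat_prompt and generate_reply and the sampling settings are hypothetical and chosen only for the example.

```python
MODEL_NAME = "AI-Sweden-Models/gpt-sw3-126m-instruct"


def build_chat_prompt(turns):
    """Format (speaker, text) turns into a single prompt string.

    ASSUMPTION: the "<|endoftext|><s>" / "User:" / "Bot:" layout follows the
    upstream model card, not this page; verify it before relying on it.
    """
    parts = ["<|endoftext|>"]
    for speaker, text in turns:
        parts.append(f"<s>\n{speaker}:\n{text.strip()}\n")
    # End with an open "Bot:" turn so the model continues as the assistant.
    parts.append("<s>\nBot:\n")
    return "".join(parts)


def generate_reply(instruction, max_new_tokens=100):
    """Generate one reply to a single user instruction (hypothetical helper)."""
    # Imported lazily so the prompt helper works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The repository is gated: access must be granted on the Hub and the
    # local environment must be authenticated before these calls succeed.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
    model.eval()

    prompt = build_chat_prompt([("User", instruction)])
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        output_ids = model.generate(
            inputs=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.6,  # illustrative sampling settings, not tuned values
            top_p=1.0,
        )[0]
    # Decode only the newly generated tokens, not the prompt itself.
    new_tokens = output_ids[input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling generate_reply("Översätt till engelska: Jag tycker träd är fina.") would download the weights on first use. With a 2k context window, the prompt and max_new_tokens together need to stay within that budget.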

Intended Use and Limitations

GPT-SW3 models are released in a controlled pre-release to foster validation and feedback within the Nordic NLP ecosystem. As with other large language models, GPT-SW3 has limitations in bias and safety, in generation diversity, and in hallucination. The model may overrepresent some viewpoints and underrepresent others, reproduce stereotypes, or generate inappropriate content. It may also present incorrect information as fact or produce output that is irrelevant to the prompt.