AI-Sweden-Models/gpt-sw3-40b

Text Generation | Concurrency Cost: 3 | Model Size: 39.9B | Quant: FP8 | Ctx Length: 32k | Published: Feb 22, 2023 | License: other | Architecture: Transformer | Gated | Cold

AI-Sweden-Models/gpt-sw3-40b is a 40 billion parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. It was trained on 320 billion tokens of Swedish, Norwegian, Danish, Icelandic, English, and programming code, and generates coherent text in five natural languages and four programming languages. The model is designed for autoregressive text generation and can perform other text tasks when they are cast as generation problems.


Model Overview

AI-Sweden-Models/gpt-sw3-40b is a 40 billion parameter decoder-only transformer language model developed by AI Sweden, RISE, and WASP WARA for Media and Language. It is part of the GPT-SW3 collection, which includes various sizes and instruction-tuned variants. The model was pretrained using a causal language modeling (CLM) objective with the NeMo Megatron GPT implementation.
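
As a rough illustration of the causal language modeling objective mentioned above, the sketch below computes the mean negative log-likelihood of each token given the tokens before it. It is a toy in plain Python, not the NeMo Megatron implementation; the names clm_loss, NextTokenModel, and uniform_model are hypothetical and exist only for this example.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical type for this sketch: maps a prefix of token ids to a
# probability distribution over the next token id.
NextTokenModel = Callable[[Sequence[int]], Dict[int, float]]


def clm_loss(tokens: Sequence[int], model: NextTokenModel) -> float:
    """Mean negative log-likelihood of each token given the tokens before it."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a prediction target")
    total = 0.0
    for t in range(1, len(tokens)):
        probs = model(tokens[:t])             # distribution over the next token
        total += -math.log(probs[tokens[t]])  # penalty for the true next token
    return total / (len(tokens) - 1)


def uniform_model(vocab_size: int) -> NextTokenModel:
    """Toy stand-in that ignores the prefix and predicts every token equally."""
    dist = {i: 1.0 / vocab_size for i in range(vocab_size)}
    return lambda prefix: dist


# With a uniform model over 4 tokens, the loss equals log(4).
loss = clm_loss([0, 3, 1, 2], uniform_model(4))
```

A trained model lowers this loss by assigning more probability to the tokens that actually follow each prefix in the training data.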

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Supports text generation in four programming languages.
  • Task Adaptability: Can perform diverse text tasks by framing them as text generation problems, even if not explicitly trained for them.
  • Extensive Training Data: Trained on 320 billion tokens spanning Swedish, Norwegian, Danish, Icelandic, English, and programming code.
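
One common way to frame a task as text generation is a few-shot prompt: a short instruction, a handful of solved examples, and an unfinished final example for the model to continue. The sketch below only builds the prompt string. The function name build_few_shot_prompt, the label format, and the example sentences are assumptions made for illustration, not a format documented for GPT-SW3.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
    input_label: str = "Input",
    output_label: str = "Output",
) -> str:
    """Lay out solved examples followed by an open slot for the model to fill."""
    lines = [instruction.strip(), ""]
    for source, target in examples:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
        lines.append("")  # blank line between examples
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")  # left open; the model continues from here
    return "\n".join(lines)


# Illustrative translation task cast as a continuation problem.
prompt = build_few_shot_prompt(
    "Translate from Swedish to English.",
    [("God morgon", "Good morning"), ("Tack så mycket", "Thank you very much")],
    "Var ligger biblioteket?",
    input_label="Swedish",
    output_label="English",
)
```

The resulting string would be passed to whatever generation interface serves the model, and the text produced up to the next blank line would be read as the answer.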

Intended Use Cases

This model is suited to applications that need text generation in Swedish, Norwegian, Danish, Icelandic, or English, as well as to programming-related text tasks. Because it is autoregressive, any task that can be written as a prompt to be continued can be attempted with the same model.
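
To make "autoregressive" concrete, the following minimal sketch shows greedy decoding: the model is asked for a next-token distribution, the most likely token is appended, and the extended sequence is fed back in. The toy_model here is a hypothetical stand-in over a four-token vocabulary and has nothing to do with the real network. Serving stacks often sample from the distribution rather than always taking the maximum.

```python
from typing import Callable, Dict, List, Optional, Sequence


def greedy_generate(
    prompt: Sequence[int],
    next_token_probs: Callable[[Sequence[int]], Dict[int, float]],
    max_new_tokens: int,
    eos: Optional[int] = None,
) -> List[int]:
    """Repeatedly append the most probable next token to the sequence."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)         # condition on everything so far
        next_token = max(probs, key=probs.get)   # greedy choice
        tokens.append(next_token)
        if eos is not None and next_token == eos:
            break                                # stop at end-of-sequence
    return tokens


def toy_model(prefix: Sequence[int]) -> Dict[int, float]:
    """Hypothetical model: favors the token one above the last, wrapping at 4."""
    favored = (prefix[-1] + 1) % 4
    return {t: (0.7 if t == favored else 0.1) for t in range(4)}


generated = greedy_generate([0], toy_model, max_new_tokens=5)
```

Each step depends only on the tokens produced so far, which is why one model can serve many tasks: the task is carried entirely by the prompt.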

Limitations

Like other large language models, GPT-SW3 has limitations including potential biases, safety concerns, and quality issues such as hallucination and lack of generation diversity. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users should be aware of these limitations and exercise caution in deployment.