AI-Sweden-Models/gpt-sw3-126m

Parameters: 0.2B
Precision: BF16
Context length: 2048
License: other
Available on: Hugging Face

Overview

AI Sweden, in collaboration with RISE and the WASP WARA for Media and Language, developed GPT-SW3 126M, a decoder-only transformer language model. It is part of the GPT-SW3 family, which includes base and instruction-tuned models in several sizes. The model was pretrained with a causal language modeling objective using the NeMo Megatron GPT implementation.
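
The model can be loaded directly with the Hugging Face transformers library. Below is a minimal sketch, assuming transformers and torch are installed; the prompt and the sampling settings (max_new_tokens, temperature, top_p) are illustrative choices, not values taken from this card.

    # Minimal sketch: load GPT-SW3 126M and sample a continuation.
    # The generation parameters below are illustrative assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "AI-Sweden-Models/gpt-sw3-126m"
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    model.eval()

    # A Swedish prompt; the model continues the text.
    prompt = "Träd är fina för att"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            max_new_tokens=50,
            do_sample=True,
            top_p=0.9,
            temperature=0.6,
        )

    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))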

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Can also generate code, having been trained on data from four programming languages.
  • Instruction Following: Can be prompted to perform text-based tasks it was not explicitly trained for by framing them as text generation problems (see the prompt sketch after this list).
  • Research and Evaluation: Primarily intended for research and evaluation of Large Language Models, particularly for Nordic languages.
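
As an illustration of framing a task as text generation, the sketch below poses a Swedish-to-English translation as plain text continuation using a few-shot prompt. The prompt format and generation settings are assumptions made for illustration, not an official recipe; the base model is not instruction-tuned, so few-shot prompts like this tend to work better than bare instructions.

    # Sketch: frame a translation task as text continuation.
    # The prompt layout below is an illustrative assumption.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="AI-Sweden-Models/gpt-sw3-126m",
    )

    # One worked example, then the input to be translated.
    prompt = (
        "Svenska: Jag tycker om att läsa böcker.\n"
        "English: I like to read books.\n"
        "Svenska: Hunden springer i parken.\n"
        "English:"
    )

    result = generator(prompt, max_new_tokens=20, do_sample=False)
    print(result[0]["generated_text"])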

Training Data

The model was trained on a diverse dataset of 320 billion tokens (1.1 TB of UTF-8 encoded text) collected between June 2021 and June 2022. The dataset spans a wide range of sources: books (Litteraturbanken, The Pile), articles (DiVA, PubMed, ArXiv), code (CodeParrot GitHub code), conversational data (Familjeliv, Flashback, Reddit), math datasets, and extensive web crawls (Common Crawl, Wikipedia, scrapes of public Swedish websites). The data was filtered and deduplicated using methods inspired by the BigScience ROOTS Corpus and Gopher.

Limitations

Like other large language models, GPT-SW3 126M has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, contain stereotypes, and generate inappropriate or incorrect content. Users should be aware of these risks and treat the model's output critically, especially given its modified RAIL license, which encourages transparency and the study of LLM issues.