AI-Sweden-Models/gpt-sw3-126m

Hugging Face | Text generation | Model size: 0.2B | Quant: BF16 | Context length: 2k | Published: Dec 14, 2022 | License: other | Architecture: Transformer | Gated
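The listed 2k context length limits how much text the model can see at once: the prompt and the generated continuation share the same window. The sketch below shows one common way to respect that limit, which is to drop the oldest prompt tokens so that the prompt plus `max_new_tokens` fits. It assumes that "2k" means 2,048 token positions, and `fit_to_context` is a hypothetical helper, not part of any GPT-SW3 or Hugging Face API.

```python
def fit_to_context(prompt_ids, max_new_tokens, ctx_len=2048):
    """Left-truncate a list of prompt token ids so prompt + generation fits.

    Hypothetical helper; ctx_len=2048 is an assumed reading of the "2k" context.
    The most recent tokens are kept because they matter most for continuation.
    """
    if not 0 <= max_new_tokens < ctx_len:
        raise ValueError("max_new_tokens must be in [0, ctx_len)")
    budget = ctx_len - max_new_tokens  # positions left for the prompt
    if len(prompt_ids) <= budget:
        return list(prompt_ids)
    return list(prompt_ids[-budget:])  # drop the oldest tokens


# A 3,000-token prompt with 100 tokens reserved for generation keeps 1,948 tokens.
kept = fit_to_context(list(range(3000)), max_new_tokens=100)
print(len(kept), kept[0])
```

Left truncation is only one policy; for tasks whose instructions sit at the start of the prompt, truncating the middle of the text instead may work better.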

AI-Sweden-Models/gpt-sw3-126m is a 126-million-parameter decoder-only transformer language model developed by AI Sweden in collaboration with RISE and WASP WARA for Media and Language. Part of the GPT-SW3 collection, it was pretrained on a 320-billion-token dataset spanning Swedish, Norwegian, Danish, Icelandic, English, and programming code. It generates coherent text in these five human languages and in four programming languages, and it can be prompted to perform text tasks it was not explicitly trained for by casting them as text generation tasks.


GPT-SW3 126M: A Multilingual and Multicode Base Model

AI-Sweden-Models/gpt-sw3-126m is a 126-million-parameter decoder-only transformer model, part of the GPT-SW3 series developed by AI Sweden, RISE, and WASP WARA for Media and Language. It was pretrained with a causal language modeling (CLM) objective using the NeMo Megatron GPT implementation.
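The causal language modeling objective trains the model to predict each token from the tokens before it, scoring the prediction with cross-entropy. The pure-Python sketch below computes that loss from per-position logits; it illustrates the objective in general and is not NeMo Megatron's implementation. The function names and the toy 4-token vocabulary are assumptions for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized score vector."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def causal_lm_loss(logits, token_ids):
    """Mean next-token cross-entropy (hypothetical helper).

    logits[t] holds the scores produced after reading token_ids[0..t];
    it is scored against the next token, token_ids[t + 1]. The last
    position has no next token and is therefore ignored.
    """
    if len(logits) != len(token_ids) or len(token_ids) < 2:
        raise ValueError("need one logit vector per token and at least 2 tokens")
    losses = []
    for t in range(len(token_ids) - 1):
        log_probs = log_softmax(logits[t])
        losses.append(-log_probs[token_ids[t + 1]])  # -log p(x_{t+1} | x_<=t)
    return sum(losses) / len(losses)


# Uniform scores over a 4-token vocabulary give a loss of log(4), about 1.386.
print(causal_lm_loss([[0.0] * 4] * 3, [0, 1, 2]))
```

Because each position only conditions on earlier tokens, the same objective lets the trained model generate text one token at a time.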

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Multicode Generation: Supports text generation in four different programming languages.
  • Prompt-Based Task Transfer: Can be prompted to perform text tasks it was not explicitly trained for, by casting them as text generation tasks.
  • Broad Training Data: Pretrained on a 320-billion-token dataset covering both human languages and programming code.
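Casting a task as text generation usually means writing a prompt whose most natural continuation is the answer: an instruction, a few solved examples, and then the new input with an empty answer slot. The sketch below builds such a few-shot prompt; `build_prompt` and the "Input"/"Output" labels are hypothetical choices rather than a format documented for GPT-SW3, and a base model this small may still follow the pattern unreliably.

```python
def build_prompt(instruction, examples, query, in_label="Input", out_label="Output"):
    """Build a few-shot prompt that ends in an open answer slot (hypothetical)."""
    lines = [instruction, ""]
    for text, answer in examples:  # solved demonstrations of the task
        lines.append(f"{in_label}: {text}")
        lines.append(f"{out_label}: {answer}")
        lines.append("")
    lines.append(f"{in_label}: {query}")
    lines.append(f"{out_label}:")  # the model is expected to continue from here
    return "\n".join(lines)


prompt = build_prompt(
    "Translate from Swedish to English.",
    [("Hej", "Hello")],
    "Tack",
)
print(prompt)
```

The text the model generates after the final "Output:" is read as its answer, typically cut off at the first newline.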

Limitations

Like other large language models, GPT-SW3 126M has limitations, including bias, safety risks, limited generation diversity, and hallucination. It may overrepresent some viewpoints, reproduce stereotypes, or generate inappropriate content. Users should be aware that the model can produce incorrect or irrelevant output.

Good for

  • Generating text in Nordic languages and English.
  • Basic code generation tasks.
  • Exploring small-scale multilingual and multicode LLM applications.
  • Research into the biases and safety aspects of multilingual models.