Name: AI-Sweden-Models/gpt-sw3-126m API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: AI-Sweden-Models

GPT-SW3 126M: A Multilingual and Multicode Base Model

AI-Sweden-Models/gpt-sw3-126m is a 126-million-parameter decoder-only transformer model in the GPT-SW3 series, developed by AI Sweden, RISE, and the WASP WARA for Media and Language. It was pretrained with a causal language modeling (CLM) objective, that is, predicting each token from the tokens before it, using the NeMo Megatron GPT implementation.
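To make the causal language modeling objective concrete, here is a minimal sketch of the loss in plain Python. It is an illustration only, not the NeMo Megatron training code: the `causal_lm_loss` function, the `next_token_probs` callable, and the toy `uniform_model` are all hypothetical names introduced for this example.

```python
import math


def causal_lm_loss(token_ids, next_token_probs):
    """Average negative log-likelihood of each token given its prefix.

    token_ids: list of integer token ids for one sequence.
    next_token_probs: hypothetical callable (an assumption for this sketch)
        that maps a prefix, as a list of ids, to a dict of
        {token_id: probability} for the next token.
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction")
    total = 0.0
    for t in range(1, len(token_ids)):
        prefix = token_ids[:t]            # the model only sees earlier tokens
        probs = next_token_probs(prefix)  # distribution over the next token
        total += -math.log(probs[token_ids[t]])
    return total / (len(token_ids) - 1)


def uniform_model(prefix, vocab_size=4):
    """Toy stand-in for a model: every token is equally likely."""
    return {i: 1.0 / vocab_size for i in range(vocab_size)}


# With a uniform distribution over 4 tokens, the loss equals log(4).
loss = causal_lm_loss([0, 1, 2, 3], uniform_model)
print(round(loss, 4))
```

A trained model lowers this value by assigning higher probability to the token that actually comes next.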

Key Capabilities

Multilingual Text Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
Multicode Generation: Generates code in four programming languages.
Instruction Following: Can be prompted to perform text tasks it was not explicitly trained for by casting them as text generation problems.
Broad Training Data: Trained on a dataset of 320 billion tokens covering both natural languages and programming code.
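Because this is a base model, a task is usually posed as text for the model to continue. The sketch below shows one way to cast a translation task as a generation prompt with a few worked examples. The `build_few_shot_prompt` helper and the `Input:`/`Output:` layout are assumptions made for illustration, not a format prescribed by the model card.

```python
def build_few_shot_prompt(task_description, examples, query):
    """Format a task as plain text that a base model can continue.

    task_description: one-line statement of the task.
    examples: list of (input, output) pairs shown to the model.
    query: the new input the model should complete an output for.
    The "Input:" / "Output:" labels are an illustrative convention only.
    """
    lines = [task_description.strip(), ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open so the model generates the answer
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to Swedish.",
    [("Good morning", "God morgon"), ("Thank you", "Tack")],
    "Good night",
)
print(prompt)
```

The resulting string would be sent to the model as an ordinary completion prompt. With a model of this size, output quality on such tasks is likely to be limited.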

Limitations

Like other large language models, GPT-SW3 126M has limitations in bias, safety, generation diversity, and hallucination. It may overrepresent some viewpoints and underrepresent others, reproduce stereotypes, or generate inappropriate content. Users should expect that the model can produce incorrect information or irrelevant output.

Good for

Generating text in Nordic languages and English.
Basic code generation tasks.
Exploring small-scale multilingual and multicode LLM applications.
Research into the biases and safety aspects of multilingual models.
