AI-Sweden-Models/gpt-sw3-1.3b-instruct

1.4B
BF16
2048
License: other

AI-Sweden-Models/gpt-sw3-1.3b-instruct Overview

This model is a 1.4-billion-parameter, instruction-tuned variant of the GPT-SW3 series, developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language. It is a decoder-only transformer, pretrained with a causal language modeling objective using the NeMo Megatron GPT implementation.

Key Capabilities

  • Multilingual Text Generation: Capable of generating coherent text in five languages: Swedish, Norwegian, Danish, Icelandic, and English.
  • Code Generation: Supports text generation in four programming languages.
  • Instruction Following: Fine-tuned on instruction data in both chat and raw-text formats, so it can be instructed to perform text tasks it was not explicitly trained for by casting them as text generation problems.
  • Nordic Language Focus: Trained on a dataset of 320 billion tokens, a significant portion of which is in Nordic languages, making it relevant for applications in these regions.
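
The capabilities above can be tried with a short script. The sketch below is a minimal example under several assumptions: the `transformers`, `torch`, and `sentencepiece` packages are installed; the checkpoint is available on the Hugging Face Hub under the repository name in the title (access may need to be granted first); and the chat-style prompt layout (`<|endoftext|><s>`, `User:`, `Bot:`) is recalled from the upstream model card rather than from this page, so verify it there before relying on it. The sampling settings are illustrative, not recommended values.

```python
MODEL_NAME = "AI-Sweden-Models/gpt-sw3-1.3b-instruct"


def build_chat_prompt(user_message: str) -> str:
    """Wrap a single user turn in the assumed GPT-SW3 chat layout."""
    # Assumption: the turn markers below are not stated on this page.
    return f"<|endoftext|><s>\nUser:\n{user_message.strip()}\n<s>\nBot:\n".strip()


def generate(user_message: str, max_new_tokens: int = 100) -> str:
    """Generate a reply. Heavy imports are local so the prompt helper
    can be used and tested without torch installed."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.bfloat16
    )
    model.eval()

    prompt = build_chat_prompt(user_message)
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        output_ids = model.generate(
            inputs=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.6,
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Översätt till engelska: Träd är fina.")` downloads the weights on first use. `build_chat_prompt` can be inspected on its own to check the prompt text before any model is loaded.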

Performance Metrics

The Open LLM Leaderboard reports an average score of 30.26 for this model. That average spans more benchmarks than the five listed here, whose mean alone is 41.12:

  • ARC (25-shot): 30.97
  • HellaSwag (10-shot): 51.42
  • MMLU (5-shot): 26.17
  • TruthfulQA (0-shot): 40.31
  • Winogrande (5-shot): 56.75
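
As a quick sanity check on these figures, the arithmetic below compares the mean of the five listed scores with the reported average. The benchmark count of seven is an assumption about the leaderboard version used and is not stated on this page; under that assumption, the unlisted benchmarks would contribute only about 6.2 points in total.

```python
from statistics import mean

# Scores copied from the list above.
listed_scores = {
    "ARC (25-shot)": 30.97,
    "HellaSwag (10-shot)": 51.42,
    "MMLU (5-shot)": 26.17,
    "TruthfulQA (0-shot)": 40.31,
    "Winogrande (5-shot)": 56.75,
}

REPORTED_AVERAGE = 30.26
# Assumption: the leaderboard averaged seven benchmarks at evaluation time.
ASSUMED_BENCHMARK_COUNT = 7

listed_mean = mean(listed_scores.values())

# Total that the unlisted benchmarks would need to contribute
# for the reported average to hold under the assumed count.
implied_unlisted_total = (
    REPORTED_AVERAGE * ASSUMED_BENCHMARK_COUNT - sum(listed_scores.values())
)

print(f"mean of listed scores: {listed_mean:.2f}")
print(f"implied total of unlisted scores: {implied_unlisted_total:.2f}")
```

The gap between the two means suggests the model scores very low on whichever benchmarks are omitted from the list, so the headline average should not be read as the mean of the five results shown.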

Limitations

Like other large language models, GPT-SW3 has limitations regarding bias, safety, generation diversity, and hallucination. It may overrepresent certain viewpoints, reproduce stereotypes, and generate inappropriate or incorrect content. Users should be aware of these risks and implement appropriate safeguards.

Good for

  • Research and Evaluation: The primary intended use is research and evaluation of LLM capabilities, especially for Nordic languages.
  • Text Generation in Nordic Languages: Ideal for applications requiring text generation or understanding in Swedish, Norwegian, Danish, and Icelandic.
  • Instruction-based Tasks: Suitable for tasks that can be framed as text generation based on instructions, such as question answering or summarization.
  • Developers in the Nordic NLP Ecosystem: Particularly useful for organizations and individuals contributing to the validation and testing of models for Nordic NLP.
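
Framing a task as instruction-based text generation mostly comes down to writing the task as a plain instruction string. The helper below is a hypothetical illustration: the function name `as_instruction` and the template wording are not from the model's documentation, and the resulting string would still need to be placed in whatever prompt format the model expects.

```python
def as_instruction(task: str, text: str, question: str = "") -> str:
    """Cast a summarization or question-answering task as an instruction.

    Hypothetical helper: the template wording is illustrative only.
    """
    if task == "summarize":
        return f"Summarize the following text in two sentences:\n\n{text}"
    if task == "qa":
        if not question:
            raise ValueError("the 'qa' task requires a question")
        return (
            "Answer the question using only the text below.\n\n"
            f"Text: {text}\n\nQuestion: {question}"
        )
    raise ValueError(f"unsupported task: {task}")


# Example: the same passage framed as two different tasks.
passage = "GPT-SW3 was developed by AI Sweden together with RISE."
summary_prompt = as_instruction("summarize", passage)
qa_prompt = as_instruction("qa", passage, question="Who developed GPT-SW3?")
```

Keeping the templates in one function makes it easy to compare how wording changes affect output quality during evaluation.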