AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct

Public · 7.1B parameters · FP8 · 2048-token context · License: other · Available on Hugging Face
Overview

AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct is a 7.1 billion parameter, decoder-only transformer language model from the GPT-SW3 collection. Developed by AI Sweden in collaboration with RISE and the WASP WARA for Media and Language, it is instruction-tuned for diverse text generation tasks.

Key Capabilities

  • Multilingual Generation: Capable of generating coherent text in Swedish, Norwegian, Danish, Icelandic, and English.
  • Multilingual Code Generation: Generates code in four programming languages.
  • Instruction Following: Fine-tuned on instruction data in both chat and raw-text formats, enabling it to perform tasks it was not explicitly trained for by casting them as text generation (see the usage sketch after this list).
  • Extensive Training Data: Pretrained on a 320 billion token dataset, including a significant portion of Nordic languages and programming code.
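
Below is a minimal usage sketch with the Hugging Face transformers library. The "User:/Bot:" chat prompt template is an assumption based on the format documented for GPT-SW3 instruct models, and the generation parameters are illustrative; verify both against the official model card and tune for your task.

```python
# Minimal sketch: casting a task as text generation with the instruct
# chat format. The prompt template is an assumption based on the
# documented GPT-SW3 "User:/Bot:" format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.to(device).eval()

# A Swedish instruction, framed as plain text generation.
prompt = (
    "<|endoftext|><s>\n"
    "User:\n"
    "Sammanfatta i en mening: Stockholm är Sveriges huvudstad.\n"
    "<s>\n"
    "Bot:\n"
)

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True))
```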

Performance Highlights

Evaluations on the Open LLM Leaderboard show an average score of 39.57. Selected task scores (a reproduction sketch follows this list):

  • HellaSwag (10-shot): 67.77
  • Winogrande (5-shot): 63.54
  • ARC (25-shot): 40.78
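
As a minimal reproduction sketch, similar numbers can be obtained with EleutherAI's lm-evaluation-harness, the framework behind the Open LLM Leaderboard. This assumes harness version 0.4 or later; exact task configurations and harness versions affect scores, so results may not match the leaderboard precisely.

```python
# Minimal sketch: a leaderboard-style evaluation with EleutherAI's
# lm-evaluation-harness (pip install lm-eval).
# Each leaderboard task uses its own few-shot count (HellaSwag: 10,
# Winogrande: 5, ARC-Challenge: 25), so run one task per call.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=AI-Sweden-Models/gpt-sw3-6.7b-v2-instruct,dtype=float16",
    tasks=["hellaswag"],
    num_fewshot=10,
)

# Print the accuracy metrics reported for the task.
print(results["results"]["hellaswag"])
```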

Intended Use

This model is primarily intended for research and evaluation of Large Language Models, particularly for Nordic languages. It aims to facilitate knowledge building and gather feedback from the Nordic NLP ecosystem.

Limitations

Like other large language models, GPT-SW3 has limitations regarding bias, safety, generation diversity, and hallucination. It may produce stereotypes, hateful or discriminatory language, and factual errors. Users should be aware of these potential issues and consider appropriate disclaimers or content filtering.