OPI-PG/Qra-13b

Parameters: 13B
Tensor type: FP8
Sequence length: 4096
Training epochs: 1
Released: Feb 27, 2024
License: llama2
Hosted on: Hugging Face

OPI-PG/Qra-13b is a 13 billion parameter causal language model developed by the National Information Processing Institute (OPI) and Gdańsk University of Technology (PG). Adapted from Llama 2, it was further trained on a 90 billion token Polish text corpus, making it highly optimized for Polish language understanding and generation. This foundation model is designed for further fine-tuning for specific downstream tasks, excelling in perplexity benchmarks on Polish texts.

Overview
OPI-PG/Qra-13b is a 13 billion parameter foundational language model, a collaborative effort between the National Information Processing Institute (OPI) and Gdańsk University of Technology (PG). It was initialized with Llama 2 13B weights and then extensively trained on a meticulously cleaned, filtered, and deduplicated corpus of approximately 90 billion Polish tokens, primarily sourced from web data like CommonCrawl and MADLAD-400. The model was trained for one epoch on sequences of 4096 tokens, utilizing modern optimizations such as Flash Attention 2 and FSDP.
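A minimal usage sketch follows, assuming the `transformers` and `torch` packages and a GPU with roughly 26 GB of memory for bf16 weights. The helper name, prompt, and generation settings are illustrative, not taken from the model card:

```python
def continue_polish_text(prompt: str, max_new_tokens: int = 64) -> str:
    """Load OPI-PG/Qra-13b and continue a Polish prompt.

    Hypothetical helper; the heavyweight imports are deferred so they
    are only required when the model is actually loaded.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "OPI-PG/Qra-13b"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # illustrative precision choice
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Qra-13b is a base model: it continues text rather than following
    # instructions, so the prompt should read like the start of a passage.
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(continue_polish_text("Najważniejszym wydarzeniem w historii Polski było"))
```

Because the model is not instruction-tuned, prompts phrased as questions or commands will generally yield weaker completions than prompts that open a passage of text.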

Key Capabilities

  • Polish Language Specialization: Specifically adapted and trained on a vast Polish text corpus, making it highly proficient in Polish language understanding and generation.
  • Foundation Model: Designed as a base model for further fine-tuning, it is not intended for direct conversational or instruction-following tasks without additional adaptation.
  • Strong Perplexity Performance: Achieves a perplexity of 10.5 on the PolEval-2018 test set and 4.2 on a 2024 long-document dataset, outperforming other Polish and English models of its size on Polish texts.
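As a reminder of what the benchmark numbers above mean: perplexity is the exponential of the average negative log-likelihood per token, so lower is better. A minimal pure-Python sketch (the function name is ours):

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over tokens.

    `token_logprobs` are the natural-log probabilities the model
    assigned to each ground-truth token in the evaluated text.
    """
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


# A model that assigns every token probability 1/10.5 scores a
# perplexity of 10.5, the value reported for PolEval-2018.
probs = [1 / 10.5] * 100
print(perplexity([math.log(p) for p in probs]))  # ≈ 10.5
```

In practice the per-token log-probabilities are obtained by running the model over the evaluation corpus and reading off the log-softmax score of each ground-truth token.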

Good For

  • Developing Polish-centric LLM applications: Ideal for researchers and developers building applications that require deep understanding and generation of Polish text.
  • Further Fine-tuning: Serves as an excellent base for fine-tuning into instruction-following, conversational, or task-specific models for the Polish language.
  • Research in Polish NLP: Useful for academic and industrial research into large language models tailored for less-resourced languages or specific linguistic challenges in Polish.
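As one concrete route to the fine-tuning use case above, a supervised causal-LM fine-tuning run could be assembled with the Hugging Face `Trainer`. This is a sketch under stated assumptions: the dataset must already be tokenized, and every hyperparameter below is a placeholder, not a value from the Qra training recipe:

```python
def build_finetune_trainer(train_dataset, model_id: str = "OPI-PG/Qra-13b"):
    """Assemble a Trainer for causal-LM fine-tuning of Qra-13b.

    Hypothetical sketch: `train_dataset` must already be tokenized
    (input_ids / attention_mask); hyperparameters are illustrative.
    """
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    args = TrainingArguments(
        output_dir="qra13b-finetune",
        per_device_train_batch_size=1,   # 13B weights: keep micro-batches small
        gradient_accumulation_steps=16,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    )
    # mlm=False selects next-token (causal LM) objectives.
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    return Trainer(
        model=model, args=args, train_dataset=train_dataset, data_collator=collator
    )
```

For a 13B model, parameter-efficient methods such as LoRA are a common alternative to full fine-tuning when GPU memory is limited.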