OPI-PG/Qra-7b

Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 4k · Published: Feb 27, 2024 · License: llama2 · Architecture: Transformer · Open Weights

OPI-PG/Qra-7b is a 7-billion-parameter causal language model developed by OPI and PG, adapted from Llama 2 checkpoints. It was trained on a 90-billion-token corpus of Polish text, making it highly optimized for Polish language processing. This foundation model has a 4096-token context length and achieves lower perplexity on Polish benchmarks than comparable Polish and English LLMs.


OPI-PG/Qra-7b: A Foundation Model for Polish Language Processing

OPI-PG/Qra-7b is a 7 billion parameter large language model developed through a collaboration between the National Information Processing Institute (OPI) and Gdańsk University of Technology (PG). This model is adapted from Llama 2-7b-hf and has been extensively trained on a meticulously cleaned, filtered, and deduplicated corpus of approximately 90 billion Polish tokens, primarily sourced from web data including CommonCrawl and MADLAD-400.

Key Characteristics & Training

  • Polish Language Focus: Specifically designed and trained for the Polish language, making it highly proficient in Polish text generation and understanding.
  • Robust Preprocessing: The training data underwent rigorous preprocessing, including text normalization, URL removal, document filtering based on length and quality classifiers, language identification, and fuzzy deduplication within 18 topical domains.
  • Technical Optimizations: Trained for one epoch on 4096-token sequences, utilizing advanced optimizations such as torch.compile, adamw_apex_fused optimizer, Flash Attention 2, mixed precision, gradient accumulation, and FSDP.
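The fuzzy deduplication mentioned above is commonly implemented with MinHash signatures over word shingles. The model card does not specify the exact algorithm used for Qra's corpus, so the following is an illustrative sketch of the general technique, not the actual pipeline code:

```python
import hashlib

def shingles(text, n=3):
    """Word n-gram shingles of a document."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def minhash_signature(shingle_set, num_hashes=64):
    """MinHash signature: for each seeded hash function,
    keep the minimum hash value over the shingle set."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingle_set))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the
    Jaccard similarity of the two shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two near-duplicate documents score high; documents above a chosen
# similarity threshold would be collapsed to one copy.
a = "the quick brown fox jumps over the lazy dog near the river bank"
b = "the quick brown fox jumps over the lazy dog near the old bridge"
sim = estimated_jaccard(minhash_signature(shingles(a)),
                        minhash_signature(shingles(b)))
```

In practice, pipelines bucket signatures with locality-sensitive hashing so that only candidate pairs are compared, which keeps deduplication tractable at 90-billion-token scale.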

Performance & Evaluation

Qra-7b demonstrates strong performance in perplexity benchmarks on Polish texts (lower is better):

  • PolEval-2018: Achieved a perplexity of 11.3, significantly outperforming other Polish models like szymonrucinski/Curie-7B-v1 (13.5) and English models like meta-llama/Llama-2-7b-hf (24.3).
  • Long Documents (2024): Showed a perplexity of 4.5 on a new dataset of long Polish documents (news and scientific articles from 2024), surpassing szymonrucinski/Curie-7B-v1 (4.8) and meta-llama/Llama-2-7b-hf (5.9).
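The perplexity scores above are the exponential of the average per-token negative log-likelihood that the model assigns to the evaluation text. A minimal pure-Python illustration of the computation (the token probabilities below are made-up values, not model outputs):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-likelihood
    over the per-token probabilities of a sequence."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical probabilities a model assigned to four tokens:
# a model that always assigns 0.5 has perplexity exactly 2.
probs = [0.5, 0.25, 0.5, 0.25]
score = perplexity(probs)
```

A perplexity of 11.3 on PolEval-2018 thus means the model is, on average, about as uncertain as if it were choosing uniformly among ~11 tokens at each step.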

Important Note

Qra models are foundation language models trained with a causal language modeling objective. They are not intended for conversational or instruction-following use out of the box and require further fine-tuning for such applications.
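As a base model, Qra-7b is therefore used for raw text continuation rather than chat. A typical loading sketch with the Hugging Face `transformers` library follows; it assumes the `transformers` and `torch` packages are installed, requires downloading the full 7B checkpoint, and the generation parameters are illustrative, not recommendations from the model authors:

```python
# Illustrative only: loading the full 7B checkpoint requires a GPU
# (or substantial RAM) and a network download of the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("OPI-PG/Qra-7b")
model = AutoModelForCausalLM.from_pretrained(
    "OPI-PG/Qra-7b",
    torch_dtype=torch.bfloat16,  # half precision to fit on a single GPU
    device_map="auto",
)

# Base-model usage: a plain Polish text prefix to continue,
# not a chat turn or an instruction.
prompt = "Najważniejszym celem polityki naukowej państwa jest"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For chat or instruction following, this base checkpoint would first need supervised fine-tuning on an appropriate Polish instruction dataset.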