OPI-PG/Qra-1b

Text Generation · Concurrency Cost: 1 · Model Size: 1.1B · Quant: BF16 · Ctx Length: 4k · Published: Feb 26, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

OPI-PG/Qra-1b is a 1.1 billion parameter causal language model developed by the National Information Processing Institute (OPI) and Gdańsk University of Technology (PG). Adapted from TinyLlama-1.1B, it was further trained on a corpus of roughly 90 billion Polish tokens, specializing it for Polish language understanding and generation. This foundation model is designed for tasks requiring strong Polish language capabilities and supports a context length of 4096 tokens.


OPI-PG/Qra-1b: A Polish-Optimized Foundation Model

OPI-PG/Qra-1b is a 1.1 billion parameter large language model developed collaboratively by the National Information Processing Institute (OPI) and Gdańsk University of Technology (PG). It is part of the Qra series, specifically adapted for the Polish language.

Key Characteristics & Training:

  • Base Model: Initialized from TinyLlama-1.1B checkpoints.
  • Polish Data Training: Further trained on a meticulously cleaned, filtered, and deduplicated corpus of approximately 90 billion Polish tokens, primarily sourced from web data like CommonCrawl and MADLAD-400.
  • Data Preprocessing: Utilized a robust pipeline including text normalization, removal of short documents, heuristic sentence cleaning, quality classification, perplexity-based filtering, topical domain assignment, and fuzzy deduplication.
  • Technical Optimizations: Trained with modern techniques such as torch.compile, adamw_apex_fused optimizer, Flash Attention 2, mixed precision, gradient accumulation, and FSDP.
  • Context Length: Supports a context length of 4096 tokens.
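The fuzzy deduplication step mentioned above can be illustrated with a toy sketch based on character shingles and Jaccard similarity. This is only a minimal illustration of the idea; the actual Qra pipeline is not described at this level of detail, and a 90B-token corpus would require a scalable approach such as MinHash/LSH rather than the quadratic comparison below.

```python
def shingles(text, k=5):
    """Character k-shingles of a document (lowercased, whitespace-collapsed)."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(len(t) - k + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def fuzzy_dedup(docs, threshold=0.8):
    """Keep each document unless it is a near-duplicate of an earlier kept one.

    O(n^2) toy version for illustration; production pipelines use
    MinHash signatures with locality-sensitive hashing instead.
    """
    kept, kept_shingles = [], []
    for d in docs:
        s = shingles(d)
        if all(jaccard(s, ks) < threshold for ks in kept_shingles):
            kept.append(d)
            kept_shingles.append(s)
    return kept
```

Near-identical web documents (differing only in punctuation or boilerplate) end up with high shingle overlap and are dropped, while genuinely distinct documents survive.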

Performance:

  • PolEval-2018: Achieved a perplexity of 14.7 on the PolEval-2018 test set, outperforming many other Polish and English models in its size class.
  • Long Documents (2024): Demonstrated a perplexity of 6.1 on a new dataset of long Polish documents from 2024, indicating strong performance on contemporary and extended texts.
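For context, perplexity is the exponential of the average per-token negative log-likelihood, so lower is better; a perplexity of 14.7 means the model is, on average, as uncertain as if it were choosing uniformly among about 14.7 tokens at each step. A minimal sketch of the computation (the token losses below are illustrative, not taken from the Qra evaluation):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability p has per-token NLL -ln(p)
# and therefore perplexity exactly 1/p.
uniform_nlls = [-math.log(0.25)] * 8
print(perplexity(uniform_nlls))  # 4.0
```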

Important Note:

Qra-1b is a foundation language model trained with a causal language modeling objective. It is not intended for conversational or instruction-following tasks out-of-the-box and requires further fine-tuning for such applications.
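In practice, adapting a base model like this for dialogue means supervised fine-tuning on instruction/response pairs rendered into a fixed prompt template. The template below is purely hypothetical, assumed for illustration; Qra-1b ships no official chat format, and the `[INST]` delimiters and special tokens are assumptions, not part of the model.

```python
def format_example(instruction: str, response: str,
                   bos: str = "<s>", eos: str = "</s>") -> str:
    """Render one instruction/response pair as a single training string.

    The delimiter scheme here is illustrative only, not an official
    Qra template; any consistent format works as long as training and
    inference use the same one.
    """
    return f"{bos}[INST] {instruction} [/INST] {response}{eos}"

print(format_example("Kim jesteś?", "Jestem modelem językowym."))
```

At inference time, the fine-tuned model is then prompted with the same `[INST] ... [/INST]` prefix and asked to continue, which is what turns a plain causal LM into an instruction follower.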