CYFRAGOVPL/Llama-PLLuM-70B-base-2412

Text generation | Concurrency cost: 4 | Model size: 70B | Quant: FP8 | Context length: 32k | Tool calling: Supported | Published: Feb 6, 2025 | License: llama3.1 | Architecture: Transformer
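The header lists a 70B-parameter model served in FP8 with a 32k context window. As a rough sizing sketch, the arithmetic below estimates memory for the weights and for the KV cache of one full-length sequence. It assumes that FP8 stores one byte per weight, that the KV cache is kept in 16-bit precision, that "32k" means 32,768 tokens, and that this model keeps the public Llama 3.1 70B shape (80 layers, 8 key/value heads, head dimension 128). It ignores activations, framework overhead, and any KV-cache quantization.

```python
# Rough memory estimate for serving a 70B-parameter Llama-3.1-style model.
# Assumptions (not stated in the model card): FP8 weights use 1 byte per
# parameter, the KV cache is stored in 16-bit precision (2 bytes per value),
# and the layer/head counts match the public Llama 3.1 70B configuration.

GIB = 1024 ** 3


def weight_bytes(num_params: float, bytes_per_param: float) -> float:
    """Memory for the model weights alone (no activations, no KV cache)."""
    return num_params * bytes_per_param


def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int) -> int:
    """Keys plus values (factor 2) for every layer, KV head, and cached token."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


weights = weight_bytes(70e9, 1)  # FP8 -> 1 byte per parameter
kv = kv_cache_bytes(num_tokens=32 * 1024, num_layers=80, num_kv_heads=8,
                    head_dim=128, bytes_per_value=2)

print(f"weights ~ {weights / GIB:.1f} GiB")
print(f"KV cache for one 32k-token sequence ~ {kv / GIB:.1f} GiB")
```

Under these assumptions it prints about 65.2 GiB for the weights and 10.0 GiB of KV cache per 32k-token sequence, so several concurrent long requests add substantially to the weight footprint.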

CYFRAGOVPL/Llama-PLLuM-70B-base-2412 is a 70-billion-parameter base model from the PLLuM family, developed by a consortium of Polish scientific institutions led by Politechnika Wrocławska (Wrocław University of Science and Technology). It is specialized for Polish and other Slavic/Baltic languages: it builds on Llama 3.1 and underwent continued pretraining on up to 150 billion tokens of Polish text. It performs well on general language tasks and is particularly effective for domain-specific applications in Polish public administration, achieving state-of-the-art results on Polish-language benchmarks.


PLLuM: Polish Large Language Models

Within the PLLuM family, this 70-billion-parameter base model builds on the Llama 3.1 architecture and targets Polish and other Slavic/Baltic languages, with additional English data included for broader generalization. It underwent continued pretraining on large, high-quality Polish text corpora of up to 150 billion tokens.

Key Capabilities

  • Multilingual Specialization: Strong performance in Polish and other Slavic/Baltic languages.
  • High-Quality Training Data: Trained on large-scale, cleaned, and deduplicated Polish text.
  • Advanced Alignment (family level): PLLuM's instruction-tuned variants were refined with an organic instruction dataset of ~40k Polish prompt-response pairs and the first Polish-language preference corpus, improving correctness, balance, and safety. This checkpoint is the base model and is not itself instruction-tuned.
  • Domain-Specific Excellence: Achieves top scores on custom benchmarks relevant to Polish public administration tasks.
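The card states that the training text was cleaned and deduplicated but does not describe how. As a generic illustration only, not PLLuM's actual pipeline, exact-duplicate removal can be sketched as hashing a normalized form of each document and keeping the first occurrence. The normalization rules and the sample corpus below are assumptions chosen for the example.

```python
# Generic exact-duplicate removal for a text corpus. This is NOT the PLLuM
# data pipeline (the model card does not describe it); it only illustrates
# what "deduplicated" means at its simplest.
import hashlib
import unicodedata


def normalize(doc: str) -> str:
    """NFC-normalize, lowercase, and collapse whitespace so trivial variants collide."""
    return " ".join(unicodedata.normalize("NFC", doc).lower().split())


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized document, preserving order."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


# Hypothetical sample corpus: the second entry differs only in case and spacing.
corpus = [
    "Urząd Gminy jest czynny od 8:00.",
    "urząd gminy   jest czynny od 8:00.",
    "Sesja rady odbędzie się w piątek.",
]
print(deduplicate(corpus))
```

Here the second document normalizes to the same string as the first and is dropped. Production pipelines usually add near-duplicate detection (for example MinHash), which this sketch does not attempt.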

What Makes This Model Different?

Unlike many general-purpose LLMs, PLLuM models are engineered specifically for the nuances of Polish and related linguistic contexts. Their development includes unique, manually curated Polish instruction and preference datasets, which mitigate negative linguistic transfer (interference from other languages) and help ensure high-quality, contextually appropriate responses. Their strong performance on Polish public administration tasks highlights this specialized utility.

Should I use this for my use case?

  • Yes, if: Your application requires strong Polish-language generation, summarization, or question answering. The model is particularly well suited to Polish public administration, legal, and bureaucratic domains, especially when combined with retrieval-augmented generation (RAG). Researchers and developers building downstream AI applications that depend on a robust command of Polish will find it valuable.
  • Consider alternatives if: Your use case is exclusively in English or in other languages better served by specialized models, or if you need a fully open-source license for commercial use. Within the PLLuM family, that licensing applies to the models trained on the smaller 28B-token dataset; this 70B model was trained on the larger 150B-token dataset and is distributed under the Llama 3.1 license.
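Because this is a base model rather than an instruction-tuned one, using it with RAG typically means prepending retrieved passages to a completion-style prompt and letting the model continue the text. The sketch below only builds that prompt string and does not call the model. It assumes a toy word-overlap retriever in place of a real one (such as BM25 or embeddings); the helper names `retrieve` and `build_prompt`, the sample passages, and the Polish prompt labels (Kontekst, Pytanie, Odpowiedź) are all hypothetical.

```python
# Toy retrieval-augmented prompt for a *base* (non-instruction-tuned) model.
# Assumptions: the passages, the word-overlap scorer, and the prompt labels are
# illustrative only; a real system would use a proper retriever and document store.
import re


def tokenize(text: str) -> set[str]:
    """Lowercased set of words in the text."""
    # re's word class is Unicode-aware for str patterns, so Polish letters
    # such as ą, ł, ó, ż are kept inside words.
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Completion-style prompt: the base model is expected to continue after 'Odpowiedź:'."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Kontekst:\n{context}\n\nPytanie: {question}\nOdpowiedź:"


# Hypothetical public-administration snippets.
passages = [
    "Wniosek o dowód osobisty można złożyć w dowolnym urzędzie gminy.",
    "Prawo jazdy wydaje starosta właściwy ze względu na miejsce zamieszkania.",
    "Zmianę adresu zameldowania zgłasza się w urzędzie gminy.",
]
question = "Gdzie złożyć wniosek o dowód osobisty?"
print(build_prompt(question, passages, k=1))
```

Ending the prompt with an open "Odpowiedź:" label suits a base model, which continues text rather than following chat-style instructions; the generated continuation would then be cut at a suitable stop sequence.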