CYFRAGOVPL/Llama-PLLuM-8B-instruct-2512
The Llama-PLLuM-8B-instruct-2512 is an 8 billion parameter instruction-tuned large language model developed by CYFRAGOVPL, based on Meta's Llama-3.1-8B architecture. It is specialized in Polish language tasks, incorporating extensive Polish and English data, and refined through instruction tuning and preference learning. This model excels at generating contextually coherent Polish text and is particularly effective for general language tasks and domain-specific applications within Polish public administration.
Loading preview...
What is Llama-PLLuM-8B-instruct-2512?
Llama-PLLuM-8B-instruct-2512 is an 8 billion parameter instruction-tuned large language model, part of the PLLuM family developed by CYFRAGOVPL. It is built upon the Llama-3.1-8B base model and is specifically designed for the Polish language, while also incorporating English data for broader generalization. The model's development involved extensive data collection, rigorous cleaning, and deduplication of Polish and English text corpora.
Key Capabilities
- Polish Language Specialization: Developed with a focus on high-quality Polish text data, including a large collection of manually created "organic instructions" and the first Polish-language preference corpus.
- Instruction Tuning: Fine-tuned using approximately 70k manually curated Polish instructions, 33k programmatically derived instructions, 15k RAG-style context-processing instructions, and 45k synthetic, context-aware instructions.
- Alignment and Safety: Utilizes ~60k manually annotated preference pairs to ensure safer, balanced, and contextually appropriate responses, even for sensitive topics.
- Strong Performance: Achieves top scores on custom benchmarks relevant to Polish public administration and state-of-the-art results in broader Polish-language tasks.
Good For
- General Language Tasks: Ideal for text generation, summarization, extraction, and question answering in Polish.
- Domain-Specific Assistants: Particularly effective for applications within Polish public administration, legal, or bureaucratic contexts requiring domain-aware retrieval.
- Research & Development: Serves as a robust foundation for AI applications demanding strong command of the Polish language.