CYFRAGOVPL/Llama-PLLuM-8B-base-2508
CYFRAGOVPL/Llama-PLLuM-8B-base-2508 is an 8-billion-parameter base language model from the PLLuM family, developed by the HIVE AI consortium. It specializes in Polish and other Slavic and Baltic languages, with additional English data for broader generalization. Trained on a high-quality, legally compliant Polish text corpus, it generates contextually coherent text, targets tasks relevant to Polish public administration, and serves as a foundation for specialized applications.
PLLuM: A Family of Polish Large Language Models
CYFRAGOVPL/Llama-PLLuM-8B-base-2508 is an 8-billion-parameter base model developed by the HIVE AI consortium, specializing in Polish and other Slavic/Baltic languages. It is built on a corpus of approximately 18 billion tokens (17 billion Polish, 1 billion English), curated for legal compliance and quality. The model leverages an extensive Polish instruction dataset (~55k prompt-response pairs) and the first Polish-language preference corpus to improve correctness, balance, and safety.
Key Capabilities
- Multilingual Specialization: Optimized for Polish and other Slavic and Baltic languages, with English data for broader generalization.
- High-Quality Data: Trained on a legally compliant, high-quality Polish text corpus (150B tokens after cleaning, 28B commercially usable).
- Advanced Alignment: Utilizes a unique organic instruction dataset and a Polish preference corpus for robust supervised fine-tuning and alignment.
- Strong Performance: Reported to achieve state-of-the-art results on Polish-language tasks and top scores on custom benchmarks for Polish public administration.
Good For
- General Language Tasks: Text generation, summarization, and question answering in Polish.
- Domain-Specific Assistants: Particularly effective for applications in Polish public administration, legal, and bureaucratic contexts.
- Research & Development: Serving as a foundational model for AI applications requiring strong Polish language capabilities.
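Because this is a base model rather than an instruction-tuned one, it continues text instead of following chat-style instructions, so question answering is usually framed as a few-shot completion. The sketch below is a minimal illustration, not an official usage recipe. It assumes the checkpoint is accessible under the repository id shown on this page, that the `transformers`, `torch`, and `accelerate` packages are installed, and that there is enough memory for an 8B model in bfloat16. The helper names `build_few_shot_prompt` and `generate`, and the `Pytanie:` / `Odpowiedź:` prompt layout, are hypothetical choices made for this example.

```python
MODEL_ID = "CYFRAGOVPL/Llama-PLLuM-8B-base-2508"


def build_few_shot_prompt(examples, query):
    """Format (question, answer) pairs as a completion-style prompt.

    The prompt ends with an open "Odpowiedź:" line so that a base model
    continues the pattern with an answer to `query`.
    """
    blocks = [f"Pytanie: {q}\nOdpowiedź: {a}" for q, a in examples]
    blocks.append(f"Pytanie: {query}\nOdpowiedź:")
    return "\n\n".join(blocks)


def generate(prompt, max_new_tokens=64):
    """Greedy-decode a continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so the prompt helper works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: hardware supports bfloat16
        device_map="auto",           # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Illustrative few-shot examples (not taken from any PLLuM dataset).
EXAMPLES = [
    ("Jaka jest stolica Polski?", "Warszawa"),
    ("Ile województw ma Polska?", "16"),
]
prompt = build_few_shot_prompt(EXAMPLES, "Jaka jest najdłuższa rzeka w Polsce?")
print(prompt)
```

Calling `generate(prompt)` would then download the weights and return the model's continuation. A base model may keep generating further `Pytanie:` blocks after its answer, so in practice the output is often truncated at the first blank line.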