Overview
NYTK/PULI-Trio-Q is a 7.62-billion-parameter language model built on the Qwen2.5 7B Instruct architecture. Developed by NYTK, it was continually pretrained for Hungarian on a diverse corpus of 8.08 billion words drawn from Hungarian documents and Wikipedia. The initial continual-pretraining phase also included English (Long Context QA, BookSum) and Chinese (Wudao) data, and was followed by a dedicated Hungarian-only epoch on 626 million words.
Key Capabilities
- Hungarian Language Proficiency: Optimized for understanding and generating Hungarian text through extensive language-specific continual pretraining.
- Long Context Handling: Supports a maximum sequence length of 32,768 tokens, enabling processing of lengthy documents and complex queries.
- Qwen2.5 Foundation: Benefits from the robust architecture of the Qwen2.5 model family.
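Since the model follows the standard Qwen2.5-based instruct setup, it should be loadable through the Hugging Face transformers library. The sketch below assumes the repository id `NYTK/PULI-Trio-Q` and that the tokenizer ships a chat template (as Qwen2.5 instruct tokenizers do); the dtype and generation settings are illustrative, not recommended values.

```python
# Minimal sketch of running PULI-Trio-Q with Hugging Face transformers.
# Assumptions: repo id "NYTK/PULI-Trio-Q", a chat template in the tokenizer,
# and bf16 inference; none of these are confirmed settings from the model card.

MODEL_ID = "NYTK/PULI-Trio-Q"
MAX_CONTEXT_TOKENS = 32_768  # maximum sequence length stated above


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model lazily and return a Hungarian completion for `prompt`."""
    # Heavy imports are kept inside the function so the sketch can be read
    # (and the constants reused) without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: bf16 is sufficient for inference
        device_map="auto",
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


# Example (requires a GPU and network access; Hungarian: "Briefly summarize
# the document!"):
#   print(generate("Foglald össze röviden a dokumentumot!"))
```

The `generate` call is deliberately wrapped in a function so that downloading the 7B checkpoint happens only when it is actually invoked.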
Use Cases
- Hungarian Text Generation: Ideal for applications requiring high-quality text generation in Hungarian.
- Long-form Document Analysis: Suitable for tasks involving summarization, question answering, or information extraction from long Hungarian texts.
- Research and Development: A valuable resource for researchers working on Hungarian natural language processing tasks. If you use this model, please cite the associated paper: PULI Chat: Our First Hungarian Conversational Model.
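For long-form document analysis, inputs that exceed the 32,768-token window must be split before they reach the model. The helper below is a minimal, hypothetical sketch of overlapping chunking; it uses whitespace word count as a rough proxy for token count (a simplifying assumption — production code would measure length with the model's own tokenizer).

```python
# Sketch: split a long document into overlapping chunks that each fit a
# token budget below the model's 32,768-token window. Word count stands in
# for token count here, which is an assumption, not an exact measure.

def chunk_words(text: str, budget: int = 30_000, overlap: int = 500) -> list[str]:
    """Split `text` into chunks of at most `budget` words, with `overlap`
    words shared between consecutive chunks so context is not cut mid-topic."""
    if budget <= overlap:
        raise ValueError("budget must exceed overlap")
    words = text.split()
    if not words:
        return []
    step = budget - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + budget]))
        if start + budget >= len(words):
            break
    return chunks
```

Each chunk can then be summarized or queried independently, and the per-chunk outputs merged in a final pass.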