lapa-llm/lapa-12b-pt

Warm
Public
Vision
12B
FP8
32768
Sep 23, 2025
License: gemma
Overview

Lapa LLM: Optimized for Ukrainian Language Processing

Lapa LLM is a 12-billion-parameter open large language model built on Gemma-3-12B and developed by a collaborative team of Ukrainian researchers. Its core innovation is deep optimization for the Ukrainian language, which makes it a leading model in this domain.

Key Capabilities & Differentiators

  • Superior Ukrainian Tokenization: Uses a state-of-the-art tokenizer adaptation method that replaces 80,000 tokens with Ukrainian ones without quality loss. As a result, Ukrainian text needs 1.5 times fewer tokens and three times fewer computations than with the original Gemma 3.
  • High Performance in Ukrainian Benchmarks: Achieves 33 BLEU for English-to-Ukrainian translation on FLORES, making it highly effective for translating NLP datasets. It also ranks among the best for image processing (MMZNO benchmark), summarization, and Q&A in Ukrainian.
  • Robust Pretraining: Demonstrates strong performance in Ukrainian language pretraining benchmarks. The training data was evaluated for quality, readability, grammar, and the presence of propaganda or disinformation, and it includes high-quality materials from Harvard Library.
  • Multimodal Support: Processes both text and images. Image inputs are normalized to 896x896 resolution and encoded to 256 tokens each, and the total input context is 32K tokens.
  • Maximum Openness: The project emphasizes transparency, offering the model for commercial use, publishing approximately 25 training datasets, disclosing data filtering methods, and providing open-source code and training documentation.
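
The token figures above lend themselves to a quick budget estimate. The following is a minimal sketch, not part of the Lapa LLM codebase: the function names are hypothetical, the constants are taken from this page (256 tokens per image, a 32,768-token input context, a 1.5x token reduction for Ukrainian text), and it assumes that no extra special tokens are spent around each image, which may not hold in practice.

```python
# Figures quoted on this page; everything else here is an illustrative assumption.
IMAGE_TOKENS = 256          # tokens per encoded 896x896 image
CONTEXT_TOKENS = 32_768     # total input context (32K)
UKRAINIAN_TOKEN_RATIO = 1.5 # reported reduction in tokens vs. the original Gemma 3


def remaining_text_tokens(num_images: int) -> int:
    """Tokens left for text after reserving space for `num_images` images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    used = num_images * IMAGE_TOKENS
    if used > CONTEXT_TOKENS:
        raise ValueError("images alone exceed the input context")
    return CONTEXT_TOKENS - used


def max_images(text_tokens: int) -> int:
    """Largest number of images that fits next to `text_tokens` of text."""
    if not 0 <= text_tokens <= CONTEXT_TOKENS:
        raise ValueError("text_tokens must fit in the input context")
    return (CONTEXT_TOKENS - text_tokens) // IMAGE_TOKENS


def estimated_lapa_tokens(gemma3_tokens: int) -> int:
    """Rough token count for Ukrainian text that costs `gemma3_tokens`
    under the original Gemma 3 tokenizer (uses the average 1.5x figure;
    the real ratio depends on the text)."""
    return round(gemma3_tokens / UKRAINIAN_TOKEN_RATIO)


if __name__ == "__main__":
    print(remaining_text_tokens(4))      # 4 images leave 31744 tokens for text
    print(max_images(2000))              # 120 images fit beside 2000 text tokens
    print(estimated_lapa_tokens(3000))   # about 2000 tokens after adaptation
```

These numbers are upper bounds for planning prompts; the actual count should be taken from the model's tokenizer and processor.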

Intended Use Cases

Lapa LLM is particularly well-suited for applications requiring robust Ukrainian language understanding and generation, as well as multimodal capabilities:

  • Content Creation: Generating text in Ukrainian, including creative formats, marketing copy, and email drafts.
  • Translation: Efficient and natural English-to-Ukrainian and Ukrainian-to-English translation.
  • Chatbots & Conversational AI: Powering Ukrainian-language conversational interfaces.
  • Text Summarization & Q&A: Summarizing Ukrainian texts and answering questions, which is useful for RAG systems.
  • Image Data Extraction: Interpreting and summarizing visual data in Ukrainian contexts.
  • Research & Development: Serving as a foundational model for Ukrainian NLP and VLM research due to its open nature and strong pretraining results.