pavelfedortsov/gemma4-e2b-colloquial-ru-merged

VISIONConcurrency Cost:1Model Size:5.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:May 23, 2026License:gemmaArchitecture:Transformer Cold

The pavelfedortsov/gemma4-e2b-colloquial-ru-merged model is a 5.1 billion parameter language model based on Google's Gemma-4-E2B-it architecture, fine-tuned for Russian colloquial text generation. It specializes in transforming formal Russian text into a conversational style while preserving factual content. With a 32768 token context length, this model is optimized for deployment in environments like vLLM and RunPod Serverless for style transfer tasks.

Loading preview...

Model Overview

This model, gemma4-e2b-colloquial-ru-merged, is a full-weight checkpoint combining the base model google/gemma-4-E2B-it with a colloquial Russian LoRA adapter. It is specifically designed for efficient inference on GPUs using systems like vLLM and RunPod Serverless, eliminating the need for PEFT at inference time.

Key Capabilities

  • Style Transfer: Rewrites formal Russian text into a colloquial style.
  • Content Preservation: Ensures facts, names, numbers, and structural elements (paragraphs, lists) are maintained during style transformation.
  • No Profanity: Designed to produce conversational text without using offensive language.
  • Optimized for Deployment: Merged weights are suitable for direct deployment in vLLM and RunPod Serverless environments.

Training Details

The model was trained using approximately 10,000 SFT (Supervised Fine-Tuning) pairs from a mixed corpus including Telegram and social media data. LoRA (Low-Rank Adaptation) was applied to the language tower (r=16, alpha=16) and subsequently merged into the full weights. The checkpoint includes k_norm for layers 15-34 to ensure compatibility with vLLM.

Usage Scenarios

  • vLLM/RunPod Serverless: Direct deployment for high-throughput inference.
  • OpenAI-compatible API: Can be accessed via a local proxy for integration into applications.
  • Streamlit UI: A provided Docker Compose setup allows for a local Streamlit UI for interactive use.

Limitations

  • Subject to Gemma's license.
  • Not intended for production use without independent quality and safety assessment.
  • Minor stylistic differences may exist between merged and LoRA-based inference.