Model Overview

This model, gemma4-e2b-colloquial-ru-merged, is a full-weight checkpoint combining the base model google/gemma-4-E2B-it with a colloquial Russian LoRA adapter. It is specifically designed for efficient inference on GPUs using systems like vLLM and RunPod Serverless, eliminating the need for PEFT at inference time.

Key Capabilities

Style Transfer: Rewrites formal Russian text into a colloquial style.
Content Preservation: Ensures facts, names, numbers, and structural elements (paragraphs, lists) are maintained during style transformation.
No Profanity: Designed to produce conversational text without using offensive language.
Optimized for Deployment: Merged weights are suitable for direct deployment in vLLM and RunPod Serverless environments.

Training Details

The model was trained using approximately 10,000 SFT (Supervised Fine-Tuning) pairs from a mixed corpus including Telegram and social media data. LoRA (Low-Rank Adaptation) was applied to the language tower (r=16, alpha=16) and subsequently merged into the full weights. The checkpoint includes k_norm for layers 15-34 to ensure compatibility with vLLM.

Usage Scenarios

vLLM/RunPod Serverless: Direct deployment for high-throughput inference.
OpenAI-compatible API: Can be accessed via a local proxy for integration into applications.
Streamlit UI: A provided Docker Compose setup allows for a local Streamlit UI for interactive use.

Limitations

Subject to Gemma's license.
Not intended for production use without independent quality and safety assessment.
Minor stylistic differences may exist between merged and LoRA-based inference.

Overview

Model Overview

Key Capabilities

Training Details

Usage Scenarios

Limitations

Full Model Card (README)