dxv2k/gemma-4-E4B-it-merged
dxv2k/gemma-4-E4B-it-merged is a 7.9 billion parameter language model based on the Google Gemma-4-E4B-it architecture, enhanced by merging a LoRA adapter into its base weights. This model is specifically configured for instruction-following tasks and requires a GPU with at least 24 GB VRAM for inference. It is designed for developers seeking a Gemma-4 variant with integrated LoRA for improved performance in conversational and text generation applications.
Loading preview...
Model Overview
This model, dxv2k/gemma-4-E4B-it-merged, is a 7.9 billion parameter variant of the google/gemma-4-E4B-it architecture. It incorporates a LoRA adapter, which has been merged directly into the base weights. The LoRA adapter targets the q/k/v/o/gate/up/down layers of the language model with r=16 and alpha=32, enhancing its capabilities for instruction-tuned tasks.
Key Characteristics
- Architecture: Based on Google's Gemma-4-E4B-it.
- Parameter Count: 7.9 billion parameters.
- Precision: Uses
bf16(bfloat16) for its weights. - Context Length: Supports a context length of 32768 tokens.
- LoRA Integration: Features a merged LoRA adapter for improved performance without external adapter loading.
Deployment and Usage
- Inference Requirements: Requires a GPU with at least 24 GB VRAM due to its
bf16size (approximately 16 GB). - Custom Handler: Utilizes a custom
handler.pyfor inference, as the Gemma 4 architecture is new and requires specific handling. - Dependencies: Pins
transformers>=5.12.1for compatibility. - Local Use: Provides clear Python code examples for local inference using
transformerslibrary, demonstrating how to apply chat templates and generate responses.