Name: dxv2k/gemma-4-E4B-it-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: dxv2k

Model Overview

This model, dxv2k/gemma-4-E4B-it-merged, is a 7.9 billion parameter variant of the google/gemma-4-E4B-it architecture. It incorporates a LoRA adapter, which has been merged directly into the base weights. The LoRA adapter targets the q/k/v/o/gate/up/down layers of the language model with r=16 and alpha=32, enhancing its capabilities for instruction-tuned tasks.

Key Characteristics

Architecture: Based on Google's Gemma-4-E4B-it.
Parameter Count: 7.9 billion parameters.
Precision: Uses bf16 (bfloat16) for its weights.
Context Length: Supports a context length of 32768 tokens.
LoRA Integration: Features a merged LoRA adapter for improved performance without external adapter loading.

Deployment and Usage

Inference Requirements: Requires a GPU with at least 24 GB VRAM due to its bf16 size (approximately 16 GB).
Custom Handler: Utilizes a custom handler.py for inference, as the Gemma 4 architecture is new and requires specific handling.
Dependencies: Pins transformers>=5.12.1 for compatibility.
Local Use: Provides clear Python code examples for local inference using transformers library, demonstrating how to apply chat templates and generate responses.

Overview

Model Overview

Key Characteristics

Deployment and Usage

Full Model Card (README)