ArnavKewalram/gemma-4-E2B-coder-v1

Tags: Vision, Open Weights, Cold
Concurrency Cost: 1
Model Size: 5.1B
Quant: BF16
Ctx Length: 32k
Tool Calling: Supported
Published: Jun 16, 2026
License: apache-2.0
Architecture: Transformer

ArnavKewalram/gemma-4-E2B-coder-v1 is a 3.9 billion parameter instruction-tuned code generation model based on Google's gemma-4-E2B-it, optimized for on-device and offline inference. It achieves 34.1% on HumanEval pass@1, roughly matching Code Llama 7B at about half the parameter count, and runs on devices with as little as 4 GB of RAM. It targets code generation in Python, JavaScript, TypeScript, Go, Rust, SQL, Bash, and C++ on laptops and edge devices.

Overview

ArnavKewalram/gemma-4-E2B-coder-v1 is the first coding fine-tune of Google's gemma-4-E2B-it, a 3.9 billion parameter model. It is designed for efficient, offline code generation on resource-constrained devices and runs in as little as 4 GB of RAM without a GPU. The model uses the Griffin architecture, which combines local-attention and linear recurrent layers for lower latency than pure-transformer models of similar size.

Key Capabilities

  • Code Generation: Excels in Python, JavaScript, TypeScript, Go, Rust, SQL, Bash, and C++.
  • High Performance: Achieves 34.1% HumanEval pass@1, comparable to Code Llama 7B, despite being significantly smaller.
  • Resource Efficient: Quantized versions (e.g., Q4_K_M at ~3.2 GB) run on CPUs and edge devices with 4 GB RAM.
  • Commercial Use: Licensed under Apache 2.0, which permits commercial use subject to the license's attribution and notice terms.
  • Real-world Training: Fine-tuned on 10,000 samples from the Magicoder-OSS-Instruct-75K dataset, whose instruction pairs are generated from open-source code snippets.
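
For readers unfamiliar with the metric behind the HumanEval figure, the following is a minimal sketch of the standard unbiased pass@k estimator. It is not the evaluation harness used for this model; the function names and the 56-of-164 run are illustrative assumptions, chosen only because HumanEval has 164 problems and 56/164 rounds to 34.1%.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical run (an assumption, not reported data):
# 164 problems, one sample each, 56 solved.
results = [(1, 1)] * 56 + [(1, 0)] * 108
print(f"{benchmark_pass_at_k(results, k=1):.3f}")  # prints 0.341
```

With one sample per problem, pass@1 reduces to the plain fraction of problems solved; the combinatorial form only matters when more samples are drawn per problem.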

Good For

  • Developers needing a capable coding assistant that operates fully offline.
  • Applications requiring fast CPU inference on laptops or edge devices.
  • Projects with strict memory constraints (4 GB RAM is the stated minimum).
  • Commercial products due to its permissive Apache 2.0 license.
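
The memory figures above can be sanity-checked with simple arithmetic. The sketch below uses only numbers stated on this page (3.9 billion parameters, BF16, a ~3.2 GB Q4_K_M file, 4 GB RAM); the helper names are hypothetical, decimal gigabytes are assumed, and the KV cache and runtime overhead are not modeled.

```python
GB = 1e9  # decimal gigabytes, assumed to match the "~3.2 GB" figure


def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Size of the raw weights in GB at a given precision."""
    return n_params * bits_per_weight / 8 / GB


def headroom_gb(ram_gb: float, model_file_gb: float) -> float:
    """RAM left for the KV cache, runtime buffers, and the OS."""
    return ram_gb - model_file_gb


bf16 = weights_gb(3.9e9, 16)    # unquantized weights
spare = headroom_gb(4.0, 3.2)   # quantized file on a 4 GB device

print(f"BF16 weights: {bf16:.1f} GB")                   # prints 7.8 GB
print(f"Headroom with Q4_K_M on 4 GB: {spare:.1f} GB")  # prints 0.8 GB
```

The BF16 weights alone exceed 4 GB, so the 4 GB claim applies to the quantized builds, and even there the remaining headroom is small enough that long contexts or other running applications may matter.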

Limitations

  • Context Length: Trained with a maximum sequence length of 384 tokens, far below the listed 32k context window, so output quality may degrade on long prompts or long generations.
  • Not evaluated for security-critical code generation.
  • Inherits biases and knowledge cutoff from its base model.
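
One way to work within the 384-token training length is to budget prompt and output tokens together. The following is a minimal sketch under assumptions: `fit_prompt` is a hypothetical helper, it operates on token IDs that some tokenizer has already produced, and keeping the tail of the prompt is a design choice suited to code completion rather than anything this model requires. The 384 figure is a fine-tuning setting, not a hard limit.

```python
def fit_prompt(token_ids: list[int], max_new_tokens: int,
               window: int = 384) -> list[int]:
    """Trim a tokenized prompt so prompt + generation fits in `window` tokens.

    Keeps the most recent tokens, on the assumption that the code nearest
    the cursor is the most relevant context.
    """
    budget = window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # A negative-start slice returns the whole list when it is already short.
    return token_ids[-budget:]


# Hypothetical 500-token prompt with 128 tokens reserved for the answer.
prompt = list(range(500))
trimmed = fit_prompt(prompt, max_new_tokens=128)
print(len(trimmed), trimmed[0])  # prints 256 244
```

Prompts that need an instruction header preserved would have to keep the head as well as the tail; this sketch does not handle that case.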