google/gemma-4-E2B-it

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 2, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights · Warm

Gemma 4 E2B-it is an instruction-tuned multimodal language model from Google DeepMind with 2.3 billion effective parameters. Part of the Gemma 4 family, it accepts text, image, and audio inputs and generates text outputs. Optimized for on-device deployment, it features a 128K context window and strong reasoning, coding, and agentic capabilities.


Model Overview

google/gemma-4-E2B-it is an instruction-tuned variant from the Gemma 4 family of open multimodal models by Google DeepMind. This model is designed for efficient local execution on devices like high-end phones and laptops, featuring 2.3 billion effective parameters and a 128K token context window. It accepts text, image, and audio inputs, generates text outputs, and incorporates Per-Layer Embeddings (PLE) for parameter efficiency.

Key Capabilities

  • Multimodality: Processes text, images (with variable aspect ratio and resolution), and audio inputs (native to E2B/E4B models).
  • Reasoning: Includes a built-in reasoning mode for step-by-step thinking.
  • Long Context: Supports a 128K token context window.
  • Coding & Agentic: Enhanced coding benchmarks and native function-calling support for autonomous agents.
  • Multilingual: Pre-trained on over 140 languages with out-of-the-box support for 35+ languages.
  • On-Device Optimization: Smaller models (E2B, E4B) are specifically optimized for efficient local execution.

When to Use This Model

  • On-device applications: Ideal for deployment on mobile devices and laptops due to its optimized size and efficiency.
  • Multimodal tasks: Excellent for applications requiring understanding and generation based on combined text, image, and audio inputs.
  • Reasoning and coding: Suitable for tasks that benefit from structured reasoning and code generation/completion.
  • Agentic workflows: Supports native function calling, making it a strong candidate for building autonomous agents.
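Since the card highlights native function calling for agentic workflows, here is a minimal sketch of declaring a tool and dispatching a model-issued tool call, using the JSON-Schema style tool definition common to chat APIs. The schema shape, the helper names, and any wiring to this specific model are assumptions, not details taken from this card.

```python
# Hypothetical sketch: a JSON-Schema style tool definition plus a tiny
# dispatcher for tool calls. Whether this exact shape matches Gemma 4's
# chat template is an assumption.
import json

def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a function signature as a JSON-Schema tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": parameters,
                "required": list(parameters),
            },
        },
    }

get_weather = make_tool(
    "get_weather",
    "Look up current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
)

# A model's tool call typically arrives as a name plus JSON-encoded
# arguments; the agent parses the arguments and calls the real function.
def dispatch(tool_call: dict, registry: dict) -> str:
    args = json.loads(tool_call["arguments"])
    return registry[tool_call["name"]](**args)

result = dispatch(
    {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
    {"get_weather": lambda city: f"Sunny in {city}"},
)
```

In a full agent loop, the tool result would be appended to the conversation as a tool message and fed back to the model for the next turn.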