unsloth/gemma-4-E2B

Vision | Concurrency Cost: 1 | Model Size: 5.1B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Mar 31, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

The unsloth/gemma-4-E2B model is a 5.1 billion parameter multimodal language model from the Gemma 4 family, developed by Google DeepMind. It accepts text, image, and audio inputs, generates text output, and has a 128K token context window. It is optimized for on-device deployment and targets reasoning, agentic workflows, and coding tasks, with native function-calling support.


Overview of Gemma 4 E2B

Gemma 4 E2B is the 5.1 billion parameter variant of the Gemma 4 family, developed by Google DeepMind. It is designed for efficient local execution on devices such as laptops and mobile phones and offers a 128K token context window. It accepts text, image, and audio inputs and generates text output. The wider family includes both Dense and Mixture-of-Experts (MoE) architectures.
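
As a rough sizing aid for local execution, the weight footprint can be estimated from the figures in the listing (5.1B parameters, BF16). This is a back-of-the-envelope sketch, not a measured requirement: it counts weights only and ignores activations, the KV cache, and any embedding offloading, and the function name is illustrative.

```python
# Rough weight-memory estimate.
# Assumption: every parameter is stored in the same dtype.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Return the approximate size of the weights in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # 5.1e9 parameters at 2 bytes each is about 10.2e9 bytes.
    print(f"{weight_memory_gib(5.1e9):.1f} GiB")
```

Under these assumptions the BF16 weights come to roughly 9.5 GiB; a lower-precision format would shrink that proportionally.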

Key Capabilities

  • Multimodality: Processes text, images (with variable aspect ratio and resolution), and audio natively. It supports interleaved multimodal input, allowing text and images to be mixed in prompts.
  • Reasoning: Features a built-in reasoning mode that enables step-by-step thinking before generating an answer.
  • Extended Context: Offers a 128K token context window, kept memory-efficient by a hybrid attention mechanism and Proportional RoPE (p-RoPE).
  • Enhanced Coding & Agentic Capabilities: Shows improvements in coding benchmarks and includes native function-calling support for agentic workflows.
  • Multilingual Support: Pre-trained on over 140 languages, with out-of-the-box support for 35+ languages.
  • On-Device Optimization: Specifically designed for efficient deployment on mobile and edge devices, utilizing Per-Layer Embeddings (PLE) for parameter efficiency.
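
To illustrate the function-calling capability listed above, here is a minimal sketch of the application side of a tool-call loop. It assumes the model's tool call has already been parsed into JSON with `name` and `arguments` fields. That shape, the `get_weather` tool, and the schema layout are illustrative assumptions, not the model's documented output format.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical tool for illustration; returns canned data."""
    return {"city": city, "forecast": "sunny"}


# JSON-Schema-style tool description that would be shown to the model.
# The exact layout a given runtime expects is an assumption here.
TOOLS_SPEC = [
    {
        "name": "get_weather",
        "description": "Look up the forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# Maps tool names to the Python callables that implement them.
REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Run the requested tool and return a JSON string to feed back to the model."""
    call = json.loads(tool_call_json)
    name = call.get("name")
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = REGISTRY[name](**call.get("arguments", {}))
    except TypeError as exc:  # missing or unexpected arguments
        return json.dumps({"error": str(exc)})
    return json.dumps({"name": name, "result": result})


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Returning errors as JSON rather than raising lets the model see the failure and retry with corrected arguments.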

Good for

  • On-device AI applications: Its optimized architecture makes it suitable for deployment on mobile phones and laptops.
  • Multimodal tasks: Suited to applications that must interpret text, image, and audio inputs and respond in text.
  • Reasoning and agentic workflows: Benefits from its built-in reasoning mode and native function-calling capabilities.
  • Coding tasks: Improved performance in code generation, completion, and correction.
  • Long-context processing: Capable of handling prompts up to 128K tokens, useful for summarizing or analyzing extensive documents.
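
For the long-context use case, it can help to check whether a document is likely to fit before sending it. The sketch below uses a crude 4-characters-per-token heuristic, which is an assumption and not the model's tokenizer; the helper names and the output reserve are likewise illustrative. The 128K default follows the description above, taken conservatively as 128,000 tokens. A host may serve the model with a smaller limit (the listing header shows a 32k context length), so the limit is a parameter.

```python
CHARS_PER_TOKEN = 4  # crude heuristic (assumption); real tokenizers vary by language


def estimate_tokens(text: str) -> int:
    """Estimate the token count by ceiling-dividing the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(text: str, context_tokens: int = 128_000,
                 reserve_for_output: int = 2_000) -> bool:
    """True if the estimated prompt plus an output reserve fits the window."""
    return estimate_tokens(text) + reserve_for_output <= context_tokens


def split_to_budget(text: str, context_tokens: int = 128_000,
                    reserve_for_output: int = 2_000) -> list[str]:
    """Split text into consecutive chunks that each fit the estimated budget."""
    budget_chars = (context_tokens - reserve_for_output) * CHARS_PER_TOKEN
    if budget_chars <= 0:
        raise ValueError("reserve_for_output must be smaller than context_tokens")
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]


if __name__ == "__main__":
    document = "word " * 80_000  # 400,000 characters, about 100,000 estimated tokens
    print(fits_context(document))                          # default 128K window
    print(len(split_to_budget(document, context_tokens=32_000)))  # 32k window
```

Splitting on raw character offsets can cut through words or sentences; a production version would split on paragraph boundaries and count tokens with the actual tokenizer.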