Model Overview

This model is a specialized variant of Google DeepMind's Gemma 3 1B-IT, a 1-billion-parameter instruction-tuned language model from the Gemma 3 family. It retains the core Gemma 3 capabilities; its key differentiator is an integrated Qwen3-style tool-calling template. The template lets the model generate structured tool calls and process tool responses, which suits applications that need to execute external functions.

Key Capabilities

Tool-Calling: Emits Qwen3-style <tool_call> blocks and reads <tool_response> blocks, supporting one or several parallel tool calls within a single assistant turn.
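Multimodal Input: The Gemma 3 family accepts both text and image inputs, with images normalized to 896x896 resolution. Upstream Gemma 3 documentation lists the 1B size as text-only, so image input may not apply to this checkpoint.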
Instruction-Tuned: Optimized for following instructions and generating coherent, relevant text.
Context Window: Provides a 32K-token context window at the 1B size.
Multilingual Support: The base Gemma 3 models support over 140 languages.
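
The tool-calling capability listed above can be illustrated with a small parser for assistant output. This is a minimal sketch, not this model's official tooling: it assumes the common Qwen3-style wire format, in which each call is a JSON object with "name" and "arguments" keys wrapped in <tool_call> tags. The helper name parse_tool_calls and the sample tools get_weather and get_time are hypothetical.

```python
import json
import re

# Assumed wire format (Qwen3-style): each call is a JSON object wrapped in
# <tool_call>...</tool_call>, carrying "name" and "arguments" keys.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in one assistant turn."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the caller
        if isinstance(call, dict) and "name" in call:
            calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )
    return calls


# Hypothetical assistant output containing two parallel calls.
sample = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>\n"
    "<tool_call>\n"
    '{"name": "get_time", "arguments": {"timezone": "Europe/Paris"}}\n'
    "</tool_call>"
)

calls = parse_tool_calls(sample)
print([c["name"] for c in calls])
```

Malformed payloads are skipped rather than raised, so a single bad call does not discard the other parallel calls in the same turn.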

Good for

Tool-Augmented AI: Ideal for building agents that can interact with external APIs and tools.
Conversational AI: Enhances chatbots and virtual assistants with the ability to perform actions via tool calls.
Text Generation: Suitable for various text generation tasks, including creative writing, summarization, and question answering.
Image Understanding: Can analyze image content and generate textual responses or integrate visual information into tool calls, provided the checkpoint in use accepts image input.
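
For the agent-style use cases above, the application has to execute each requested tool and hand the result back to the model. The sketch below shows one way to do that, assuming results are returned as JSON wrapped in <tool_response> tags. The TOOLS registry, the get_weather function, and run_tool_calls are hypothetical names for illustration. Which chat role carries the response text depends on the chat template, so check the template shipped with the model.

```python
import json
from typing import Any, Callable


# Hypothetical tool, for illustration only.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}


# Registry mapping tool names (as the model emits them) to Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(calls: list[dict]) -> str:
    """Execute parsed calls and wrap each result in a <tool_response> block."""
    blocks = []
    for call in calls:
        name = call.get("name")
        func = TOOLS.get(name)
        if func is None:
            result = {"error": f"unknown tool: {name}"}
        else:
            try:
                result = func(**call.get("arguments", {}))
            except Exception as exc:  # report failures back to the model
                result = {"error": str(exc)}
        blocks.append(f"<tool_response>\n{json.dumps(result)}\n</tool_response>")
    return "\n".join(blocks)


reply = run_tool_calls([
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "get_stock", "arguments": {"ticker": "XYZ"}},  # not registered
])
print(reply)
```

Returning errors as tool responses, rather than raising, gives the model a chance to retry or explain the failure in its next turn.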