Model Overview
This model is a specialized variant of Google DeepMind's Gemma 3 1B-IT, a 1-billion-parameter instruction-tuned multimodal language model. It retains the core Gemma 3 capabilities for text and image input; its key differentiator is an integrated Qwen3-style tool-calling template. The template lets the model generate structured tool calls and process tool responses, which suits it to applications that need to execute external functions.
Key Capabilities
- Tool-Calling: Emits Qwen3-style <tool_call> tags and processes <tool_response> tags, supporting single or multiple parallel tool calls within an assistant turn.
- Multimodal Input: Handles both text and image inputs, with images normalized to 896x896 resolution.
- Instruction-Tuned: Optimized for following instructions and generating coherent, relevant text.
- Context Window: Features a 32K token context window for the 1B size, enabling processing of longer inputs.
- Multilingual Support: The base Gemma 3 models support over 140 languages.
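To make the tool-calling capability concrete, here is a minimal sketch of how a client might pull tool calls out of an assistant turn. It assumes the common Qwen3-style layout, where each <tool_call> block wraps one JSON object with "name" and "arguments" keys; the exact payload emitted by this model's template should be checked against its chat template. The helper extract_tool_calls and the get_weather tool are hypothetical names used only for illustration.

```python
import json
import re

# Assumption: each call looks like
#   <tool_call>{"name": "...", "arguments": {...}}</tool_call>
# with optional whitespace or newlines around the JSON payload.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of (name, arguments) pairs found in an assistant turn."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of raising
        if isinstance(obj, dict) and "name" in obj:
            calls.append((obj["name"], obj.get("arguments", {})))
    return calls


# Hypothetical assistant output containing two parallel calls.
sample = (
    "Checking both cities.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>\n'
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Oslo"}}\n</tool_call>'
)

print(extract_tool_calls(sample))
```

Because the pattern is applied with findall, single and multiple parallel calls in one turn are handled the same way; text outside the tags is ignored.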
Good for
- Tool-Augmented AI: Ideal for building agents that can interact with external APIs and tools.
- Conversational AI: Enhances chatbots and virtual assistants with the ability to perform actions via tool calls.
- Text Generation: Suitable for various text generation tasks, including creative writing, summarization, and question answering.
- Image Understanding: Can analyze image content and generate textual responses or integrate visual information into tool calls.
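For the tool-augmented use cases above, the application side usually has to execute each requested call and hand the result back to the model. The following is a rough sketch of that step, not the model's own runtime: the TOOLS registry, the get_weather function, and run_tool_calls are hypothetical, and the JSON payload shape ("name" plus "arguments") is an assumption based on the Qwen3-style convention. Which chat role carries the <tool_response> text is decided by the chat template and is not shown here.

```python
import json
import re


def get_weather(city):
    # Hypothetical stand-in for a real external API call.
    return {"city": city, "temp_c": 21}


# Hypothetical registry mapping tool names to local Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_text):
    """Execute every <tool_call> in the text and return the results
    wrapped in <tool_response> blocks, joined by newlines."""
    responses = []
    pattern = r"<tool_call>\s*(.*?)\s*</tool_call>"
    for payload in re.findall(pattern, assistant_text, re.DOTALL):
        call = json.loads(payload)
        func = TOOLS.get(call["name"])
        if func is None:
            # Report unknown tools back to the model instead of crashing.
            result = {"error": f"unknown tool: {call['name']}"}
        else:
            result = func(**call.get("arguments", {}))
        responses.append(
            f"<tool_response>\n{json.dumps(result)}\n</tool_response>"
        )
    return "\n".join(responses)


turn = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
print(run_tool_calls(turn))
```

Returning an error object for unknown tools, rather than raising, gives the model a chance to recover in its next turn; a production agent would also want argument validation and timeouts.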