viktoroo/gemma-3-4b-tools: A Tool-Augmented Gemma 3 Repack
This model is a repackaging of Google DeepMind's instruction-tuned Gemma 3 4B model (google/gemma-3-4b-it), a 4.3 billion parameter multimodal model with a 32K token context window. The core model weights and capabilities are identical to the original Gemma 3 4B-IT, which excels at text generation and image understanding and supports over 140 languages.
Key Differentiator: Tool-Calling Chat Template
The primary modification in this version is an updated chat_template within tokenizer_config.json. This template is designed to:
- Automatically inject a system prompt if none is provided.
- Document available external tools.
- Define a function-calling protocol using <tool_call> and <tool_response> blocks.
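As a sketch of how such a template is typically driven, the snippet below builds a tool definition and a message list. The OpenAI-style "function" schema and the absence of a system message (relying on the template's automatic injection) are assumptions here; check the actual chat_template in tokenizer_config.json for the exact keys it expects.

```python
# Hypothetical tool spec in the common JSON-schema "function" style;
# the exact fields the repacked template reads are an assumption.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

messages = [
    # No system message: per the template description above, a default
    # system prompt is injected automatically when none is provided.
    {"role": "user", "content": "What is the weather in Oslo?"},
]

# With transformers installed, the template would be applied like so:
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("viktoroo/gemma-3-4b-tools")
# prompt = tok.apply_chat_template(
#     messages, tools=tools, add_generation_prompt=True, tokenize=False
# )
```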
Important Note: The model's weights have not been modified or fine-tuned to align with this new tool-calling protocol. Therefore, while the template provides the structure, the model may produce malformed tool calls or incorrect tool usage without further fine-tuning. Its behavior, capabilities, and safety limitations are directly inherited from the original gemma-3-4b-it checkpoint.
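Because the weights were never tuned to the protocol, any consumer of the model's output should parse tool calls defensively. A minimal sketch, assuming the template wraps a JSON payload with a "name" field in <tool_call> tags (a common convention; verify against the actual template):

```python
import json
import re

# Assumed format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return well-formed tool calls, silently skipping malformed ones."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            # Expected failure mode: the base model was not fine-tuned
            # for this protocol, so invalid JSON is a realistic output.
            continue
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls

output = (
    "Let me check the weather.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>\n'
    '<tool_call>{"name": "broken", "arguments": {oops}</tool_call>'
)
print(extract_tool_calls(output))
# → [{'name': 'get_weather', 'arguments': {'city': 'Oslo'}}]
```

Skipping bad payloads rather than raising keeps a tool-use loop alive when the untuned model emits noise; a fine-tuned checkpoint should make such fallbacks rare.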
Core Capabilities (inherited from Gemma 3 4B-IT):
- Multimodal: Handles both text and image inputs (images normalized to 896x896 resolution, encoded to 256 tokens each) and generates text outputs.
- Extensive Context: Supports a total input context of 32K tokens.
- Multilingual Support: Trained on data including content in over 140 languages.
- Diverse Task Performance: Well-suited for question answering, summarization, reasoning, and image analysis.
- Efficient Deployment: Its relatively small size (4.3B parameters) allows for deployment in resource-limited environments like laptops or desktops.
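The figures above allow a quick context-budget estimate: each image costs a fixed 256 tokens, so the number of images that fit alongside text follows directly. The 32K-as-32768 reading and the 2,000-token text reserve below are illustrative assumptions.

```python
IMAGE_TOKENS = 256        # each 896x896 image encodes to 256 tokens
context = 32 * 1024       # "32K" read as 32768; the exact figure may differ
text_budget = 2_000       # hypothetical reserve for prompt text and output

max_images = (context - text_budget) // IMAGE_TOKENS
print(max_images)  # → 120
```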
Intended Use Cases:
This model is ideal for developers and researchers looking to experiment with integrating tool-calling mechanisms into a powerful, open-source multimodal LLM. It provides a structured template for tool interaction, serving as a foundation for further fine-tuning to achieve robust function-calling capabilities. It also retains all the general text generation and image understanding applications of the base Gemma 3 4B-IT model.