roshangrewal/gemma4-e4b-toolcall-v02

VISIONConcurrency Cost:1Model Size:7.9BQuant:FP8Ctx Length:32kTool Calling:SupportedPublished:Jun 16, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The roshangrewal/gemma4-e4b-toolcall-v02 is a 7.9 billion parameter model developed by Roshan Grewal, built upon Google's Gemma 4 E4B-it architecture. Fine-tuned on 78K curated examples, this model excels at reliable function calling, tool selection, and determining when not to call a tool. It achieves 95% multi-tool accuracy and retains general conversational, reasoning, and knowledge capabilities, making it suitable for production-grade tool-calling applications.

Loading preview...

Gemma 4 E4B — Tool-Calling v0.2: Production-Grade Function Calling

This model, developed by Roshan Grewal, is a 7.9 billion parameter variant of Google's Gemma 4 E4B-it, specifically fine-tuned for robust tool-calling capabilities. It was trained on 78,298 curated examples, standardizing data to OpenAI format, with 80% being multi-turn conversations.

Key Capabilities & Performance

  • High Tool-Calling Accuracy: Achieves 95.0% accuracy on multi-tool selection and approximately 90% on full match (name + arguments) for tool calls.
  • Reliable No-Call Detection: Demonstrates 87.5% accuracy in correctly identifying when no tool call is needed, significantly improved from v0.1.
  • Efficient Deployment: Designed to run on a single GPU, with available formats including full fp16, LoRA adapter, and GGUF (Q8_0) for Ollama, llama.cpp, and LM Studio.
  • Retains Base Model Strengths: Inherits the general conversational, reasoning, and knowledge capabilities of the Gemma 4 E4B-it base model.
  • Context Length: Supports a maximum sequence length of 32,768 tokens at inference, though best accuracy is observed on sequences up to 4K tokens (training length).

Use Cases & Limitations

  • Good for:
    • Integrating LLMs with external APIs and services through function calls.
    • Automating tasks requiring precise tool selection and argument parsing.
    • Applications needing a compact yet powerful tool-calling model.
  • Limitations:
    • Outputs in Gemma 4 native format (vLLM auto-converts to JSON).
    • May occasionally over-trigger tool calls due to 87.5% no-call accuracy.
    • Not extensively tested on non-English queries.
    • As a 4B model, it may be less capable than larger 70B+ models for highly complex, multi-step reasoning tasks.