roshangrewal/gemma4-e4b-toolcall-v02
The roshangrewal/gemma4-e4b-toolcall-v02 is a 7.9 billion parameter model developed by Roshan Grewal, built upon Google's Gemma 4 E4B-it architecture. Fine-tuned on 78K curated examples, this model excels at reliable function calling, tool selection, and determining when not to call a tool. It achieves 95% multi-tool accuracy and retains general conversational, reasoning, and knowledge capabilities, making it suitable for production-grade tool-calling applications.
Loading preview...
Gemma 4 E4B — Tool-Calling v0.2: Production-Grade Function Calling
This model, developed by Roshan Grewal, is a 7.9 billion parameter variant of Google's Gemma 4 E4B-it, specifically fine-tuned for robust tool-calling capabilities. It was trained on 78,298 curated examples, standardizing data to OpenAI format, with 80% being multi-turn conversations.
Key Capabilities & Performance
- High Tool-Calling Accuracy: Achieves 95.0% accuracy on multi-tool selection and approximately 90% on full match (name + arguments) for tool calls.
- Reliable No-Call Detection: Demonstrates 87.5% accuracy in correctly identifying when no tool call is needed, significantly improved from v0.1.
- Efficient Deployment: Designed to run on a single GPU, with available formats including full fp16, LoRA adapter, and GGUF (Q8_0) for Ollama, llama.cpp, and LM Studio.
- Retains Base Model Strengths: Inherits the general conversational, reasoning, and knowledge capabilities of the Gemma 4 E4B-it base model.
- Context Length: Supports a maximum sequence length of 32,768 tokens at inference, though best accuracy is observed on sequences up to 4K tokens (training length).
Use Cases & Limitations
- Good for:
- Integrating LLMs with external APIs and services through function calls.
- Automating tasks requiring precise tool selection and argument parsing.
- Applications needing a compact yet powerful tool-calling model.
- Limitations:
- Outputs in Gemma 4 native format (vLLM auto-converts to JSON).
- May occasionally over-trigger tool calls due to 87.5% no-call accuracy.
- Not extensively tested on non-English queries.
- As a 4B model, it may be less capable than larger 70B+ models for highly complex, multi-step reasoning tasks.