Name: roshangrewal/gemma4-e4b-toolcall-v02 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: roshangrewal

Gemma 4 E4B — Tool-Calling v0.2: Production-Grade Function Calling

This model, developed by Roshan Grewal, is a 7.9 billion parameter variant of Google's Gemma 4 E4B-it, specifically fine-tuned for robust tool-calling capabilities. It was trained on 78,298 curated examples, standardizing data to OpenAI format, with 80% being multi-turn conversations.

Key Capabilities & Performance

High Tool-Calling Accuracy: Achieves 95.0% accuracy on multi-tool selection and approximately 90% on full match (name + arguments) for tool calls.
Reliable No-Call Detection: Demonstrates 87.5% accuracy in correctly identifying when no tool call is needed, significantly improved from v0.1.
Efficient Deployment: Designed to run on a single GPU, with available formats including full fp16, LoRA adapter, and GGUF (Q8_0) for Ollama, llama.cpp, and LM Studio.
Retains Base Model Strengths: Inherits the general conversational, reasoning, and knowledge capabilities of the Gemma 4 E4B-it base model.
Context Length: Supports a maximum sequence length of 32,768 tokens at inference, though best accuracy is observed on sequences up to 4K tokens (training length).

Use Cases & Limitations

Good for:
- Integrating LLMs with external APIs and services through function calls.
- Automating tasks requiring precise tool selection and argument parsing.
- Applications needing a compact yet powerful tool-calling model.
Limitations:
- Outputs in Gemma 4 native format (vLLM auto-converts to JSON).
- May occasionally over-trigger tool calls due to 87.5% no-call accuracy.
- Not extensively tested on non-English queries.
- As a 4B model, it may be less capable than larger 70B+ models for highly complex, multi-step reasoning tasks.

Overview

Gemma 4 E4B — Tool-Calling v0.2: Production-Grade Function Calling

Key Capabilities & Performance

Use Cases & Limitations

Full Model Card (README)