RefinedNeuro/RefinedToolCallV5-3b

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 27, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

RefinedNeuro/RefinedToolCallV5-3b is a 3.1 billion parameter model built on WeiboAI/VibeThinker-3B, specifically optimized for multi-turn agentic tool calling and mathematical reasoning. It demonstrates significantly improved stateful, multi-step tool use and maintains strong reasoning capabilities, achieving 0.933 on AIME-2024 pass@8. This 32K context length model is designed for local, offline agentic tool-use prototypes and multi-step function-calling assistants.

Loading preview...

RefinedToolCall-V5-3B Overview

RefinedNeuro/RefinedToolCallV5-3b is a 3.1 billion parameter model, developed by RefinedNeuro, that excels in multi-turn agentic tool calling and mathematical reasoning. Unlike many smaller models that struggle with complex, multi-step interactions, this model is specifically engineered to maintain coherence and effectiveness across several turns, showing a ~3.7x improvement in multi-turn stateful tool-use on the Berkeley Function-Calling Leaderboard (multi_turn).

Key Capabilities

  • Enhanced Multi-turn Agentic Behavior: Achieves 0.220 average / 0.298 pass@3 on BFCL multi_turn, demonstrating robust multi-step tool-use.
  • Sharp Single-turn Function Calling: Scores 0.707 on BFCL single-turn (held-out).
  • Tool Error Recovery: Boasts a 0.896 recovery rate, allowing it to diagnose and recover from tool failures.
  • Intact Reasoning: Maintains strong mathematical reasoning with 0.933 on AIME-2024 pass@8, indicating no degradation from tool training.
  • Compact & Local: A 3B parameter model (2.5 GB Q6_K) designed to run efficiently on laptops via Ollama, without requiring a dedicated GPU.
  • Apache-2.0 Licensed: Freely available for use, shipping, and fine-tuning.

How it Achieved This

The model's capabilities were developed through five disciplined fine-tuning rounds, including a breakthrough in on-policy self-improvement where the model learned from its own successful multi-turn solutions. This process ensured that reasoning and recovery capabilities were never regressed.

Good For

  • Local/offline agentic tool-use prototypes
  • Multi-step function-calling assistants
  • Math & STEM reasoning tasks
  • Learning about the construction of small agentic models