Genesis Manthan-1.5B: Tool-First Reasoning
Shahansha/Manthan-1.5B is a 1.5-billion-parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct, with a 32K context length, fine-tuned for tool-first responses and agent workflows. Unlike most small models, which default to verbose text generation, Manthan is trained to call external tools, observe their results, and only then formulate an answer. This approach prioritizes actionable traces over hidden verbal reasoning.
Key Capabilities
- Structured Tool Calls: Emits structured `<tool_call>` blocks before final answers, facilitating agentic execution.
- Agentic Reasoning: Optimized for scenarios where external tools are available for problem-solving.
- Efficient Execution: Designed for smolagents-style execution loops, enabling small models to perform complex tasks.
- Targeted Training: Fine-tuned with QLoRA SFT and GRPO, using rewards for tool execution and format compliance, on datasets such as Shahansha/manthan-tool-reasoning-v1.
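The tool-first loop described above can be sketched in a few lines: parse the `<tool_call>` blocks out of a completion and dispatch them to local tools. This is a minimal illustration, assuming the JSON-inside-`<tool_call>` tag convention used by the Qwen2.5 chat format; the tool names and registry here are hypothetical, not part of the model card.

```python
import json
import re

# Matches a <tool_call>...</tool_call> block containing a JSON payload
# (the Qwen2.5-style convention assumed here).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text):
    """Extract every JSON tool-call payload from a model completion."""
    return [json.loads(m.group(1)) for m in TOOL_CALL_RE.finditer(text)]

# Toy tool registry standing in for real executors; purely illustrative.
TOOLS = {
    "calculator": lambda expression: str(eval(expression, {"__builtins__": {}})),
}

def run_tool_calls(completion):
    """Execute each parsed call and return (tool_name, result) observations."""
    observations = []
    for call in parse_tool_calls(completion):
        fn = TOOLS[call["name"]]
        observations.append((call["name"], fn(**call["arguments"])))
    return observations

# Example completion in the model's tool-first style.
completion = (
    "To solve this I will compute the product first.\n"
    "<tool_call>\n"
    '{"name": "calculator", "arguments": {"expression": "12 * 7"}}\n'
    "</tool_call>"
)
print(run_tool_calls(completion))  # [('calculator', '84')]
```

In a full smolagents-style loop, the observations would be appended to the conversation and the model queried again until it emits a final answer instead of another tool call.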
Performance Snapshot
Early benchmarks indicate strong performance in tool-augmented tasks:
- GSM8K (Tool-augmented accuracy): 65.0
- MBPP (pass@1): 50.0
Good For
- Agentic Math & Reasoning: Tasks requiring external execution for problem-solving.
- Tool-Augmented Code: Workflows involving code generation, debugging, and external tool interaction.
- Research Experiments: Exploring small-model tool use and action-first reasoning.
- Interactive Demos: Showcasing agent capabilities in Gradio Spaces and Hugging Face Spaces.
Limitations
As a research model, Manthan-1.5B is not a general-purpose factual authority, and its performance depends heavily on proper prompting and tool scaffolding. In production, validate tool-call outputs and restrict the set of available tools for safety.
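Restricting available tools can be as simple as an explicit allow-list checked before dispatch. A minimal sketch, assuming the same JSON tool-call payloads as above; the tool names and argument schemas are illustrative assumptions, not part of the model card.

```python
# Allow-list of tool names mapped to the argument keys they accept.
# These entries are hypothetical examples.
ALLOWED_TOOLS = {
    "calculator": {"expression"},
    "search": {"query"},
}

def validate_tool_call(call):
    """Return True only if the call targets an allowed tool with known arguments."""
    name = call.get("name")
    if name not in ALLOWED_TOOLS:
        return False
    args = call.get("arguments", {})
    # Reject unexpected argument keys rather than silently ignoring them.
    return isinstance(args, dict) and set(args) <= ALLOWED_TOOLS[name]

print(validate_tool_call({"name": "calculator", "arguments": {"expression": "2+2"}}))  # True
print(validate_tool_call({"name": "os_exec", "arguments": {"cmd": "rm -rf /"}}))       # False
```

Dropping invalid calls (or surfacing them as errors back to the model) keeps the execution loop confined to tools you have audited.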