Overview
Qwen 2.5 3B โ Calculator Agent
This model is a specialized fine-tuned version of Qwen 2.5 3B Instruct, developed by Dan Austin, engineered to proficiently interact with a calculator tool. It leverages multi-turn reinforcement learning with GRPO, enabling it to parse complex arithmetic problems and generate structured tool calls in both XML and YAML formats for execution within a recursive calculator environment. After the calculation, the model formulates a clear, human-readable answer.
Key Capabilities
- Advanced Tool Use: Generates precise XML/YAML tool calls for arithmetic operations.
- High Accuracy: Achieved an 89% accuracy on a challenging synthetic evaluation dataset, a significant +62 point gain from its pre-RL baseline of 27%.
- Efficient Training: Fine-tuned using GRPO with a hybrid reward signal (LLM-as-a-judge and programmatic verification) in approximately 3 hours on 4x A100 GPUs.
- Structured Output: Capable of producing both tool-use instructions and final human-readable answers.
Good for
- Automated Mathematical Reasoning: Ideal for applications requiring accurate and verifiable arithmetic problem-solving.
- Agentic Workflows: Suitable for integration into agent systems that need to perform calculations via external tools.
- Complex Data Processing: Can handle nested operations and diverse phrasing in mathematical queries.