Qwen 2.5 3B – Calculator Agent

This model is a fine-tuned version of Qwen 2.5 3B Instruct, developed by Dan Austin and trained to use a calculator tool. Training used multi-turn reinforcement learning with GRPO, which taught the model to parse complex arithmetic problems and generate structured tool calls, in both XML and YAML formats, for execution within a recursive calculator environment. Once the calculation returns, the model writes a clear, human-readable answer.
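To make the idea of a recursive calculator environment concrete, here is a minimal sketch of how a nested XML tool call could be evaluated. The tag names (`calculator`, `operation`, `operand`) and the operation names are assumptions made for illustration; the actual schema the model was trained on is not specified here.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical operation names; the real environment may use different ones.
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def evaluate(node):
    """Recursively evaluate a <calculator> element (assumed schema)."""
    op = OPS[node.findtext("operation").strip()]
    values = []
    for operand in node.findall("operand"):
        nested = operand.find("calculator")
        if nested is not None:
            # An operand may itself be a full calculator call.
            values.append(evaluate(nested))
        else:
            values.append(float(operand.text.strip()))
    # Fold the operands left to right with the chosen operation.
    result = values[0]
    for value in values[1:]:
        result = op(result, value)
    return result


def run_tool_call(xml_text):
    """Parse a tool call emitted as XML text and return its numeric result."""
    return evaluate(ET.fromstring(xml_text.strip()))


# Example tool call for 3 * (4 + 5), written in the assumed schema.
TOOL_CALL = """
<calculator>
  <operation>multiply</operation>
  <operand>3</operand>
  <operand>
    <calculator>
      <operation>add</operation>
      <operand>4</operand>
      <operand>5</operand>
    </calculator>
  </operand>
</calculator>
"""

if __name__ == "__main__":
    print(run_tool_call(TOOL_CALL))  # 27.0
```

In a setup like this, the environment would return the numeric result to the model, which then phrases the final answer for the user.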

Key Capabilities

Advanced Tool Use: Generates precise XML/YAML tool calls for arithmetic operations.
High Accuracy: Achieved 89% accuracy on a challenging synthetic evaluation dataset, a 62-point gain over its pre-RL baseline of 27%.
Efficient Training: Fine-tuned using GRPO with a hybrid reward signal (LLM-as-a-judge and programmatic verification) in approximately 3 hours on 4x A100 GPUs.
Structured Output: Capable of producing both tool-use instructions and final human-readable answers.
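The training setup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the training code used for this model: how the LLM-judge score and the programmatic check were actually combined is not documented here, so the equal weighting, the tolerance, and the function names are all hypothetical. The second function shows the group-relative advantage normalization that GRPO is generally known for, where each sampled completion's reward is compared against the other completions for the same prompt.

```python
from statistics import mean, pstdev


def hybrid_reward(judge_score, predicted, expected, judge_weight=0.5, tol=1e-6):
    """Blend an LLM-judge score in [0, 1] with a programmatic correctness check.

    judge_weight and tol are assumed values chosen for illustration only.
    """
    if not 0.0 <= judge_score <= 1.0:
        raise ValueError("judge_score must be in [0, 1]")
    # Programmatic verification: 1.0 if the numeric answer matches, else 0.0.
    verified = 1.0 if abs(predicted - expected) <= tol else 0.0
    return judge_weight * judge_score + (1.0 - judge_weight) * verified


def group_relative_advantages(rewards):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


if __name__ == "__main__":
    # Two completions for one prompt whose correct answer is 27.
    rewards = [
        hybrid_reward(1.0, predicted=27.0, expected=27.0),
        hybrid_reward(0.0, predicted=12.0, expected=27.0),
    ]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Completions that score above their group's mean receive positive advantages and are reinforced, while those below the mean are discouraged.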

Good for

Automated Mathematical Reasoning: Ideal for applications requiring accurate and verifiable arithmetic problem-solving.
Agentic Workflows: Suitable for integration into agent systems that need to perform calculations via external tools.
Complex Data Processing: Handles nested operations and diverse phrasing in mathematical queries.