Danau5tin/calculator_agent_qwen2.5_3b

Cold
Public
3.1B
BF16
32768
Hugging Face
Overview

Qwen 2.5 3B โ€“ Calculator Agent

This model is a specialized fine-tuned version of Qwen 2.5 3B Instruct, developed by Dan Austin, engineered to proficiently interact with a calculator tool. It leverages multi-turn reinforcement learning with GRPO, enabling it to parse complex arithmetic problems and generate structured tool calls in both XML and YAML formats for execution within a recursive calculator environment. After the calculation, the model formulates a clear, human-readable answer.

Key Capabilities

  • Advanced Tool Use: Generates precise XML/YAML tool calls for arithmetic operations.
  • High Accuracy: Achieved an 89% accuracy on a challenging synthetic evaluation dataset, a significant +62 point gain from its pre-RL baseline of 27%.
  • Efficient Training: Fine-tuned using GRPO with a hybrid reward signal (LLM-as-a-judge and programmatic verification) in approximately 3 hours on 4x A100 GPUs.
  • Structured Output: Capable of producing both tool-use instructions and final human-readable answers.

Good for

  • Automated Mathematical Reasoning: Ideal for applications requiring accurate and verifiable arithmetic problem-solving.
  • Agentic Workflows: Suitable for integration into agent systems that need to perform calculations via external tools.
  • Complex Data Processing: Can handle nested operations and diverse phrasing in mathematical queries.