dongguanting/Tool-Star-Qwen-3B
Tool-Star-Qwen-3B is a 3.1 billion parameter language model developed by dongguanting, based on the Qwen2.5-3B-Instruct architecture with a 32768 token context length. This model is specifically fine-tuned using the Tool-Star framework, which empowers it with advanced multi-tool collaborative reasoning capabilities. It excels at autonomously invoking and integrating multiple external tools during stepwise reasoning processes, making it suitable for complex problem-solving tasks requiring external knowledge or computation.
Loading preview...
Overview
Tool-Star-Qwen-3B is a 3.1 billion parameter language model developed by dongguanting, built upon the Qwen2.5-3B-Instruct base model. It is specifically trained using the novel Tool-Star framework, which focuses on enhancing the model's ability to perform multi-tool collaborative reasoning. This framework integrates six types of external tools and employs a two-stage training process, including cold-start fine-tuning and a multi-tool self-critic reinforcement learning (RL) algorithm with hierarchical reward design.
Key Capabilities
- Autonomous Tool Invocation: Designed to autonomously invoke and integrate multiple external tools during complex reasoning tasks.
- Reinforcement Learning for Reasoning: Leverages an RL-based framework to empower effective multi-tool collaborative reasoning.
- Data Synthesis Pipeline: Utilizes a general tool-integrated reasoning data synthesis pipeline, combining tool-integrated prompting with hint-based sampling to generate tool-use trajectories.
- Enhanced Collaboration: Employs a two-stage training framework to improve multi-tool collaboration, guided by tool-invocation feedback and reinforced by a multi-tool self-critic RL algorithm.
Good For
- Complex Problem Solving: Ideal for applications requiring the model to break down problems and utilize external tools for intermediate steps.
- Research in Tool-Augmented LLMs: A valuable resource for researchers exploring reinforcement learning and multi-tool integration in large language models.
- Reasoning Benchmarks: Demonstrated effectiveness across over 10 challenging reasoning benchmarks, indicating strong performance in tasks requiring logical deduction and external resource utilization.