LMIS-ORG/AgentFlow_Slime_Agentic_Qwen2.5_7B
LMIS-ORG/AgentFlow_Slime_Agentic_Qwen2.5_7B Overview
This model, developed by LMIS-ORG, is based on the Qwen2.5-7B-Instruct architecture and implements the novel AgentFlow paradigm. AgentFlow transforms traditional single-step LLM inference into a sophisticated multi-turn agentic process, featuring a Planner → Executor → Verifier loop. A key innovation is the application of Reinforcement Learning (RL) signals, specifically GRPO, to the Planner's generation trajectory. This allows the model to autonomously improve its tool-use and reasoning abilities, bypassing the need for labor-intensive manual annotation of intermediate steps.
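The Planner → Executor → Verifier loop described above can be sketched as a minimal control flow. All names below (`run_agentflow`, `toy_plan`, etc.) are illustrative assumptions for this sketch, not the repository's actual API; the real system wires an LLM into the planner and applies GRPO updates to its trajectory.

```python
# Minimal sketch of the AgentFlow Planner -> Executor -> Verifier loop.
# Illustrative only: real planning/verification is LLM-driven.

def run_agentflow(task, plan, execute, verify, max_turns=5):
    """Iterate plan -> execute -> verify until the verifier accepts."""
    context = {"task": task, "history": []}
    for _ in range(max_turns):
        action = plan(context)                 # Planner proposes the next tool call
        result = execute(action)               # Executor runs the tool
        context["history"].append((action, result))
        ok, answer = verify(context)           # Verifier checks the latest result
        if ok:
            return answer
    return None  # no verified answer within the turn budget


# Toy stand-ins: solve a doubling task via a python_coder-style tool call.
def toy_plan(ctx):
    return {"tool": "python_coder", "code": f"result = {ctx['task']} * 2"}

def toy_execute(action):
    scope = {}
    exec(action["code"], scope)                # stand-in for sandboxed execution
    return scope["result"]

def toy_verify(ctx):
    _, result = ctx["history"][-1]
    return (result % 2 == 0), result           # accept any even result

print(run_agentflow(21, toy_plan, toy_execute, toy_verify))  # -> 42
```

The key design point is that only the Planner's outputs form the RL trajectory; the Executor and Verifier provide the environment signal.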
Key Capabilities
- Agentic Reasoning: Employs a structured Planner-Executor-Verifier loop for complex problem-solving.
- Reinforcement Learning: Utilizes GRPO to refine the Planner's strategy and enhance performance.
- Tool Use: Integrates specialized tools such as `base_generator` for general text generation and `python_coder` for mathematical computation and algorithmic tasks.
- Improved Performance: Demonstrates substantial gains over baseline models, achieving a +20.0-point improvement on the AIME 2024 dataset (from 10.0% to 30.0%) with the Qwen2.5-7B-Instruct base.
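A minimal dispatcher routing Planner tool calls to the two tools named above might look like the following. The handler bodies and the tool-call dict shape are assumptions for illustration; the actual tools wrap an LLM and a sandboxed Python interpreter, respectively.

```python
# Toy dispatcher mapping tool names to handlers (illustrative stand-ins).

def base_generator(prompt: str) -> str:
    # Placeholder for free-form text generation by the base model.
    return f"[generated text for: {prompt}]"

def python_coder(code: str):
    # Placeholder for sandboxed code execution; returns the `result` variable.
    scope = {}
    exec(code, scope)
    return scope.get("result")

TOOLS = {"base_generator": base_generator, "python_coder": python_coder}

def dispatch(tool_call: dict):
    """Route a Planner tool call of the form {'tool': ..., 'input': ...}."""
    return TOOLS[tool_call["tool"]](tool_call["input"])

print(dispatch({"tool": "python_coder", "input": "result = sum(range(10))"}))  # -> 45
```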
Good For
- Complex Problem Solving: Excels in scenarios requiring multi-step reasoning and tool invocation.
- Automated Agent Development: Ideal for researchers and developers exploring advanced agentic LLM architectures.
- Mathematical and Algorithmic Tasks: Leverages the `python_coder` tool for accurate computation and problem-solving.
Note: The current model was trained for 100 steps due to resource constraints, indicating potential for further improvement with extended training.