LMIS-ORG/AgentFlow_Slime_Agentic_Qwen2.5_7B

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quantization: FP8 · Context Length: 32k · Published: Mar 9, 2026 · Architecture: Transformer

LMIS-ORG/AgentFlow_Slime_Agentic_Qwen2.5_7B is a model based on Qwen2.5-7B-Instruct, developed by LMIS-ORG, that implements the AgentFlow paradigm: single-step LLM inference is extended into a multi-turn Planner → Executor → Verifier agent loop. The model applies RL signals (GRPO) to the Planner's generation trajectory to improve tool use and reasoning without requiring manual annotation of intermediate steps. It is designed for complex problem solving and shows significant gains on benchmarks such as AIME 2024.


LMIS-ORG/AgentFlow_Slime_Agentic_Qwen2.5_7B Overview

This model, developed by LMIS-ORG, is based on the Qwen2.5-7B-Instruct architecture and implements the novel AgentFlow paradigm. AgentFlow transforms traditional single-step LLM inference into a sophisticated multi-turn agentic process, featuring a Planner → Executor → Verifier loop. A key innovation is the application of Reinforcement Learning (RL) signals, specifically GRPO, to the Planner's generation trajectory. This allows the model to autonomously improve its tool-use and reasoning abilities, bypassing the need for labor-intensive manual annotation of intermediate steps.
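The Planner → Executor → Verifier loop described above can be sketched in a few lines of plain Python. Everything here is illustrative: the function names (`plan`, `execute`, `verify`, `solve`), the tool routing heuristic, and the canned tool outputs are assumptions, not the released AgentFlow implementation; only the loop structure and the two tool names come from the model card.

```python
# Toy sketch of an AgentFlow-style Planner -> Executor -> Verifier loop.
# The tool names (base_generator, python_coder) match the card; the rest
# is an illustrative assumption, not the actual released code.

TOOLS = {
    "base_generator": lambda q: f"[generated text for: {q}]",
    "python_coder": lambda q: "[code execution result]",
}

def plan(task, history):
    """Planner: choose the next tool call (the part trained with GRPO)."""
    tool = "python_coder" if "compute" in task else "base_generator"
    return {"tool": tool, "input": task}

def execute(action):
    """Executor: invoke the selected tool and return its observation."""
    return TOOLS[action["tool"]](action["input"])

def verify(task, observation):
    """Verifier: accept or reject the observation (toy check)."""
    return observation.startswith("[")

def solve(task, max_turns=3):
    """Run the multi-turn agent loop until the Verifier accepts."""
    history = []
    for _ in range(max_turns):
        action = plan(task, history)
        obs = execute(action)
        history.append((action, obs))
        if verify(task, obs):
            return obs
    return history[-1][1]

print(solve("compute 17 * 31"))  # routed to python_coder
```

In the real system each step is produced by the LLM rather than hard-coded rules; the point of the sketch is the control flow the RL signal optimizes over.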

Key Capabilities

  • Agentic Reasoning: Employs a structured Planner-Executor-Verifier loop for complex problem-solving.
  • Reinforcement Learning: Utilizes GRPO to refine the Planner's strategy and enhance performance.
  • Tool Use: Integrates specialized tools like base_generator for general text generation and python_coder for mathematical computation and algorithmic tasks.
  • Improved Performance: Demonstrates substantial gains over the baseline, improving AIME 2024 accuracy by 20.0 percentage points (from 10.0% to 30.0%) with the Qwen2.5-7B-Instruct base.
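GRPO's core idea is a group-relative advantage: sample several rollouts per task, score each with an outcome reward, and normalize the rewards within the group so that no learned value function is needed. A minimal sketch of that normalization, under the assumption of simple 0/1 outcome rewards:

```python
# Minimal sketch of the group-relative advantage used by GRPO.
# Rewards are normalized within a group of rollouts for the same task;
# the 0/1 outcome rewards below are an illustrative assumption.

def group_relative_advantages(rewards):
    """Advantage of each rollout = (reward - group mean) / group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards tie
    return [(r - mean) / std for r in rewards]

# Four rollouts of one task: two succeeded (reward 1), two failed (reward 0).
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(adv)  # -> [1.0, -1.0, 1.0, -1.0]
```

Successful rollouts get positive advantages and failed ones negative, so the Planner's policy is pushed toward trajectories that outperform its own group average.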

Good For

  • Complex Problem Solving: Excels in scenarios requiring multi-step reasoning and tool invocation.
  • Automated Agent Development: Ideal for researchers and developers exploring advanced agentic LLM architectures.
  • Mathematical and Algorithmic Tasks: Leverages the python_coder tool for accurate computation and problem-solving.
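A `python_coder`-style tool typically executes model-written code and reads back a result variable. The sketch below is a hedged guess at that mechanism: the `answer` variable convention and the `run_python_coder` helper are assumptions, and the namespace trick is not a real security sandbox.

```python
# Hedged sketch of a python_coder-style tool: execute a model-written
# snippet in an isolated namespace and read back a conventional `answer`
# variable. The variable name and helper are illustrative assumptions.

def run_python_coder(snippet: str):
    """Execute a code snippet and return its `answer` variable."""
    namespace = {}  # NOTE: isolation only; not a real security sandbox
    exec(snippet, namespace)
    return namespace.get("answer")

# A typical combinatorics computation the Planner might delegate:
snippet = """
from math import comb
answer = comb(10, 3)  # ways to choose 3 items from 10
"""
print(run_python_coder(snippet))  # -> 120
```

Delegating exact arithmetic to executed code is what lets the agent avoid the numerical slips common in pure text generation on AIME-style problems.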

Note: The current model was trained for only 100 steps due to resource constraints, so further gains are likely with extended training.