beita6969/FlowSteer-8b
beita6969/FlowSteer-8b is an end-to-end reinforcement learning (RL) framework for automated agentic workflow orchestration, built on the Qwen/Qwen3-8B base model. A lightweight policy model interacts with an executable canvas environment over multiple turns to build and refine workflows. FlowSteer targets two pain points of workflow orchestration: high manual cost and dependence on specific operators or LLMs. It is designed to be plug-and-play, working with diverse operator libraries and interchangeable LLM backends.
FlowSteer: End-to-End RL for Workflow Orchestration
FlowSteer is a framework developed by beita6969 for agentic workflow orchestration. It uses end-to-end reinforcement learning (RL): a lightweight policy model interacts with an executable canvas environment. In each turn, the policy model analyzes the current execution state and selects an editing action, and the canvas returns feedback. Repeating this over multiple turns builds and refines a workflow automatically.
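The interaction loop described above can be illustrated with a toy sketch. Everything here is hypothetical and is not FlowSteer's actual API: the `Canvas` class, the `("add" | "remove" | "stop", name)` action format, and the rule-based `toy_policy` standing in for the trained policy model are all assumptions made only to show the shape of a multi-turn edit-execute-feedback cycle.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]  # (kind, operator_name); hypothetical action format


@dataclass
class Canvas:
    """Toy executable canvas: an ordered list of operator names."""
    operators: Dict[str, Callable]
    steps: List[str] = field(default_factory=list)

    def apply(self, action: Action) -> None:
        """Apply one editing action to the workflow."""
        kind, name = action
        if kind == "add":
            self.steps.append(name)
        elif kind == "remove" and name in self.steps:
            self.steps.remove(name)

    def execute(self, value):
        """Run the workflow and return (ok, result_or_error) as feedback."""
        try:
            for name in self.steps:
                value = self.operators[name](value)
            return True, value
        except Exception as exc:
            return False, f"{type(exc).__name__}: {exc}"


def toy_policy(steps, ok, result, target) -> Action:
    """Rule-based stand-in for the learned policy model (assumption)."""
    if not ok:
        # Execution failed: undo the most recent step, if any.
        return ("remove", steps[-1]) if steps else ("stop", "")
    if result == target:
        return ("stop", "")
    if isinstance(result, str):
        return ("add", "to_int")
    if isinstance(result, int) and result < target:
        return ("add", "increment")
    return ("stop", "")


def run_episode(canvas, policy, task_input, target, max_turns=8):
    """Multi-turn loop: execute, observe feedback, pick an edit, repeat."""
    history = []
    for _ in range(max_turns):
        ok, result = canvas.execute(task_input)
        action = policy(canvas.steps, ok, result, target)
        if action[0] == "stop":
            break
        canvas.apply(action)
        history.append(action)
    return history


OPS = {"to_int": int, "increment": lambda x: x + 1}
canvas = Canvas(operators=OPS, steps=["increment"])  # start from a broken workflow
history = run_episode(canvas, toy_policy, "41", 42)
print(history, canvas.execute("41"))
```

Starting from a deliberately broken workflow shows the refinement aspect: the first execution fails, the feedback triggers a removal, and later turns rebuild a working sequence. In a real system the policy would be a trained model rather than hand-written rules.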
Key Capabilities
- End-to-End RL Training: Learns workflow orchestration directly from execution feedback, reducing manual effort.
- Plug-and-Play Design: Supports integration with diverse operator libraries and allows for interchangeable LLM backends, enhancing flexibility.
- CWRPO Algorithm: Incorporates Canvas Workflow Relative Policy Optimization (CWRPO) with diversity-constrained rewards and conditional release for robust training.
- Iterative Refinement: Utilizes multi-turn interaction to build and refine workflows dynamically.
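As a rough illustration of the plug-and-play idea, the sketch below separates an operator registry from the LLM backend so that either can be swapped independently. The names `LLMBackend`, `register`, `REGISTRY`, and `run_workflow`, as well as the two dummy backends, are hypothetical and chosen for illustration; they do not come from FlowSteer's codebase.

```python
from typing import Callable, Dict, List, Protocol


class LLMBackend(Protocol):
    """Minimal interface any backend must satisfy (hypothetical)."""
    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Dummy backend that echoes the prompt."""
    def complete(self, prompt: str) -> str:
        return f"echo:{prompt}"


class UpperBackend:
    """Dummy backend that upper-cases the prompt."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()


OperatorFn = Callable[[LLMBackend, str], str]
REGISTRY: Dict[str, OperatorFn] = {}


def register(name: str):
    """Decorator that adds an operator to the library under `name`."""
    def deco(fn: OperatorFn) -> OperatorFn:
        REGISTRY[name] = fn
        return fn
    return deco


@register("answer")
def answer(backend: LLMBackend, text: str) -> str:
    return backend.complete(text)


@register("review")
def review(backend: LLMBackend, text: str) -> str:
    return backend.complete(f"review {text}")


def run_workflow(steps: List[str], backend: LLMBackend, text: str) -> str:
    """Run named operators in order against whichever backend is supplied."""
    for name in steps:
        text = REGISTRY[name](backend, text)
    return text


out_echo = run_workflow(["answer", "review"], EchoBackend(), "hi")
out_upper = run_workflow(["answer", "review"], UpperBackend(), "hi")
print(out_echo)
print(out_upper)
```

The same workflow definition runs unchanged against both backends, and new operators can be registered without touching the runner. That decoupling is the property the "plug-and-play" bullet describes.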
Model Details
FlowSteer is built on the Qwen/Qwen3-8B base model and was trained with CWRPO for 300 steps using LoRA with rank 64. It is suited to research on automated agentic systems and dynamic workflow generation, and it targets two challenges in complex AI tasks: sparse reward signals and operator dependency.
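To give a sense of what a LoRA rank of 64 means in parameter terms, the sketch below applies the standard LoRA count: adapting a `d_out x d_in` weight with rank `r` adds two low-rank factors with `r * (d_in + d_out)` parameters in total. The 4096 x 4096 layer shape is an illustrative assumption, and the model card does not state which modules were adapted.

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the original dense weight matrix."""
    return d_in * d_out


# Illustrative layer shape (assumption, not taken from the model card).
d_in, d_out, rank = 4096, 4096, 64

extra = lora_extra_params(d_in, d_out, rank)
ratio = extra / full_params(d_in, d_out)
print(extra, ratio)
```

For a square layer the ratio simplifies to `2 * r / d`, so at rank 64 the adapter trains only a few percent of the parameters in each adapted matrix.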