beita6969/FlowSteer-8b

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 32k · Published: Jan 31, 2026 · License: apache-2.0 · Architecture: Transformer

beita6969/FlowSteer-8b is an end-to-end reinforcement learning (RL) framework built on the Qwen/Qwen3-8B base model for automated agentic workflow orchestration. A lightweight policy model interacts with an executable canvas environment, iteratively building and refining workflows over multiple turns. FlowSteer targets two common pain points of workflow orchestration, high manual cost and reliance on specific operators or LLMs, by offering a plug-and-play design that supports diverse operator libraries and interchangeable LLM backends.


FlowSteer: End-to-End RL for Workflow Orchestration

FlowSteer, developed by beita6969, tackles the complexities of agentic workflow orchestration with end-to-end reinforcement learning (RL). A lightweight policy model interacts with an executable canvas environment: at each turn, the policy analyzes the current execution state, selects an editing action, and receives feedback from the canvas, so workflows are constructed and refined automatically over multiple turns.
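The multi-turn loop described above can be sketched as follows. This is an illustrative stand-in, not FlowSteer's actual API: the `Canvas`, `policy`, and `orchestrate` names, and the toy action strings, are all hypothetical.

```python
# Hypothetical sketch of a policy/canvas interaction loop: the policy
# observes the latest execution feedback, proposes an editing action,
# and the canvas applies it and returns new feedback.
from dataclasses import dataclass, field

@dataclass
class Canvas:
    """Executable canvas holding the workflow under construction."""
    nodes: list = field(default_factory=list)

    def apply(self, action: str) -> str:
        # Apply an edit (e.g. add an operator node) and return
        # execution feedback for the next turn.
        self.nodes.append(action)
        return f"ok: {len(self.nodes)} node(s) on canvas"

def policy(feedback: str) -> str:
    # Stand-in for the lightweight policy model: maps the latest
    # feedback to the next editing action.
    return "add_operator" if "node" not in feedback else "finish"

def orchestrate(max_turns: int = 5) -> Canvas:
    canvas = Canvas()
    feedback = "empty canvas"
    for _ in range(max_turns):
        action = policy(feedback)
        if action == "finish":
            break
        feedback = canvas.apply(action)
    return canvas
```

In the real framework, the policy is the 8B model itself and the feedback is actual execution state, but the turn structure is the same: observe, edit, execute, repeat.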

Key Capabilities

  • End-to-End RL Training: Learns workflow orchestration directly from execution feedback, reducing manual effort.
  • Plug-and-Play Design: Supports integration with diverse operator libraries and allows for interchangeable LLM backends, enhancing flexibility.
  • CWRPO Algorithm: Incorporates Canvas Workflow Relative Policy Optimization (CWRPO) with diversity-constrained rewards and conditional release for robust training.
  • Iterative Refinement: Utilizes multi-turn interaction to build and refine workflows dynamically.
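The CWRPO details are not spelled out in this card, but "relative policy optimization" methods typically normalize rewards within a group of rollouts sampled for the same task. The sketch below shows that common group-relative advantage recipe as an assumption about what CWRPO's core update might resemble, not as its published algorithm.

```python
# Group-relative advantage computation (GRPO-style recipe, assumed here):
# rewards for several sampled workflows on the same task are normalized
# within the group, so the policy is rewarded for beating its own peers.
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in rewards]
```

A diversity-constrained reward, as the bullet above mentions, would then shape the raw `rewards` before this normalization step.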

Model Details

FlowSteer is built on the Qwen/Qwen3-8B base model and was trained with the CWRPO method for 300 steps at a LoRA rank of 64. The framework is particularly suited to research on automated agentic systems and dynamic workflow generation, addressing the sparse reward signals and operator dependency that complicate complex AI tasks.
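To see what a rank-64 LoRA adapter buys, the arithmetic below compares a full weight update against the low-rank factorization. The 4096 hidden size is an assumption for illustration, not a confirmed Qwen3-8B dimension.

```python
# Illustrative LoRA arithmetic (not FlowSteer's training code): a rank-r
# adapter replaces a full d_out x d_in weight update with two low-rank
# factors B (d_out x r) and A (r x d_in).
import numpy as np

d_in, d_out, r = 4096, 4096, 64  # assumed hidden size; r = LoRA rank

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # B starts at zero, so the adapter is initially a no-op

W_adapted = W + B @ A  # effective weight after merging the adapter

full_params = d_out * d_in
lora_params = d_out * r + r * d_in
print(f"full update: {full_params:,} params; LoRA r={r}: {lora_params:,} params")
```

With these assumed dimensions, the adapter trains roughly 3% of the parameters a full update would touch per weight matrix, which is why an 8B policy can be fine-tuned cheaply for 300 RL steps.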