codefuse-ai/OpAgent-32B

VISION | Concurrency Cost: 2 | Model Size: 33.4B | Quant: FP8 | Ctx Length: 32k | Published: Feb 2, 2026 | License: apache-2.0 | Architecture: Transformer

OpAgent-32B by codefuse-ai is a 33.4 billion parameter Vision-Language Model (VLM) fine-tuned for autonomous web navigation. Built on the Qwen3-VL-32B-Thinking base, it takes a natural language task description and webpage screenshots as input and outputs JSON-formatted actions for web task execution. It serves as a single-model engine for web agent applications, enabling automated interaction with web interfaces.


OpAgent-32B: A Vision-Language Model for Autonomous Web Navigation

OpAgent-32B, developed by codefuse-ai, is a 33.4 billion parameter Vision-Language Model (VLM) specifically engineered for autonomous web navigation and task execution. It is the core single-model engine within the broader OpAgent project.

Key Capabilities

  • Autonomous Web Navigation: Designed to interpret and interact with web pages to complete user-defined tasks.
  • Vision-Language Integration: Processes both natural language task descriptions and webpage screenshots as input.
  • Action Generation: Outputs structured JSON-formatted actions (e.g., click, type, scroll) or final answers, enabling direct interaction with web elements.
  • Advanced Fine-tuning: Utilizes a Hierarchical Multi-Task SFT strategy followed by Online Agentic Reinforcement Learning with a Hybrid Reward mechanism, built on the Qwen3-VL-32B-Thinking base model.
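To make the action-generation step above concrete, here is a minimal dispatch sketch for model-emitted JSON actions. The exact output schema is not documented in this card; the field names used here (`action`, `coordinate`, `text`, `direction`) are assumptions for illustration only.

```python
import json

# Hypothetical JSON action schema -- the real OpAgent-32B output format may
# differ; the field names below are assumptions, not the documented schema.
raw_output = '{"action": "type", "coordinate": [412, 237], "text": "weather tomorrow"}'

def dispatch(action_json: str) -> str:
    """Decode a model-emitted action and describe the browser step it maps to."""
    act = json.loads(action_json)
    kind = act["action"]
    if kind == "click":
        return f"click at {tuple(act['coordinate'])}"
    if kind == "type":
        return f"type {act['text']!r} at {tuple(act['coordinate'])}"
    if kind == "scroll":
        return f"scroll {act.get('direction', 'down')}"
    if kind == "answer":
        return f"final answer: {act['text']}"
    raise ValueError(f"unknown action: {kind}")

print(dispatch(raw_output))  # → type 'weather tomorrow' at (412, 237)
```

In a real agent loop, the dispatcher would drive a browser automation layer (e.g. clicking or typing at the given coordinates) and feed the resulting screenshot back to the model for the next step.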

Recommended Use Cases

OpAgent-32B is primarily intended for use as a web agent. It is optimized for deployment with high-performance inference engines like vLLM, as detailed in its single-model usage guide. Developers can integrate this model to automate complex web-based workflows, perform data extraction, or create intelligent web assistants.
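As a sketch of the vLLM deployment path mentioned above, the snippet below builds an OpenAI-compatible chat request pairing a task description with a base64-encoded screenshot, as one might send to a `vllm serve`-style endpoint. The endpoint URL and sampling parameters are placeholders, not values from the OpAgent usage guide.

```python
import base64
import json

# Assumed setup: the model is served with something like
#   vllm serve codefuse-ai/OpAgent-32B
# exposing an OpenAI-compatible /v1/chat/completions endpoint.
# The URL and parameter values below are illustrative placeholders.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_request(task: str, screenshot_png: bytes) -> dict:
    """Pack the task text and a webpage screenshot into one chat request."""
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    return {
        "model": "codefuse-ai/OpAgent-32B",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": task},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 512,
    }

payload = build_request("Find the checkout button", b"\x89PNG...")
# The payload would then be POSTed to ENDPOINT with any HTTP client,
# and the model's reply parsed as a JSON-formatted action.
print(json.dumps(payload)[:80])
```

This follows the multimodal message format of the OpenAI-compatible chat API that vLLM exposes; consult the OpAgent single-model usage guide for the model's actual serving flags and prompt conventions.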