99hgz/WebSeer-14b
WebSeer-14b by 99hgz is a 14.8 billion parameter model based on the Qwen2.5 architecture, trained with a reinforcement learning framework for web-based search agents. It builds self-reflection into its reasoning process, so the agent can backtrack, reformulate queries, and iteratively improve its answers. Compared with traditional RAG systems, it is aimed at deeper reasoning and longer tool-use chains in real-world web environments.
WebSeer-14b: A Self-Reflective Web Search Agent
WebSeer-14b is a 14.8 billion parameter model developed by 99hgz and built on the Qwen2.5 base architecture. It is trained with a reinforcement learning framework designed specifically for web-based search agents. Unlike conventional Retrieval-Augmented Generation (RAG) systems, WebSeer-14b applies self-reflection throughout its reasoning process.
Key Capabilities
- Deeper Reasoning: The model is trained to analyze web content in more depth than a single retrieve-then-answer pass allows.
- Longer Tool-Use Chains: It supports extended sequences of tool interactions, crucial for complex web navigation and information retrieval.
- Self-Reflective Correction: Agents can dynamically backtrack, reformulate search queries, and iteratively refine their responses based on self-assessment in real-world web environments.
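The self-reflective behavior described above can be pictured as a search, assess, and reformulate loop. The following is a minimal sketch under stated assumptions, not WebSeer's actual implementation: `reflective_search`, `Step`, and the three callbacks are hypothetical names, and in the real model the assessment and query rewriting are produced by the model itself rather than by hand-written functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One iteration of the loop: the query issued and what came back."""
    query: str
    results: List[str]
    accepted: bool


def reflective_search(
    question: str,
    search: Callable[[str], List[str]],
    is_sufficient: Callable[[str, List[str]], bool],
    reformulate: Callable[[str, List[str]], str],
    max_steps: int = 4,
) -> List[Step]:
    """Search, self-assess, and rewrite the query until evidence suffices."""
    trace: List[Step] = []
    query = question
    for _ in range(max_steps):
        results = search(query)
        accepted = is_sufficient(question, results)
        trace.append(Step(query, results, accepted))
        if accepted:
            break
        # Reflection: the evidence was judged inadequate, so try a new query.
        query = reformulate(query, results)
    return trace


# Toy stand-ins (hypothetical) so the sketch runs without network access.
corpus = {"capital of australia": ["Canberra is the capital of Australia."]}


def toy_search(query: str) -> List[str]:
    return corpus.get(query.lower(), [])


def toy_sufficient(question: str, results: List[str]) -> bool:
    return len(results) > 0


def toy_reformulate(query: str, results: List[str]) -> str:
    return "capital of australia"


trace = reflective_search(
    "Which city is Australia's capital?",
    toy_search,
    toy_sufficient,
    toy_reformulate,
)
for step in trace:
    print(step.query, "->", "accepted" if step.accepted else "retry")
```

In this toy run the first query returns nothing, the loop rewrites it, and the second query succeeds. The `max_steps` cap is a design choice of the sketch to bound the length of the tool-use chain.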
Training and Implementation
WebSeer-14b is trained in two stages: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL). The SFT stage uses a custom dataset and follows the verl/recipe/retool methodology, adapted for the Qwen2.5-14b base model. The RL stage then refines the agent's behavior in interactive web scenarios. At inference time, the model relies on external tools such as Serper to retrieve Google search results.
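As an illustration of the search tool mentioned above, the sketch below builds a Serper request and flattens a response into text snippets. It is not taken from the WebSeer code: the endpoint URL, the `X-API-KEY` header, and the `organic`, `title`, and `snippet` response fields are assumptions about Serper's public API that should be checked against its documentation, and `build_serper_request` and `extract_snippets` are hypothetical helper names. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed Serper search endpoint; verify against the Serper documentation.
SERPER_URL = "https://google.serper.dev/search"


def build_serper_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the search query."""
    payload = json.dumps({"q": query}).encode("utf-8")
    return urllib.request.Request(
        SERPER_URL,
        data=payload,
        headers={
            "X-API-KEY": api_key,  # assumed authentication header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_snippets(response_body: dict) -> list:
    """Turn the assumed 'organic' result list into 'title: snippet' strings."""
    return [
        f"{item.get('title', '')}: {item.get('snippet', '')}"
        for item in response_body.get("organic", [])
    ]


request = build_serper_request("WebSeer self-reflection", api_key="YOUR_KEY")
print(request.get_method(), request.full_url)
```

Sending the request would be a call such as `urllib.request.urlopen(request)` followed by `json.load` on the response, with the parsed body passed to `extract_snippets` before the text is handed back to the agent.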
Good For
- Developing advanced web search agents requiring iterative refinement.
- Applications demanding complex tool-use and multi-step reasoning on the web.
- Research into self-correcting AI systems for information retrieval.