Model Overview
YiXin-Agentic-Qwen3-14B, developed by YiXin-AILab, is an agentic model built on the Qwen3 architecture and engineered for complex multi-turn interactions. Its reasoning and agent capabilities come from a multi-stage training process that combines distillation with reinforcement learning.
Key Differentiators & Training Innovations
- Multi-Stage Training: Applies process-supervised knowledge distillation to strengthen reasoning and generalization, then multi-stage reinforcement learning to improve the quality and stability of agent decisions.
- Efficient GRPO++ Algorithm: Uses GRPO++, a variant of GRPO with reward-focused updates, to simplify training, reduce hyperparameter tuning, and improve optimization efficiency.
- Stable Training: Improves training stability by removing the entropy loss term and by tracking monitoring metrics such as reasoning repetition rate and Channel Reward.
- Diverse Exploration: Uses FIRE Sampling Rollout with Clip High to diversify the training data and surface varied solutions for complex and rare scenarios.
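The training ideas listed above can be illustrated with a minimal Python sketch. It is not the YiXin-AILab implementation, and GRPO++ itself is not specified in this overview. The sketch assumes the standard GRPO group-relative advantage, reads "Clip High" as an asymmetric clipping range with a looser upper bound, and reads "reasoning repetition rate" as the share of repeated n-grams in a rollout. All function names and the default values (0.2, 0.28, n=4) are hypothetical choices for illustration.

```python
import math
from collections import Counter


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of rollouts for the same prompt.

    Assumption: plain GRPO-style normalization (subtract group mean,
    divide by group standard deviation). No value network is needed.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]


def clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate for one token, with asymmetric bounds.

    Assumption: "Clip High" means eps_high > eps_low, so low-probability
    tokens with positive advantage can be up-weighted further before the
    clip applies. There is no entropy bonus term in this objective.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped_ratio * advantage)


def repetition_rate(tokens, n=4):
    """Fraction of n-grams that duplicate an earlier n-gram in the sequence.

    Assumption: a simple stand-in for a "reasoning repetition rate" monitor;
    a rising value during training would suggest looping or degenerate output.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


if __name__ == "__main__":
    # Four rollouts for one prompt, scored by a binary reward.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # A ratio above the upper bound is clipped when the advantage is positive.
    print(clipped_objective(ratio=1.5, advantage=1.0))
    print(repetition_rate("a b c d a b c d".split(), n=4))
```

Normalizing within the group makes the update depend only on how a rollout compares with its siblings, which is one plausible reading of "reward-focused updates". The repetition metric is only logged in this sketch and does not feed back into the objective.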
Performance Highlights
This model was benchmarked against leading models such as DeepSeek-V3.1, Kimi-K2-Instruct-0905, and Qwen3-235B-A22B-Thinking-2507. YiXin-Agentic-Qwen3-14B shows significant improvements in both agentic and reasoning capabilities, including tool usage, logical reasoning, mathematics, and coding. Its average agentic score is 57.2, which is 14.1 points above Qwen3-14B's 43.1 and close to the scores of the larger models. Its reasoning average is 76.8, with competitive results on individual benchmarks such as MATH-500 and AIME-25.
Ideal Use Cases
This model is well suited to applications that require robust agentic behavior, complex multi-turn dialogue, and strong reasoning across domains. Examples include customer service automation with tool integration and tasks that demand logical problem-solving.