Jackrong/Qwopus3.5-9B-v3
Jackrong/Qwopus3.5-9B-v3 is a 9 billion parameter reasoning-enhanced model based on Qwen3.5-9B, optimized for structural reasoning and tool-calling reinforcement with a 32K context length. It is designed for complex, multi-step agentic workflows, shifting from a "reason-then-act" to an "act-then-refine" paradigm. The model is strongest on programming tasks, reaching a base pass@1 of 87.80% on HumanEval. Compared to Qwen3.5-9B, it is 31.7% more efficient in reasoning and 24.0% cheaper per correct answer.
Qwopus3.5-9B-v3: Reasoning-Enhanced LLM for Agentic Workflows
Jackrong/Qwopus3.5-9B-v3 is a 9 billion parameter model built upon Qwen3.5-9B, specifically engineered to enhance reasoning stability, correctness, and inference efficiency, particularly for programming tasks. It introduces an "act-then-refine" paradigm, prioritizing execution-driven optimization over deep pre-execution reasoning for multi-step agent systems.
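The card does not spell out what an "act-then-refine" loop looks like in code, so the following is only an illustrative sketch of the general idea: produce a candidate quickly, run it, and use the execution feedback to drive the next attempt. The names `act_then_refine`, `propose`, `execute`, and `Attempt` are hypothetical, and the toy stand-ins replace a real model call and a real sandbox.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    candidate: str            # what the agent produced on this step
    feedback: Optional[str]   # None on success, otherwise an error message


def act_then_refine(
    propose: Callable[[str, List[Attempt]], str],
    execute: Callable[[str], Optional[str]],
    task: str,
    max_steps: int = 4,
) -> List[Attempt]:
    """Act first, then refine using execution feedback.

    propose: hypothetical stand-in for a model call; returns a candidate.
    execute: hypothetical stand-in for a tool or test run; returns None on
             success or an error string on failure.
    """
    history: List[Attempt] = []
    for _ in range(max_steps):
        candidate = propose(task, history)  # act with little up-front reasoning
        feedback = execute(candidate)       # let execution judge the result
        history.append(Attempt(candidate, feedback))
        if feedback is None:                # success: stop refining
            break
    return history


# Toy stand-ins so the sketch runs without a model or a sandbox.
def toy_propose(task: str, history: List[Attempt]) -> str:
    # Start with a buggy expression; after any failure, switch to the fix.
    return "a + b" if history else "a - b"


def toy_execute(candidate: str) -> Optional[str]:
    result = eval(candidate, {"a": 2, "b": 3})
    return None if result == 5 else f"expected 5, got {result}"


if __name__ == "__main__":
    for step in act_then_refine(toy_propose, toy_execute, "add a and b"):
        print(step)
```

In a real agent, `propose` would wrap a call to the model and `execute` would wrap a tool invocation or test run. The loop structure stays the same, with `max_steps` bounding how many refinement rounds are allowed.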
Key Capabilities & Differentiators
- Structural Reasoning Optimization: Moves beyond traditional CoT distillation by focusing on verifiable, explicit reasoning chains, improving faithfulness and generalization. This results in more stable and accurate reasoning paths.
- Tool-Calling Reinforcement: Incorporates specialized Reinforcement Learning (RL) training to strengthen stability and proficiency in tool invocation within agent frameworks like OpenClaw.
- Improved Reasoning Efficiency: Achieves a 25.3% shorter average reasoning length and 31.7% higher efficiency, leading to a 24.0% lower cost per correct answer compared to Qwen3.5-9B.
- Strong Programming Performance: Attains a base pass@1 of 87.80% on the HumanEval benchmark, outperforming Qwen3.5-9B (82.93%) and Claude-Distilled-v2 (82.32%).
- MMLU-Pro Improvement: Reaches 81.79% accuracy on the MMLU-Pro benchmark, a modest +1.43 pp lead over Qwen3.5-9B.
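The figures in the list above can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that the card does not state: that "efficiency" means correct answers per unit of cost, so that cost per correct answer is its reciprocal, and that the HumanEval score was measured on the standard 164-problem set with one sample per problem. The function names are hypothetical.

```python
# Figures quoted in the capability list.
EFFICIENCY_GAIN = 0.317    # "31.7% higher efficiency"
HUMANEVAL_PROBLEMS = 164   # size of the public HumanEval set (assumed, not stated in the card)


def cost_reduction_from_efficiency(gain: float) -> float:
    """Relative drop in cost per correct answer.

    Assumption: cost per correct answer = 1 / efficiency.
    """
    return 1.0 - 1.0 / (1.0 + gain)


def problems_solved(pass_at_1_percent: float, n: int = HUMANEVAL_PROBLEMS) -> int:
    """Number of problems solved implied by a pass@1 percentage."""
    return round(pass_at_1_percent / 100 * n)


def implied_baseline(score: float, lead_pp: float) -> float:
    """Baseline score implied by a reported score and a percentage-point lead."""
    return round(score - lead_pp, 2)


if __name__ == "__main__":
    print(f"cost reduction: {cost_reduction_from_efficiency(EFFICIENCY_GAIN):.3f}")
    for name, score in [("Qwopus3.5-9B-v3", 87.80),
                        ("Qwen3.5-9B", 82.93),
                        ("Claude-Distilled-v2", 82.32)]:
        print(f"{name}: {problems_solved(score)} / {HUMANEVAL_PROBLEMS}")
    print(f"implied Qwen3.5-9B MMLU-Pro: {implied_baseline(81.79, 1.43)}")
```

Under the reciprocal assumption, a 31.7% efficiency gain implies a cost reduction of roughly 24%, which agrees with the reported 24.0% up to rounding. The HumanEval percentages correspond to 144, 136, and 135 solved problems out of 164, so the gap to Qwen3.5-9B is 8 problems, or 4.87 pp.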
Good For
- Offline Analytical Tasks: Excels in scenarios requiring transparent, step-by-step logical processing.
- Coding and Mathematical Reasoning: Demonstrates strong performance on HumanEval and MMLU-Pro, making it suitable for programming and logic-heavy applications.
- Agentic Workflows: Designed to support complex, multi-step agent systems that benefit from iterative refinement and tool use.
This model was fine-tuned using Unsloth and a high-fidelity reasoning dataset, with a focus on process-level reasoning learning rather than mere answer imitation.