Jackrong/Qwopus3.5-4B-v3
Jackrong/Qwopus3.5-4B-v3 is a 4.5 billion parameter reasoning-enhanced model based on Qwen3.5-4B, developed by Jackrong. It features structural reasoning optimization and tool-calling reinforcement, designed for improved stability and correctness in programming and multi-step agentic workflows. The model emphasizes an "act-then-refine" paradigm, achieving 75.61% on the HumanEval benchmark.
Qwopus3.5-4B-v3: Reasoning-Enhanced Qwen3.5-4B
Developed by Jackrong, Qwopus3.5-4B-v3 is a 4.5 billion parameter model built upon Qwen3.5-4B and optimized for reasoning stability, correctness, and inference efficiency, particularly in programming tasks. It shifts from the traditional "reason-then-act" paradigm to an "act-then-refine" paradigm, in which agents execute actions early and refine their behavior based on environmental feedback.
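The "act-then-refine" idea can be illustrated with a minimal control loop. This is only a sketch under stated assumptions, not the model's actual inference or agent code: `act_then_refine`, `toy_propose`, and `toy_execute` are hypothetical names, and the toy "model" and "environment" are hard-coded stand-ins.

```python
from typing import Callable, Optional


def act_then_refine(
    propose: Callable[[Optional[str]], str],
    execute: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Act first, then refine using feedback from the environment.

    propose(feedback) returns a candidate action (hypothetical model call).
    execute(action) returns None on success, or an error message otherwise.
    """
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        action = propose(feedback)   # act early instead of planning exhaustively
        feedback = execute(action)   # observe what the environment reports
        if feedback is None:
            return action, round_no  # success: stop refining
    raise RuntimeError(f"no successful action after {max_rounds} rounds: {feedback}")


def toy_propose(feedback: Optional[str]) -> str:
    # Stand-in for a model: the first attempt is buggy, the retry is fixed.
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def toy_execute(source: str) -> Optional[str]:
    # Stand-in for an environment: run the code and check one test case.
    namespace: dict = {}
    exec(source, namespace)
    result = namespace["add"](2, 3)
    return None if result == 5 else f"add(2, 3) returned {result}, expected 5"


final_action, rounds_used = act_then_refine(toy_propose, toy_execute)
```

In this toy run the first attempt fails its check and the second attempt succeeds after seeing the error message, which is the pattern the paradigm describes.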
Key Capabilities
- Structural Reasoning Optimization: Refines the reasoning process through high-quality reasoning distillation and structural alignment, moving beyond simple answer imitation to process-level reasoning. This yields more explicit and verifiable intermediate steps, at the cost of longer chain-of-thought (CoT) outputs.
- Tool-Calling Reinforcement: Incorporates specialized Reinforcement Learning (RL) training for tool-calling, improving stability in continuous task execution and proficiency within tool-augmented agent frameworks such as OpenClaw.
- Execution-Driven Refinement: Designed for complex, multi-step agentic workflows, enabling robust task completion through iterative interaction and correction.
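To make the tool-calling capability concrete, the sketch below shows how an agent framework might dispatch a tool call emitted by a model and return either a result or an error that the model can use to correct itself. The JSON shape (`name`, `arguments`), the `TOOLS` registry, and `dispatch_tool_call` are assumptions for illustration; they are not the format used by this model or by OpenClaw.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry; a real framework would define its own tools.
TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "word_count": lambda text: len(text.split()),
}


def dispatch_tool_call(raw: str) -> dict[str, Any]:
    """Parse an assumed JSON tool call and run the matching tool.

    Errors are returned as data rather than raised, so they can be fed
    back to the model as feedback for the next attempt.
    """
    try:
        call = json.loads(raw)
        name = call["name"]
        arguments = call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        return {"ok": False, "error": f"malformed tool call: {exc}"}

    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}

    try:
        return {"ok": True, "result": TOOLS[name](**arguments)}
    except TypeError as exc:
        # Wrong or missing argument names, or arguments that are not a mapping.
        return {"ok": False, "error": f"bad arguments for {name}: {exc}"}


good = dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
unknown = dispatch_tool_call('{"name": "search", "arguments": {}}')
broken = dispatch_tool_call("not json at all")
```

Returning structured errors instead of raising keeps a multi-step run alive: the agent can pass the error text back to the model and let it issue a corrected call.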
Performance & Training
Evaluated on the HumanEval benchmark, Qwopus3.5-4B-v3 achieved a strict overall score of 75.61% (124/164), outperforming Qwen3.5-4B (72.56%) and Claude-Distilled-v2 (69.51%) at the 4B scale. The model was fine-tuned using Unsloth and LoRA on a high-fidelity, curated reasoning dataset, focusing on faithful, complete, and structurally clear reasoning traces.
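The reported score follows directly from the solved-problem count, and the margins over the two baselines can be checked the same way. The snippet below is a small arithmetic check using only the figures quoted above; the implied solved counts for the baselines are back-calculated from their percentages and are not stated in the source.

```python
TOTAL_PROBLEMS = 164   # HumanEval problem count, from the 124/164 figure
QWOPUS_SOLVED = 124    # problems solved by Qwopus3.5-4B-v3


def pass_rate(solved: int, total: int) -> float:
    """Percentage of problems solved, rounded to two decimals."""
    return round(100 * solved / total, 2)


qwopus_score = pass_rate(QWOPUS_SOLVED, TOTAL_PROBLEMS)

# Baseline percentages as reported in the text above.
reported_baselines = {"Qwen3.5-4B": 72.56, "Claude-Distilled-v2": 69.51}

# Margin in percentage points between Qwopus3.5-4B-v3 and each baseline.
margins = {
    name: round(qwopus_score - score, 2)
    for name, score in reported_baselines.items()
}

# Back-calculated (not reported) solved counts implied by each percentage.
implied_solved = {
    name: round(score * TOTAL_PROBLEMS / 100)
    for name, score in reported_baselines.items()
}

print(qwopus_score, margins, implied_solved)
```

Under these figures the gap to Qwen3.5-4B is about 3 percentage points, which on a 164-problem benchmark corresponds to roughly five problems, so the comparison is best read as indicative.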
Intended Use
The model is best suited for offline analytical tasks, coding, mathematical problems, and logic-dependent prompting where transparent internal logic is crucial. It is a test version intended for academic research and technical exploration. Because of its experimental nature, it has known limitations, including potential CoT instability and logic loops.