Jackrong/Qwopus3.5-27B-v3
Jackrong/Qwopus3.5-27B-v3 is a 27-billion-parameter, reasoning-enhanced language model based on Qwen3.5-27B, fine-tuned to improve reasoning stability, correctness, and inference efficiency. It uses a structural alignment approach to Chain-of-Thought (CoT) optimization and incorporates specialized reinforcement learning for tool calling. The model is designed for complex, multi-step agentic workflows, follows an "act-then-refine" paradigm, and performs well on programming tasks and offline analytical scenarios.
Qwopus3.5-27B-v3: Reasoning-Enhanced LLM
Qwopus3.5-27B-v3 is a 27-billion-parameter model built on Qwen3.5-27B that aims to improve reasoning capability and inference efficiency. It introduces an "act-then-refine" paradigm for multi-step agent systems: instead of deliberating at length before acting, the model acts first and then refines its approach based on feedback from the environment.
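The control flow of an act-then-refine loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model's training or inference code: the environment is a hypothetical number-guessing task whose feedback protocol ("ok", "too_low", "too_high") is invented for the example, and `act_then_refine` and `bisect_policy` are hypothetical names. The agent commits to an action immediately and uses each piece of feedback to choose its next action, rather than planning a full solution up front.

```python
from typing import Callable, List, Optional, Tuple

# Toy feedback protocol (an assumption for illustration): "ok", "too_low", "too_high".
Feedback = str
History = List[Tuple[int, Feedback]]


def act_then_refine(propose: Callable[[History], int],
                    execute: Callable[[int], Feedback],
                    max_steps: int = 10) -> Tuple[Optional[int], History]:
    """Hypothetical agent loop: act first, then refine using environment feedback."""
    history: History = []
    for _ in range(max_steps):
        action = propose(history)       # act on the current best guess
        feedback = execute(action)      # observe the environment's response
        history.append((action, feedback))
        if feedback == "ok":            # stop as soon as the action succeeds
            return action, history
    return None, history                # step budget exhausted without success


def bisect_policy(lo: int, hi: int) -> Callable[[History], int]:
    """Toy policy: narrow the search range using all feedback received so far."""
    def propose(history: History) -> int:
        low, high = lo, hi
        for action, feedback in history:
            if feedback == "too_low":
                low = max(low, action + 1)
            elif feedback == "too_high":
                high = min(high, action - 1)
        return (low + high) // 2
    return propose


if __name__ == "__main__":
    target = 37
    env = lambda a: "ok" if a == target else ("too_low" if a < target else "too_high")
    result, history = act_then_refine(bisect_policy(0, 100), env)
    print(result, history)
```

With a hidden target of 37 in the range 0 to 100, the loop tries 50, then 24, then 37, succeeding on its third action. The point of the design is that every step produces real feedback, so the policy never has to reason about the whole problem before acting.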
Key Capabilities & Differentiators
- Structural Reasoning Optimization: Goes beyond simple CoT distillation by training on faithful, complete, and structurally clear reasoning traces, so the model learns the reasoning process itself rather than merely imitating final answers. The result is better generalization and robustness on unseen tasks.
- Tool-Calling Reinforcement: Uses specialized reinforcement learning (RL) training to make tool invocation more stable and reliable in tool-augmented agent frameworks such as OpenClaw.
- Performance: Achieves a strong balance between accuracy and efficiency, matching or outperforming Qwen3.5-27B on most tasks while using significantly fewer generated tokens. On HumanEval, Qwopus3.5-27B-v3 scored 95.73% (157/164), surpassing Qwen3.5-27B (94.51%).
- Training Approach: Fine-tuned with Unsloth on a high-fidelity reasoning dataset using response-only training, in which the loss is masked so that only the model's responses, not the prompts, contribute to training.
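The HumanEval figures are plain pass rates over the benchmark's 164 problems, assuming each problem is counted once as passed or failed. Under that assumption, Qwen3.5-27B's 94.51% corresponds to 155 of 164 problems, so the reported gap amounts to two problems:

```latex
$$
\begin{aligned}
\text{Qwopus3.5-27B-v3:}\quad \frac{157}{164} &\approx 0.9573 = 95.73\% \\
\text{Qwen3.5-27B:}\quad \frac{155}{164} &\approx 0.9451 = 94.51\% \\
\Delta = \frac{157 - 155}{164} &\approx 0.0122 = 1.22 \text{ percentage points}
\end{aligned}
$$
```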
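Response-only training can be illustrated with a minimal sketch of label masking. It assumes the common convention, used for example by PyTorch's `CrossEntropyLoss` with its default `ignore_index=-100`, that label positions set to -100 are excluded from the loss. The helper names and the `response_start` index are hypothetical, the token ids and per-token losses are toy values, and this is not Unsloth's actual implementation; the shift between inputs and next-token labels is left to the training framework.

```python
from typing import List, Sequence

# Assumption: positions labelled -100 are ignored by the loss (PyTorch convention).
IGNORE_INDEX = -100


def mask_prompt_tokens(input_ids: Sequence[int], response_start: int) -> List[int]:
    """Return labels with every prompt position replaced by IGNORE_INDEX.

    response_start is the index of the first response token (hypothetical; a real
    pipeline would locate it from the chat template's assistant marker).
    """
    if not 0 <= response_start <= len(input_ids):
        raise ValueError("response_start out of range")
    return [IGNORE_INDEX] * response_start + list(input_ids[response_start:])


def trained_positions(labels: Sequence[int]) -> List[int]:
    """Indices whose tokens contribute to the loss."""
    return [i for i, t in enumerate(labels) if t != IGNORE_INDEX]


def masked_mean_loss(per_token_loss: Sequence[float], labels: Sequence[int]) -> float:
    """Average the per-token loss over response positions only."""
    kept = [loss for loss, t in zip(per_token_loss, labels) if t != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    ids = [101, 7, 8, 9, 42, 43, 44]          # toy ids: first 4 tokens are the prompt
    labels = mask_prompt_tokens(ids, response_start=4)
    print(labels)                             # [-100, -100, -100, -100, 42, 43, 44]
    print(trained_positions(labels))          # [4, 5, 6]
    losses = [5.0, 5.0, 5.0, 5.0, 1.0, 2.0, 3.0]
    print(masked_mean_loss(losses, labels))   # 2.0
```

The large losses on the prompt positions have no effect on the result, which is the intended behavior: the model is optimized only on what it is supposed to generate.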
Good For
- Offline Analytical Tasks: Excels in scenarios requiring transparent, step-by-step internal logic.
- Coding & Mathematical Reasoning: Its strong HumanEval result indicates proficiency in programming tasks.
- Agentic Workflows: Optimized for complex, multi-step agent systems that benefit from iterative interaction and correction.
Limitations
- As an autoregressive LLM, it can hallucinate, particularly when recalling real-world facts or events within its thinking sequences.
- Its reasoning chain (CoT) may occasionally show instability, logic loops, or reasoning drift, since the model was developed independently with limited resources.