Jackrong/Qwopus3.5-9B-v3.5
Jackrong/Qwopus3.5-9B-v3.5 is a 9 billion parameter, reasoning-enhanced language model based on Qwen3.5-9B, developed by Jackrong. This v3.5 iteration significantly expands its training data to improve generalization across domains like mathematics, programming, and multi-turn interactions. It is specifically designed for structured reasoning, tool-augmented workflows, and multi-step agentic tasks, offering token-efficient inference with a 32K context length.
Loading preview...
Qwopus3.5-9B-v3.5 Overview
Qwopus3.5-9B-v3.5 is a 9 billion parameter model, a data-scaled continuation of the Qwopus3.5-9B-v3, built upon the Qwen3.5-9B architecture. Its primary focus is on enhancing structured reasoning capabilities through an expanded and high-quality Supervised Fine-Tuning (SFT) dataset, approximately double the size of its predecessor. This version does not introduce new architecture or RL stages but leverages data scaling to improve generalization.
Key Capabilities & Design Principles
- Reasoning Enhancement: Designed for structured reasoning, puzzle-solving, and STEM-related tasks, aiming to better utilize and activate latent knowledge.
- Agentic Workflows: Optimized for tool-augmented workflows and multi-step agentic tasks, including code inspection and bug diagnosis.
- Broad Domain Coverage: Training data covers mathematics, programming, multilingual dialogue, and instruction-following.
- Efficiency: Engineered for token-efficient inference.
Generalization & Performance Insights
Motivated by the hypothesis that scaling high-quality SFT data enhances generalization, this model aims to learn reasoning procedures rather than just output formats. While a dedicated public benchmark report for the 9B model is not yet available, methodology references from the 27B line suggest improvements in multi-step reasoning tasks like MATH500, MMLU-Pro, HumanEval, and GSM8K. Preliminary evaluations on a subset of MMLU-Pro for the 27B line showed a +1.07 percentage point gain with v3.5, and SWE-style agentic coding tests demonstrated improved performance in multi-step agentic coding tasks.
Limitations
Potential limitations include possible overfitting if data scaling exceeds optimal regimes, instability in edge-case reasoning, and dependency of tool-calling performance on environment integration. Not all capabilities are fully benchmarked.