Jackrong/Qwopus3.6-35B-A3B-v1
Jackrong/Qwopus3.6-35B-A3B-v1 is a 35.1-billion-parameter, reasoning-enhanced Mixture-of-Experts (MoE) model, fine-tuned by Jackrong from the Qwen3.6-35B-A3B base model. It activates 3B parameters per token for inference efficiency and supports a 262k-token context window. The model is optimized for deep reasoning, agentic coding, and multimodal tasks, with strong overall quality and reliability.
Qwopus3.6-35B-A3B-v1: Reasoning-Enhanced MoE Model
Qwopus3.6-35B-A3B-v1 is a 35.1-billion-parameter Mixture-of-Experts (MoE) model, fine-tuned by Jackrong from the Qwen3.6-35B-A3B base model. It uses a hybrid sparse MoE architecture in which only 3 billion parameters are active per token, which keeps inference efficient, and it supports a native 262k-token context window. The model is designed for advanced reasoning, agentic coding, and multimodal applications.
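To make the sparsity concrete, the short sketch below computes what share of the model's weights is active for each token, using only the two figures quoted above. Both figures are rounded, so the result is approximate; the constant and function names are illustrative and not part of any released tooling.

```python
# Figures quoted in the model description (both rounded).
TOTAL_PARAMS = 35.1e9   # total parameters
ACTIVE_PARAMS = 3.0e9   # parameters active per token


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters used for a single token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active per token: {fraction:.1%} of all parameters")
```

Under these rounded figures, roughly one parameter in twelve participates in each forward step, which is where the efficiency claim comes from.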
Key Capabilities & Features
- Enhanced Reasoning: Fine-tuned through a three-stage distributed SFT process to improve structured reasoning and consistent answer styles.
- High Inference Efficiency: Achieves an average of 161.9 tok/s on an RTX 5090, a 2.6x speedup over dense predecessors, making it suitable for single-GPU consumer hardware.
- Multimodal Support: Includes vision capabilities and tool calling. To enable vision, place the mmproj.gguf file alongside the main model file.
- Robust Long-Context Performance: Addresses "thinking starvation" issues, maintaining performance in long-context JSON extraction and multi-step agentic planning.
- Production-Grade UI/UX Generation: Excels at one-shot HTML/CSS generation, producing complete, functional pages with complex interactions.
- LoRA Fine-tuning: Trained with LoRA, updating approximately 9% of model parameters, which allows deep adaptation of reasoning capabilities.
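The vision requirement in the list above (mmproj.gguf placed alongside the main model file) is easy to get wrong, so a small pre-flight check can help. The following is a minimal sketch, assuming the projector file is named exactly mmproj.gguf and sits in the same directory as the model file; the function name and the example path are hypothetical, and the actual runtime you use may locate the projector differently.

```python
from pathlib import Path
from typing import Optional, Union


def find_vision_projector(
    model_file: Union[str, Path],
    projector_name: str = "mmproj.gguf",  # file name taken from the list above
) -> Optional[Path]:
    """Return the projector path if it sits next to the model file, else None."""
    candidate = Path(model_file).parent / projector_name
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Hypothetical location; replace with wherever the model was downloaded.
    projector = find_vision_projector("models/qwopus/model.gguf")
    if projector is None:
        print("mmproj.gguf not found next to the model; vision will be unavailable.")
    else:
        print(f"Vision projector found: {projector}")
```

The check only confirms that the file exists in the expected place; it does not validate that the projector matches the model build.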
Use Cases
This model is aimed at developers who need a high-throughput, agentic model for UI/UX generation and complex logical deduction on a single-GPU setup. It is particularly suited to tasks that demand structured reasoning, consistent output, and efficient processing of long contexts.