armand0e/Qwen3.5-9B-Opus-Agent
The armand0e/Qwen3.5-9B-Opus-Agent is a 9 billion parameter Qwen3.5-based language model, fine-tuned by armand0e on Opus traces and a small dataset. It excels in instruction following, code debugging, and tool-calling stability, achieving a perfect 100 on ToolCall-15 benchmarks. This model is optimized for agentic workflows, demonstrating strong performance in complex agent behaviors while maintaining its reasoning capabilities.
Loading preview...
armand0e/Qwen3.5-9B-Opus-Agent: An Agent-Optimized Qwen3.5 Finetune
This model is a 9 billion parameter finetune of the Qwen3.5 base model, developed by armand0e. It was trained for 4 hours using Unsloth and Huggingface's TRL library, leveraging Opus traces and a specialized dataset to enhance agentic capabilities while preserving core reasoning.
Key Capabilities and Performance
The model demonstrates significant improvements in agent-specific benchmarks:
- Instruction Following (InstructFollow-15): Achieves a comprehensive score of 97, outperforming Jackrong/Qwopus3.5-9B-coder (93) by excelling in formatting, count, numbering, sentence, and length constraints.
- Code Debugging & Bug Fixing (BugFind-15): Scores 84, surpassing Jackrong/Qwopus3.5-9B-coder (79) and the base Qwen3.5-9B-Agent (58) in debugging syntax, logic errors, and trap code.
- Tool Call Stability (ToolCall-15): Achieves a perfect 100, matching other top models like Jackrong/Qwopus3.5-9B-coder and the base Qwen/Qwen3.5-9B in direct tool-calling precision.
- Complex Agent Performance (HermesAgent-20): Scores 80, showing strong performance in memory, orchestration, skill use, scheduling, and delegation, though slightly behind Jackrong/Qwopus3.5-9B-coder (85).
Good For
- Agentic Applications: Ideal for use cases requiring robust instruction following, stable tool calling, and complex agent behaviors.
- Code-Related Tasks: Strong performance in debugging and bug fixing makes it suitable for developer tools and code assistance.
- Enhanced Control: Excels at adhering to specific formatting and length constraints in responses.