dcostenco/prism-coder-9b

VISION | Concurrency Cost: 1 | Model Size: 9B | Quant: FP8 | Ctx Length: 32k | Tool Calling: Supported | Published: Jun 15, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

dcostenco/prism-coder-9b is a 9-billion-parameter model, QLoRA fine-tuned from Qwen3.5-9B by dcostenco and optimized for tool routing within the Prism Coder system. It has a 32,768-token context length and uses a hybrid attention architecture that combines linear-attention layers, for efficient pattern matching, with full-attention layers, for precise reasoning. The model scores 100% on the BFCL benchmark across three seeds, with zero hallucinations on tool-use tasks, and has a 36% smaller footprint than its 14B predecessor.


dcostenco/prism-coder-9b: Optimized Tool Router

dcostenco/prism-coder-9b is a 9-billion-parameter model, QLoRA fine-tuned from Qwen3.5-9B, designed as the default tool router for the Prism Coder system. Compared with its 14B predecessor, it has a 36% smaller footprint (5.8 GB vs 9.0 GB) and higher BFCL accuracy (100% vs 90.3%).
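The footprint figure can be checked directly from the two sizes quoted above. This is only a worked calculation; the variable names are illustrative.

```python
# Model sizes as quoted in the text, in GB.
size_14b_gb = 9.0
size_9b_gb = 5.8

# Relative reduction in footprint when moving from the 14B model to the 9B model.
reduction = (size_14b_gb - size_9b_gb) / size_14b_gb
print(f"footprint reduction: {reduction:.1%}")  # rounds to the 36% quoted in the text
```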

Key Capabilities & Performance

  • Tool Routing Accuracy: Achieves 100% accuracy on the BFCL benchmark across three shuffled runs, with zero hallucinations, compared with 90.3% for the previous 14B Qwen 3 model.
  • Efficient Architecture: Inherits Qwen3.5-9B's hybrid attention, which pairs linear-attention layers (Gated DeltaNet), whose cost grows as O(n) in sequence length and which suit pattern matching, with standard full-attention layers for precise retrieval and reasoning.
  • Optimized for MCP Tool Routing: Trained on a corpus that includes tool-use data (36%), targeting Model Context Protocol (MCP) tool routing, remapping, parameter normalization, and multi-turn chain parsing.
  • Resource Efficient: At 5.8 GB, the model needs about 36% less storage than the 9.0 GB 14B predecessor, which lowers the memory bar for deployment.
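To make the hybrid-attention point concrete, here is a minimal single-head sketch in plain Python contrasting the two layer types. The linear layer follows the gated delta rule as it is commonly written, S_t = alpha * (S_{t-1} - beta * (S_{t-1} k_t) k_t^T) + beta * v_t k_t^T. Whether Qwen3.5-9B uses exactly this parameterization is an assumption, and the function names, fixed scalar gates, and toy dimensions are illustrative only. Real implementations use learned per-token gates, normalization, and chunked kernels.

```python
import math

def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step. S is a d_v x d_k memory matrix (list of rows)."""
    d_v, d_k = len(S), len(k)
    # What the memory currently returns for key k.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Decay the memory, erase the value stored under k, then write v under k.
    S_new = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]
    out = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, out

def linear_attention(qs, ks, vs, alpha=1.0, beta=1.0):
    """Fixed-size state, constant work per token: O(n) in sequence length."""
    d_v, d_k = len(vs[0]), len(ks[0])
    S = [[0.0] * d_k for _ in range(d_v)]
    outs = []
    for q, k, v in zip(qs, ks, vs):
        S, o = gated_delta_step(S, q, k, v, alpha, beta)
        outs.append(o)
    return outs

def full_attention(qs, ks, vs):
    """Causal softmax attention: step t rescans all t+1 keys, O(n^2) overall."""
    outs = []
    for t, q in enumerate(qs):
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(len(q))
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(x - m) for x in scores]
        z = sum(w)
        outs.append([sum(w[s] * vs[s][i] for s in range(t + 1)) / z
                     for i in range(len(vs[0]))])
    return outs

# Toy sequence: the third token reuses the first key, overwriting its value.
ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
qs = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
linear_out = linear_attention(qs, ks, vs)
full_out = full_attention(qs, ks, vs)
```

The linear layer keeps one fixed-size matrix regardless of sequence length, while the full-attention layer keeps every past key and value. That trade-off is the usual motivation for mixing the two layer types.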

Training Details

The model was trained with QLoRA (4-bit base + bf16 adapters) on Apple Silicon, using mlx-community/Qwen3.5-9B-MLX-4bit as the base. Training ran for 2,000 iterations on a 26K-row corpus focused on tool-use, abstention, and safety data.
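The "4-bit base + adapters" arrangement can be illustrated with a small sketch of a single QLoRA-style linear layer. The base weight is frozen and stored as 4-bit codes, and only the low-rank matrices A and B would be trained. The symmetric absmax quantizer below is a deliberate simplification, not the scheme used by MLX or by the original QLoRA recipe, and all function names and values are hypothetical.

```python
def quantize_4bit(row):
    """Symmetric absmax quantization of one weight row to integers in [-7, 7]."""
    scale = max(abs(w) for w in row) / 7 or 1.0  # fall back to 1.0 for an all-zero row
    return [round(w / scale) for w in row], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def qlora_forward(q_rows, A, B, x, lora_alpha, rank):
    """y = dequant(W) @ x + (lora_alpha / rank) * B @ (A @ x)."""
    W = [dequantize(codes, scale) for codes, scale in q_rows]  # frozen base weight
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + (lora_alpha / rank) * u for b, u in zip(base, update)]

# Toy 2x2 layer with a rank-1 adapter.
W = [[7.0, -7.0], [3.5, 0.0]]
q_rows = [quantize_4bit(row) for row in W]
A = [[1.0, 1.0]]            # rank x d_in
B_init = [[0.0], [0.0]]     # d_out x rank, zero at initialization
B_trained = [[1.0], [2.0]]  # made-up values standing in for a trained adapter
x = [1.0, 2.0]

y_init = qlora_forward(q_rows, A, B_init, x, lora_alpha=2, rank=1)
y_trained = qlora_forward(q_rows, A, B_trained, x, lora_alpha=2, rank=1)
```

With B at zero the layer reproduces the quantized base model exactly, so fine-tuning starts from the base model's behavior and only the small A and B matrices need gradients and optimizer state.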

Ideal Use Cases

This model suits applications that need accurate, efficient tool routing, particularly code generation and complex task execution environments where precise function calling and parameter handling are critical. It is the recommended default router within the Prism Coder fleet because of its balance of size and performance.
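As a rough illustration of what precise function calling and parameter handling involve on the application side, the sketch below validates a model-emitted tool call against a registry, normalizes parameter aliases, and abstains when no valid tool applies. The JSON shape, tool names, and aliases are hypothetical and are not taken from the Prism Coder system.

```python
import json

# Hypothetical registry: tool name -> {canonical parameter: accepted aliases}.
TOOLS = {
    "search_memory": {
        "query": {"query", "q", "text"},
        "limit": {"limit", "max_results"},
    },
    "save_note": {"content": {"content", "body"}},
}

def route(model_output):
    """Return (tool_name, params) for a valid call, or None to abstain."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None  # plain-text answer: no tool was requested
    if not isinstance(call, dict):
        return None
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return None  # unknown tool name: refuse to dispatch
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return None
    params = {}
    for key, value in args.items():
        # Map each supplied key onto its canonical parameter name.
        for canonical, aliases in TOOLS[name].items():
            if key in aliases:
                params[canonical] = value
                break
        else:
            return None  # parameter not defined for this tool
    return name, params

ok = route('{"name": "search_memory", "arguments": {"q": "auth bug", "max_results": 3}}')
unknown = route('{"name": "delete_everything", "arguments": {}}')
abstain = route("I can answer that directly.")
```

A router model with high accuracy should rarely hit the rejection branches, but keeping them in the dispatcher means a malformed or invented call is dropped rather than executed.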