aipster/DevRouter-1.5B

TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 2, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

aipster/DevRouter-1.5B is a 1.5 billion parameter model, fine-tuned from Qwen2.5-Coder-1.5B-Instruct, designed to act as a fast router for developer prompts. It processes raw coding prompts and outputs a structured JSON decision, including a rewritten prompt, intent, complexity, suggested model route, and missing context. This model excels at triaging developer requests deterministically and quickly, making it suitable for pre-routing in LLM applications.

Loading preview...

DevRouter-1.5B: A Fast LLM Router for Developer Prompts

DevRouter-1.5B is a compact, 1.5 billion parameter model, fine-tuned from Qwen2.5-Coder-1.5B-Instruct, specifically engineered to triage and route developer prompts. Its core function is to take a raw developer query and transform it into a structured JSON output, enabling efficient routing to more expensive, larger language models.

Key Capabilities

  • Structured JSON Output: Generates a single JSON object containing a rewritten prompt, classified intent (e.g., debug, refactor, feature), complexity (low, medium, high), a suggested route (small_local, medium_api, large_api), and identified missing contextual information.
  • High Performance: Achieves ~280 tokens/s generation and ~1–3 seconds latency per routing call on a single RTX 3090 (Q8_0 GGUF), making it suitable for real-time pre-processing.
  • Deterministic Triage: Designed for stable, parseable JSON output, recommending greedy decoding (temperature=0) for consistent results.
  • Evaluation Metrics: Demonstrates high JSON validity (over 94% for Q8_0 GGUF) and reasonable accuracy for intent, route, and complexity classification, with stronger performance on common intents like debug.

Good For

  • Pre-routing LLM Requests: Ideal for sitting in front of larger, more expensive models to efficiently categorize and direct developer prompts.
  • Prompt Rewriting and Clarification: Automatically refines ambiguous or poorly phrased developer prompts into clearer versions while preserving original intent.
  • Resource Optimization: Helps in selecting the appropriate downstream model tier based on prompt complexity and intent, reducing costs and improving latency.

Limitations

  • No PII Detection: Not designed for privacy or safety filtering due to insufficient training data for PII flags.
  • Varying Intent Accuracy: Performance is weaker on less represented intents like review and documentation.
  • Quantization Sensitivity: Requires Q8_0 or F16 quantization for reliable JSON output; lower quantizations (Q6_K and below) can break JSON validity.