aipster/DevRouter-1.5B
aipster/DevRouter-1.5B is a 1.5 billion parameter model, fine-tuned from Qwen2.5-Coder-1.5B-Instruct, designed to act as a fast router for developer prompts. It processes raw coding prompts and outputs a structured JSON decision, including a rewritten prompt, intent, complexity, suggested model route, and missing context. This model excels at triaging developer requests deterministically and quickly, making it suitable for pre-routing in LLM applications.
DevRouter-1.5B: A Fast LLM Router for Developer Prompts
DevRouter-1.5B is a compact, 1.5 billion parameter model, fine-tuned from Qwen2.5-Coder-1.5B-Instruct, specifically engineered to triage and route developer prompts. Its core function is to take a raw developer query and transform it into a structured JSON output, so that an application can decide which downstream model tier, from a small local model to a large API model, should handle the request.
Key Capabilities
- Structured JSON Output: Generates a single JSON object containing a rewritten prompt, classified intent (e.g., debug, refactor, feature), complexity (low, medium, high), a suggested route (small_local, medium_api, large_api), and identified missing contextual information.
- High Performance: Achieves ~280 tokens/s generation and ~1–3 seconds latency per routing call on a single RTX 3090 (Q8_0 GGUF), making it suitable for real-time pre-processing.
- Deterministic Triage: Designed for stable, parseable JSON output; greedy decoding (temperature=0) is recommended for consistent results.
- Evaluation Metrics: Demonstrates high JSON validity (over 94% for Q8_0 GGUF) and reasonable accuracy for intent, route, and complexity classification, with stronger performance on common intents like debug.
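Because JSON validity is high but not 100%, a caller will usually want to check the router's output before acting on it. The following is a minimal sketch of such a check, not the model's official schema: the key names in REQUIRED_KEYS (in particular rewritten_prompt) are assumptions made for illustration, while the allowed complexity and route values are the ones listed above.

```python
import json

# Allowed values as listed in the capabilities above.
COMPLEXITIES = {"low", "medium", "high"}
ROUTES = {"small_local", "medium_api", "large_api"}

# Hypothetical key names: the card describes the fields but this sketch
# assumes their exact spelling.
REQUIRED_KEYS = ("rewritten_prompt", "intent", "complexity", "route", "missing")


def _is_member(value, allowed):
    """True only for string values that appear in the allowed set."""
    return isinstance(value, str) and value in allowed


def parse_decision(raw):
    """Return the routing decision as a dict, or None if the output is unusable."""
    try:
        decision = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(decision, dict):
        return None
    if any(key not in decision for key in REQUIRED_KEYS):
        return None
    if not _is_member(decision["complexity"], COMPLEXITIES):
        return None
    if not _is_member(decision["route"], ROUTES):
        return None
    return decision
```

Returning None instead of raising keeps the caller's fallback path simple: any unusable output can be handled in one place.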
Good For
- Pre-routing LLM Requests: Ideal for sitting in front of larger, more expensive models to efficiently categorize and direct developer prompts.
- Prompt Rewriting and Clarification: Automatically refines ambiguous or poorly phrased developer prompts into clearer versions while preserving original intent.
- Resource Optimization: Helps in selecting the appropriate downstream model tier based on prompt complexity and intent, reducing costs and improving latency.
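The pre-routing pattern described above can be sketched as a small dispatcher. This is an illustrative outline under stated assumptions rather than a reference integration: the handler functions are stand-ins for real model clients, the rewritten_prompt key name is hypothetical, and falling back to the largest tier when the decision is unusable is one possible policy, not something the card prescribes.

```python
from typing import Callable, Dict, Optional

Handler = Callable[[str], str]


def choose_route(decision: Optional[dict], handlers: Dict[str, Handler],
                 fallback: str = "large_api") -> str:
    """Pick the suggested route if it is known, otherwise the fallback tier."""
    route = decision.get("route") if isinstance(decision, dict) else None
    if isinstance(route, str) and route in handlers:
        return route
    return fallback


def dispatch(decision: Optional[dict], original_prompt: str,
             handlers: Dict[str, Handler], fallback: str = "large_api") -> str:
    """Send the rewritten prompt (or the original one) to the chosen tier."""
    route = choose_route(decision, handlers, fallback)
    prompt = original_prompt
    if isinstance(decision, dict):
        rewritten = decision.get("rewritten_prompt")  # hypothetical key name
        if isinstance(rewritten, str) and rewritten.strip():
            prompt = rewritten
    return handlers[route](prompt)


# Stub handlers standing in for real model clients (assumptions for the demo).
demo_handlers: Dict[str, Handler] = {
    "small_local": lambda p: f"[small_local] {p}",
    "medium_api": lambda p: f"[medium_api] {p}",
    "large_api": lambda p: f"[large_api] {p}",
}
```

Passing the original prompt alongside the decision means the request can still be served when the router output fails to parse.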
Limitations
- No PII Detection: Not designed for privacy or safety filtering due to insufficient training data for PII flags.
- Varying Intent Accuracy: Performance is weaker on less represented intents like review and documentation.
- Quantization Sensitivity: Requires Q8_0 or F16 quantization for reliable JSON output; lower quantizations (Q6_K and below) can break JSON validity.
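Given the quantization sensitivity noted above, it may be worth measuring JSON validity on a sample of outputs before deploying a particular build. The sketch below is one simple way to do that; it is not the evaluation harness behind the reported figures, and the function name json_validity_rate is made up for this example.

```python
import json


def json_validity_rate(outputs):
    """Fraction of raw model outputs that parse as a single JSON object."""
    if not outputs:
        raise ValueError("need at least one output to compute a rate")
    valid = 0
    for raw in outputs:
        try:
            parsed = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # malformed or non-string output counts as invalid
        if isinstance(parsed, dict):
            valid += 1
    return valid / len(outputs)
```

Running this over the same set of prompts for each candidate quantization gives a like-for-like comparison of how often the output is parseable.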