Name: aipster/DevRouter-1.5B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: aipster

DevRouter-1.5B: A Fast LLM Router for Developer Prompts

DevRouter-1.5B is a compact, 1.5-billion-parameter model, fine-tuned from Qwen2.5-Coder-1.5B-Instruct and built to triage and route developer prompts. It takes a raw developer query and turns it into a structured JSON object, so that each prompt can be routed to a suitable downstream model and larger, more expensive language models are used only when needed.

Key Capabilities

Structured JSON Output: Generates a single JSON object containing a rewritten prompt, classified intent (e.g., debug, refactor, feature), complexity (low, medium, high), a suggested route (small_local, medium_api, large_api), and identified missing contextual information.
High Performance: Achieves ~280 tokens/s generation and ~1–3 seconds latency per routing call on a single RTX 3090 (Q8_0 GGUF), making it suitable for real-time pre-processing.
Deterministic Triage: Designed for stable, parseable JSON output; greedy decoding (temperature=0) is recommended for consistent results.
Evaluation Metrics: Demonstrates high JSON validity (over 94% for Q8_0 GGUF) and reasonable accuracy for intent, route, and complexity classification, with stronger performance on common intents like debug.
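
The structured output described above can be checked before it is acted on. The following is a minimal sketch, not the model's documented schema: the key names (rewritten_prompt, intent, complexity, route, missing_context) are assumptions chosen to mirror the fields listed above, and parse_routing_output is a hypothetical helper. Only the complexity and route values are validated against fixed sets, since the listing gives intent values only as examples.

```python
import json
from dataclasses import dataclass

# Value sets taken from the listing; the key names below are assumed, not confirmed.
ALLOWED_COMPLEXITY = {"low", "medium", "high"}
ALLOWED_ROUTES = {"small_local", "medium_api", "large_api"}
REQUIRED_KEYS = ("rewritten_prompt", "intent", "complexity", "route", "missing_context")


@dataclass
class RoutingDecision:
    rewritten_prompt: str
    intent: str
    complexity: str
    route: str
    missing_context: list


def parse_routing_output(raw: str) -> RoutingDecision:
    """Parse and validate one router response; raise ValueError if it is unusable."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    absent = [key for key in REQUIRED_KEYS if key not in data]
    if absent:
        raise ValueError(f"missing keys: {absent}")
    complexity, route = data["complexity"], data["route"]
    if not isinstance(complexity, str) or complexity not in ALLOWED_COMPLEXITY:
        raise ValueError(f"unexpected complexity: {complexity!r}")
    if not isinstance(route, str) or route not in ALLOWED_ROUTES:
        raise ValueError(f"unexpected route: {route!r}")
    if not isinstance(data["missing_context"], list):
        raise ValueError("missing_context should be a list")
    return RoutingDecision(**{key: data[key] for key in REQUIRED_KEYS})
```

Raising a single exception type for every failure mode keeps the caller's error handling simple: anything that is not a well-formed decision can be treated the same way.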

Good For

Pre-routing LLM Requests: Sits in front of larger, more expensive models to categorize and direct developer prompts.
Prompt Rewriting and Clarification: Automatically refines ambiguous or poorly phrased developer prompts into clearer versions while preserving original intent.
Resource Optimization: Helps select the appropriate downstream model tier based on prompt complexity and intent, which can reduce cost and latency.
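
As an illustration of the pre-routing pattern, the sketch below maps a suggested route to a handler for the matching model tier. The three route names come from the listing; make_dispatcher and the handler callables are hypothetical placeholders for whatever backends a deployment actually uses, and sending unknown routes to a default tier is an assumption rather than documented behavior.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def make_dispatcher(handlers: Dict[str, Handler], default_route: str) -> Callable[[str, str], str]:
    """Build a function that sends a prompt to the handler registered for its route."""
    if default_route not in handlers:
        raise KeyError(f"no handler registered for default route {default_route!r}")

    def dispatch(route: str, prompt: str) -> str:
        # Unknown routes fall back to the default tier instead of failing (assumed policy).
        handler = handlers.get(route, handlers[default_route])
        return handler(prompt)

    return dispatch


# Placeholder handlers; a real deployment would call a local model or a remote API here.
dispatch = make_dispatcher(
    {
        "small_local": lambda prompt: f"[small_local] {prompt}",
        "medium_api": lambda prompt: f"[medium_api] {prompt}",
        "large_api": lambda prompt: f"[large_api] {prompt}",
    },
    default_route="large_api",
)
```

Calling dispatch with the route and rewritten prompt from a routing decision then forwards the prompt to the selected tier.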

Limitations

No PII Detection: Not designed for privacy or safety filtering, because the training data for PII flags was insufficient.
Varying Intent Accuracy: Performance is weaker on less represented intents like review and documentation.
Quantization Sensitivity: Requires Q8_0 quantization or F16 weights for reliable JSON output; lower-bit quantizations (Q6_K and below) can break JSON validity.
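
Because JSON validity is high but not perfect, a caller may want a fallback for responses that do not parse. The sketch below is one possible policy, not a recommendation from the model card: route_or_fallback is a hypothetical helper, the "route" key name is assumed, and defaulting to large_api is an assumption that trades cost for safety.

```python
import json

KNOWN_ROUTES = {"small_local", "medium_api", "large_api"}
FALLBACK_ROUTE = "large_api"  # assumed policy: use the largest tier when triage fails


def route_or_fallback(raw_output: str) -> str:
    """Return the suggested route, or the fallback if the output is unusable."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        # Truncated or otherwise malformed JSON, e.g. from an aggressive quantization.
        return FALLBACK_ROUTE
    if not isinstance(data, dict):
        return FALLBACK_ROUTE
    route = data.get("route")  # "route" is an assumed key name
    if isinstance(route, str) and route in KNOWN_ROUTES:
        return route
    return FALLBACK_ROUTE
```

Falling back to a cheaper tier instead is equally plausible; the choice depends on whether a wrongly downgraded prompt or an unnecessary large-model call is the more costly mistake.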
