Cooolder/SCOPE is a 4 billion parameter model based on Qwen/Qwen3-4B-Instruct-2507, designed for pre-hoc performance estimation of large language models. It predicts an LLM's expected correctness and output token length for a given query by analyzing historical behaviors on similar questions. This framework enables scalable, explainable, and controllable LLM routing, allowing users to manage accuracy-cost trade-offs and generalize to unseen models without retraining.
SCOPE: Scalable and Controllable Outcome Performance Estimator
SCOPE (Scalable and Controllable Outcome Performance Estimator) is a 4 billion parameter model, built on the Qwen/Qwen3-4B-Instruct-2507 base, that redefines LLM routing as a pre-hoc estimation problem. Instead of directly classifying and selecting a model, SCOPE predicts an LLM's expected performance (correctness) and inference cost (token length) for a given query. This prediction is based on the target model's historical behavior on similar questions, provided as 'anchor questions' in the input prompt.
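To make the input format concrete, here is a minimal sketch of how a prompt with anchor questions might be assembled. The exact template SCOPE was trained on is not reproduced here, so the field names, wording, and anchor record keys (`question`, `correct`, `output_tokens`) are assumptions for illustration only.

```python
# Hypothetical prompt builder for SCOPE-style pre-hoc estimation.
# The real template may differ; this only illustrates the structure:
# one target question plus several anchor questions with known outcomes.
def build_scope_prompt(target_question, anchors):
    """anchors: list of dicts with assumed keys
    'question' (str), 'correct' (bool), 'output_tokens' (int)."""
    lines = [
        f"Target question: {target_question}",
        "",
        "Anchor questions (historical behavior of the target LLM):",
    ]
    for i, a in enumerate(anchors, 1):
        lines.append(
            f"{i}. {a['question']} -> correct: {a['correct']}, "
            f"output tokens: {a['output_tokens']}"
        )
    lines += [
        "",
        "Predict whether the target LLM will answer the target question "
        "correctly, and estimate its output token length.",
    ]
    return "\n".join(lines)
```

The anchor records carry the target model's observed behavior on similar questions, which is what lets SCOPE generalize to unseen models without retraining.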
Key Capabilities
- Performance Prediction: Estimates whether an LLM will answer a question correctly before expensive inference.
- Cost Estimation: Predicts the output token length for resource planning and budget management.
- Scalable Routing: Enables efficient LLM routing and selection across diverse model portfolios.
- Generalization: Can generalize to unseen LLMs without requiring specific training for each new model.
- Controllable Trade-offs: Allows users to dynamically control the balance between accuracy and cost using a budget-aware utility function.
- Explainable Decisions: Provides an analysis of the reasoning behind its performance predictions.
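The controllable trade-off above can be sketched as a budget-aware utility over SCOPE's two predictions. The linear form below (accuracy minus a weighted cost penalty) is an assumption for illustration; the actual utility function used by SCOPE is not specified here.

```python
# Hypothetical budget-aware utility: reward predicted correctness,
# penalize predicted inference cost, with budget_weight controlling
# the accuracy-cost trade-off. The linear form is an assumption.
def utility(p_correct, expected_tokens, cost_per_token, budget_weight):
    return p_correct - budget_weight * (expected_tokens * cost_per_token)

def route(candidates, budget_weight=0.5):
    """candidates: list of (model_name, p_correct, expected_tokens,
    cost_per_token) tuples, where p_correct and expected_tokens would
    come from SCOPE's pre-hoc predictions.
    Returns the model with the highest utility at this budget setting."""
    return max(
        candidates,
        key=lambda c: utility(c[1], c[2], c[3], budget_weight),
    )[0]
```

Raising `budget_weight` shifts the router toward cheaper models; setting it to zero selects purely on predicted correctness.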
Good for
- Optimizing LLM inference costs by predicting outcomes before execution.
- Building dynamic LLM routing systems that adapt to different models and user budgets.
- Evaluating the potential performance of various LLMs on specific tasks without extensive testing.
- Applications requiring a balance between prediction accuracy and computational efficiency.
SCOPE is trained with Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (GRPO). It uses a specific prompt format that includes a target question and several anchor questions with their known performance data. For best results, draw multiple samples (8 or more) at a temperature of 0.6-0.7 and aggregate the predictions; vLLM is recommended for batch inference.
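The sampling-and-aggregation step above can be sketched as follows. How SCOPE's raw output is parsed into a (correctness, token-length) pair is assumed; only the aggregation logic (majority vote on correctness, median of token estimates) is shown concretely, and that aggregation scheme is itself an illustrative choice.

```python
from statistics import median

# Aggregate 8+ sampled predictions drawn at temperature 0.6-0.7.
# Each sample is assumed to be parsed into
# (predicted_correct: bool, predicted_tokens: int).
def aggregate(samples):
    """Majority-vote the correctness predictions and take the median
    of the token-length estimates."""
    votes = sum(1 for correct, _ in samples if correct)
    majority_correct = votes * 2 > len(samples)
    median_tokens = int(median(tokens for _, tokens in samples))
    return majority_correct, median_tokens

# With vLLM, all samples for a prompt can be drawn in one batched call, e.g.:
#   from vllm import LLM, SamplingParams
#   llm = LLM(model="Cooolder/SCOPE")
#   outs = llm.generate(prompts, SamplingParams(n=8, temperature=0.6))
```

Aggregating over multiple samples smooths out the variance that temperature sampling introduces into any single prediction.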