Name: md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: md896

Model Overview

md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 is a 0.5-billion-parameter causal language model developed by Md Ayan and fine-tuned for SQL debugging and repair. Rather than optimizing only for text-level plausibility, it is optimized for the runtime correctness of the SQL it produces. It builds on Qwen2.5-Coder-0.5B-Instruct and is trained with execution-grounded signals.

Key Capabilities and Training

The model's core strength lies in its ability to learn from actual SQL execution outcomes. Its training procedure involves:

Execution-grounded Optimization: Uses GRPO (Group Relative Policy Optimization) to generate candidate SQL queries and rank them by runtime performance, grader feedback, and task completion.
Isolated Evaluation: Each SQL query proposal is evaluated in an isolated, in-memory SQLite environment to ensure deterministic grading.
Comprehensive Reward System: The GRPO objective incorporates a reward composition that considers correctness, efficiency, progress, and schema bonuses, while penalizing errors.
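The training loop described above can be illustrated with a minimal sketch. It grades each candidate query in a fresh in-memory SQLite database and then converts the rewards of one candidate group into group-relative advantages, which is the normalization GRPO is named for. The function names, the toy schema, and the reward values (1.0, 0.1, -1.0) are assumptions chosen for illustration; the model card does not publish the actual reward weights, and the efficiency, progress, and schema terms are omitted here.

```python
import sqlite3
from statistics import mean, pstdev


def run_isolated(schema_sql, query):
    """Run `query` in a fresh in-memory SQLite database; return (rows, error)."""
    conn = sqlite3.connect(":memory:")  # nothing persists between candidates
    try:
        conn.executescript(schema_sql)
        return conn.execute(query).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)
    finally:
        conn.close()


def reward(schema_sql, candidate, expected_rows):
    """Hypothetical reward: the constants are illustrative, not the author's."""
    rows, error = run_isolated(schema_sql, candidate)
    if error is not None:
        return -1.0  # assumed penalty for a query that fails to execute
    if sorted(rows, key=repr) == sorted(expected_rows, key=repr):
        return 1.0   # assumed full credit for the correct result set
    return 0.1       # assumed small credit for executing without error


def group_relative_advantages(rewards):
    """Normalize rewards within one group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mu) / sigma for r in rewards]


# Toy schema and candidate repairs (all hypothetical).
SCHEMA = """
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO users VALUES (1, 'ada'), (2, 'bob');
"""
EXPECTED = [("ada",), ("bob",)]
CANDIDATES = [
    "SELEC name FROM users",              # syntax error
    "SELECT id FROM users ORDER BY id",   # runs, wrong columns
    "SELECT name FROM users ORDER BY id", # correct repair
]

rewards = [reward(SCHEMA, c, EXPECTED) for c in CANDIDATES]
advantages = group_relative_advantages(rewards)
for candidate, r, a in zip(CANDIDATES, rewards, advantages):
    print(f"{r:+.1f}  {a:+.2f}  {candidate}")
```

Because every candidate gets its own `:memory:` connection, one proposal cannot alter the data another is graded against, which is what makes the grading deterministic.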

Performance Highlights

Evaluations demonstrate a significant improvement in SQL debugging capabilities:

The RL agent achieved a headline score of 78.5%, compared with a Spider-style industry baseline of 48.2% and a Qwen-7B base score of 52.4%.
In specific evaluation artifacts, the reported score rises from 0.0% to 25.0%.

Intended Use Cases

This model is particularly well-suited for:

SQL repair assistance in controlled environments.
Runtime-evaluated SQL correction experiments.
Serving as a benchmark comparison for deterministic SQL debugging tasks.
Fine-tuning initialization for enterprise-specific SQL repair applications.

The model is not recommended for autonomous execution against production databases: generated SQL should first pass through robust guardrails and human review.
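As one illustration of what a guardrail could look like, the sketch below uses the standard-library `sqlite3` authorizer hook to reject anything other than read operations before a generated query runs. This is an assumption about one possible safeguard, not something the model card prescribes; the table, the helper names, and the allow-list are hypothetical, and a real deployment would still need human review and database-level permissions.

```python
import sqlite3

# Hypothetical allow-list: plain SELECT statements, column reads, SQL functions.
ALLOWED_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    sqlite3.SQLITE_FUNCTION,
}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Deny every action that is not on the read-only allow-list."""
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def try_query(conn, sql):
    """Execute `sql`; return (rows, None) on success or (None, message) on failure."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


# Toy database standing in for a real one (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 9.5)")
conn.commit()

# Install the guardrail only after the setup statements have run.
conn.set_authorizer(read_only_authorizer)

ok_rows, ok_err = try_query(conn, "SELECT id, total FROM orders")
bad_rows, bad_err = try_query(conn, "DELETE FROM orders")
print("read:", ok_rows, ok_err)
print("write:", bad_rows, bad_err)
```

The authorizer is consulted while the statement is being compiled, so a denied statement is rejected before it can modify any data.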
