md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2
Developed by Md Ayan, md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 is a 0.5 billion parameter causal language model fine-tuned for SQL debugging and repair. It builds on a Qwen2.5-Coder-0.5B-Instruct base and is optimized for runtime correctness using GRPO (Group Relative Policy Optimization) with reward signals derived from SQL execution outcomes. The model corrects SQL queries based on execution feedback, which makes it suitable for SQL repair assistance and runtime-evaluated SQL correction experiments.
Model Overview
md896/sql-debug-agent-qwen25-05b-grpo-wandb-continue-v2 is a 0.5 billion parameter causal language model developed by Md Ayan and fine-tuned specifically for SQL debugging and repair. Rather than optimizing only for text-level plausibility, the model is optimized for the runtime correctness of the SQL it produces. It builds on a Qwen2.5-Coder-0.5B-Instruct base, and its training workflow uses execution-grounded signals: candidate queries are actually run, and the results feed the reward.
Key Capabilities and Training
The model's core strength lies in its ability to learn from actual SQL execution outcomes. Its training procedure involves:
- Execution-grounded Optimization: Uses GRPO (Group Relative Policy Optimization) to sample candidate SQL queries and rank them by runtime behavior, grader feedback, and task completion.
- Isolated Evaluation: Each SQL query proposal is evaluated in an isolated, in-memory SQLite environment to ensure deterministic grading.
- Comprehensive Reward System: The GRPO objective incorporates a reward composition that considers correctness, efficiency, progress, and schema bonuses, while penalizing errors.
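The training loop described above can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the schema, and the reward values (`1.0`, `0.1`, `-1.0`) are assumptions chosen for illustration, and the efficiency, progress, and schema-bonus terms are omitted. It shows each candidate being run in a fresh in-memory SQLite database, scored on the outcome, and then normalized against the other candidates in its group, which is the group-relative step that gives GRPO its name.

```python
import sqlite3
from statistics import mean, pstdev


def run_in_sandbox(schema_sql, query):
    """Run one candidate query in a fresh in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # nothing persists between candidates
    try:
        conn.executescript(schema_sql)
        return conn.execute(query).fetchall(), None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return None, str(exc)
    finally:
        conn.close()


def reward(schema_sql, candidate, expected_rows):
    """Hypothetical reward: the values below are illustrative assumptions."""
    rows, error = run_in_sandbox(schema_sql, candidate)
    if error is not None:
        return -1.0  # assumed penalty for a query that fails to execute
    if sorted(rows, key=repr) == sorted(expected_rows, key=repr):
        return 1.0  # result set matches the expected rows
    return 0.1  # assumed small credit: executes, but returns the wrong rows


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO users VALUES (1, 'Ada'), (2, 'Linus');
"""
expected = [("Ada",), ("Linus",)]
candidates = [
    "SELECT nme FROM users ORDER BY id",    # fails: no such column
    "SELECT name FROM users WHERE id = 1",  # runs, but returns the wrong rows
    "SELECT name FROM users ORDER BY id",   # correct
]

rewards = [reward(SCHEMA, c, expected) for c in candidates]
advantages = group_relative_advantages(rewards)
print(rewards)  # [-1.0, 0.1, 1.0]
```

Because every candidate gets its own database, a destructive or malformed proposal cannot affect how the next one is graded, which is what makes the grading deterministic.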
Performance Highlights
Evaluations demonstrate a significant improvement in SQL debugging capabilities:
- Achieved an RL agent headline score of 78.5%, a substantial leap from a Spider-style industry baseline of 48.2% and a Qwen-7B base score of 52.4%.
- Specific evaluation artifacts report a score increase from 0.0% to 25.0%.
Intended Use Cases
This model is particularly well-suited for:
- SQL repair assistance in controlled environments.
- Runtime-evaluated SQL correction experiments.
- Serving as a benchmark comparison for deterministic SQL debugging tasks.
- Fine-tuning initialization for enterprise-specific SQL repair applications.
It is not recommended for autonomous execution against production databases without robust guardrails, and its output should go through human review before it is run.
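One possible guardrail, sketched here under stated assumptions, is to run model-proposed SQL on a connection that only permits reads. The example uses the `set_authorizer` hook from Python's standard `sqlite3` module; the helper names and the allow-list are hypothetical, and a real deployment would likely need more than this (timeouts, row limits, human approval).

```python
import sqlite3

# Assumed allow-list: only actions needed for plain SELECT statements.
READ_ONLY_ACTIONS = {
    sqlite3.SQLITE_SELECT,
    sqlite3.SQLITE_READ,
    sqlite3.SQLITE_FUNCTION,
}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Deny every action that is not on the read-only allow-list."""
    if action in READ_ONLY_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def guarded_execute(conn, sql):
    """Run a proposed query and return (rows, error) instead of raising."""
    try:
        return conn.execute(sql).fetchall(), None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return None, str(exc)


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);"
    "INSERT INTO users VALUES (1, 'Ada'), (2, 'Linus');"
)
conn.set_authorizer(read_only_authorizer)  # from here on, writes are rejected

read_rows, read_err = guarded_execute(conn, "SELECT name FROM users ORDER BY id")
write_rows, write_err = guarded_execute(conn, "DELETE FROM users")
count_rows, count_err = guarded_execute(conn, "SELECT COUNT(*) FROM users")
```

In this sketch the `SELECT` succeeds, the `DELETE` comes back as an error, and the table still holds both rows afterwards.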