andrewlngdn/dsl-debug-7b-sft-step100
The andrewlngdn/dsl-debug-7b-sft-step100 model is a 7.6 billion parameter Qwen2.5-7B-Instruct variant, fine-tuned by andrewlngdn for debugging tasks within the DSL Debug environment. It specializes in multi-turn debugging trajectories, utilizing tool calls like run, inspect, read_docs, and submit. This model demonstrates improved performance on standard, nonlocal, and intent-mismatch debugging scenarios compared to its base model, making it suitable for automated code debugging and repair systems.
Overview
This model, dsl-debug-7b-sft-step100, is a 7.6 billion parameter language model developed by andrewlngdn. It is a supervised fine-tuned (SFT) version of the Qwen2.5-7B-Instruct base model, specifically trained for debugging tasks within the DSL Debug environment. The fine-tuning process involved 1,593 multi-turn debugging trajectories, incorporating tool calls such as run, inspect, read_docs, and submit.
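The exact JSON layout the checkpoint was trained on is not published here; as an illustration only, the four tools could be described in the common OpenAI/Qwen function-calling convention. Every schema field and argument name below is an assumption; only the tool names (run, inspect, read_docs, submit) come from the model card.

```python
import json

# Hypothetical schemas for the four DSL Debug tools (assumed layout; only
# the tool names are documented for this checkpoint).
TOOLS = [
    {"type": "function", "function": {
        "name": "run",
        "description": "Execute the current DSL program and return its output.",
        "parameters": {"type": "object", "properties": {}}}},
    {"type": "function", "function": {
        "name": "inspect",
        "description": "Report the value of a variable during execution.",
        "parameters": {"type": "object", "properties": {
            "name": {"type": "string", "description": "Variable to inspect."}},
            "required": ["name"]}}},
    {"type": "function", "function": {
        "name": "read_docs",
        "description": "Look up documentation for a DSL construct.",
        "parameters": {"type": "object", "properties": {
            "topic": {"type": "string"}}, "required": ["topic"]}}},
    {"type": "function", "function": {
        "name": "submit",
        "description": "Submit the repaired program as the final answer.",
        "parameters": {"type": "object", "properties": {
            "code": {"type": "string"}}, "required": ["code"]}}},
]

# One assistant turn requesting a tool call, as it might appear in a
# multi-turn debugging trajectory (structure assumed, not documented).
tool_call = {"role": "assistant", "tool_calls": [
    {"type": "function", "function": {
        "name": "inspect",
        "arguments": json.dumps({"name": "acc"})}}]}

print([t["function"]["name"] for t in TOOLS])
```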
Key Capabilities
- Enhanced Debugging Performance: Shows significant improvements in debugging accuracy across various test splits, including standard (56.3% vs 50.5% for base), nonlocal (40.0% vs 12.0% for base), and intent-mismatch (7.9% vs 0.6% for base) scenarios.
- Tool-Use Integration: Trained to effectively utilize external tools for debugging, enabling more dynamic and interactive problem-solving.
- Foundation for RL Training: This specific checkpoint serves as a starting point for further Reinforcement Learning (RL) training, which is noted to achieve even better results.
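The tool-use loop these capabilities describe can be sketched as follows. The DSL Debug environment's actual API is not public, so the DebugEnv class, its return strings, and the scripted trajectory are all stand-ins; only the four tool names and the multi-turn run/inspect/submit pattern come from the model card.

```python
from typing import Callable, Dict

class DebugEnv:
    """Toy stand-in for the DSL Debug environment (hypothetical API)."""
    def __init__(self, buggy_code: str):
        self.code = buggy_code
        self.solved = False

    def run(self) -> str:
        # Canned failure message standing in for real program output.
        return "Error: expected 6, got 0"

    def inspect(self, name: str) -> str:
        return f"{name} = 0"

    def read_docs(self, topic: str) -> str:
        return f"docs for {topic}: ..."

    def submit(self, code: str) -> str:
        # Toy acceptance check: any change to the buggy program counts.
        self.solved = code.strip() != self.code.strip()
        return "accepted" if self.solved else "rejected"

def dispatch(env: DebugEnv, name: str, args: Dict) -> str:
    """Route a model-emitted tool call to the environment."""
    handlers: Dict[str, Callable[..., str]] = {
        "run": env.run, "inspect": env.inspect,
        "read_docs": env.read_docs, "submit": env.submit}
    return handlers[name](**args)

# Scripted stand-in for the model's decisions in one trajectory:
# observe a failure, inspect state, then submit a fix.
env = DebugEnv("total = 0\nfor x in xs: pass\nprint(total)")
trajectory = [
    ("run", {}),
    ("inspect", {"name": "total"}),
    ("submit", {"code": "total = sum(xs)\nprint(total)"}),
]
for name, args in trajectory:
    print(name, "->", dispatch(env, name, args))
```

In a real deployment the scripted trajectory would be replaced by decoding tool calls from the model's output each turn and appending the tool result to the conversation before the next generation.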
Alignment Tax
While it excels at debugging, the model pays a minor alignment tax on general-capability benchmarks:
- MMLU (5-shot): 74.6%, unchanged from base.
- GSM8K (8-shot): 83.9%, down slightly from 84.9%.
- HumanEval (0-shot): 62.2%, down from 65.9%.
Good For
- Automated code debugging systems.
- Research into multi-turn debugging and tool-augmented language models.
- A starting checkpoint for further RL-based debugging fine-tuning.