Overview
This model, dsl-debug-7b-sft-step100, is a 7.6-billion-parameter language model developed by andrewlngdn. It is a supervised fine-tuned (SFT) version of the Qwen2.5-7B-Instruct base model, trained specifically for debugging tasks in the DSL Debug environment. Fine-tuning used 1,593 multi-turn debugging trajectories that incorporate tool calls such as run, inspect, read_docs, and submit.
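To make the training data concrete, the sketch below shows what one multi-turn debugging trajectory might look like in chat-message form. The role names, field names, and file contents are illustrative assumptions, not the actual DSL Debug schema; only the tool names (run, inspect, submit) come from the description above.

```python
# Hypothetical sketch of one multi-turn debugging trajectory in chat format.
# Field names and contents are assumptions for illustration, not the real
# DSL Debug schema; only the tool names come from the model card.

trajectory = [
    {"role": "user",
     "content": "Test test_parse fails with IndexError; find and fix the bug."},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "run",
                   "arguments": {"cmd": "pytest tests/test_parse.py -x"}}},
    {"role": "tool", "name": "run",
     "content": "IndexError: list index out of range (parser.py:42)"},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "inspect",
                   "arguments": {"file": "parser.py", "line": 42}}},
    {"role": "tool", "name": "inspect",
     "content": "tokens[i + 1]  # no bounds check"},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "submit",
                   "arguments": {"patch": "guard i + 1 < len(tokens)"}}},
]

# Each of the 1,593 training examples would be one such trajectory,
# serialized into the base model's chat template for SFT.
tool_calls = [m["tool_call"]["name"] for m in trajectory
              if m["role"] == "assistant"]
print(tool_calls)  # → ['run', 'inspect', 'submit']
```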
Key Capabilities
- Enhanced Debugging Performance: Substantial accuracy gains over the base model across test splits, including standard (56.3% vs. 50.5% base), nonlocal (40.0% vs. 12.0%), and intent-mismatch (7.9% vs. 0.6%) scenarios.
- Tool-Use Integration: Trained to effectively utilize external tools for debugging, enabling more dynamic and interactive problem-solving.
- Foundation for RL Training: This checkpoint serves as the starting point for subsequent Reinforcement Learning (RL) training, which is reported to achieve even better debugging results.
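Tool-use integration implies a harness that parses tool calls out of model output and executes them. Below is a minimal hypothetical sketch of such a parser; the `<tool_call>` JSON wire format is an assumption for illustration, not the documented DSL Debug format.

```python
import json
import re

# Hypothetical harness-side parser: extracts a tool call from model output.
# The <tool_call>...</tool_call> JSON format is an assumption for
# illustration; the real DSL Debug environment may use a different format.

TOOLS = {"run", "inspect", "read_docs", "submit"}  # tools from the model card

def parse_tool_call(model_output: str):
    """Return (tool_name, arguments) if the output contains a valid call, else None."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL)
    if match is None:
        return None
    call = json.loads(match.group(1))
    if call.get("name") not in TOOLS:
        return None  # unknown tool: reject rather than execute
    return call["name"], call.get("arguments", {})

output = ('Let me check the failing line.\n'
          '<tool_call>{"name": "inspect", '
          '"arguments": {"file": "parser.py", "line": 42}}</tool_call>')
print(parse_tool_call(output))  # → ('inspect', {'file': 'parser.py', 'line': 42})
```

A real harness would loop: generate, parse, execute the tool, append the result as a tool message, and repeat until a submit call or a turn limit.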
Alignment Tax
While the model excels at debugging, it exhibits a minor alignment tax on general benchmarks:
- MMLU (5-shot): 74.6% (unchanged from base).
- GSM8K (8-shot): 83.9% (base: 84.9%).
- HumanEval (0-shot): 62.2% (base: 65.9%).
Good For
- Automated code debugging systems.
- Research into multi-turn debugging and tool-augmented language models.
- A base model for further RL-based debugging fine-tuning.