andrewlngdn/dsl-debug-7b-sft-step100

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Mar 5, 2026 · License: MIT · Architecture: Transformer · Open weights

The andrewlngdn/dsl-debug-7b-sft-step100 model is a 7.6 billion parameter variant of Qwen2.5-7B-Instruct, fine-tuned by andrewlngdn for debugging tasks in the DSL Debug environment. It specializes in multi-turn debugging trajectories that use tool calls such as run, inspect, read_docs, and submit. Compared to its base model, it shows improved accuracy on standard, nonlocal, and intent-mismatch debugging scenarios, making it suitable for automated code debugging and repair systems.


Overview

This model, dsl-debug-7b-sft-step100, is a 7.6 billion parameter language model developed by andrewlngdn. It is a supervised fine-tuned (SFT) version of the Qwen2.5-7B-Instruct base model, specifically trained for debugging tasks within the DSL Debug environment. The fine-tuning process involved 1,593 multi-turn debugging trajectories, incorporating tool calls such as run, inspect, read_docs, and submit.
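The exact trajectory format is not published on this card; as a rough sketch, assuming an OpenAI-style chat/tool-call schema, one such multi-turn debugging trajectory might look like the record below. Only the tool names (run, inspect, read_docs, submit) come from the card; the field names, arguments, and bug scenario are hypothetical.

```python
# Hypothetical sketch of one multi-turn debugging trajectory record,
# assuming an OpenAI-style chat/tool-call schema. Only the tool names
# (run, inspect, read_docs, submit) come from the model card.
import json

trajectory = [
    {"role": "user", "content": "Test 3 fails: expected 7, got 9. Find the bug."},
    {"role": "assistant", "tool_calls": [
        {"name": "run", "arguments": {"program": "main.dsl", "input": "3"}}
    ]},
    {"role": "tool", "name": "run", "content": "output: 9"},
    {"role": "assistant", "tool_calls": [
        {"name": "inspect", "arguments": {"symbol": "accumulate"}}
    ]},
    {"role": "tool", "name": "inspect", "content": "def accumulate(xs): ..."},
    {"role": "assistant", "tool_calls": [
        {"name": "submit", "arguments": {"patch": "replace '+' with '-' on line 4"}}
    ]},
]

# The trajectory is plain JSON-serializable data, so a training set of
# such episodes can be stored one record per line for SFT data loading.
line = json.dumps(trajectory)
tools_used = [c["name"]
              for turn in trajectory if turn["role"] == "assistant"
              for c in turn["tool_calls"]]
print(tools_used)  # ['run', 'inspect', 'submit']
```

In this shape, each assistant turn either calls a tool or ends the episode with submit, and each tool result comes back as a tool-role message the model conditions on in the next turn.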

Key Capabilities

  • Enhanced Debugging Performance: Shows significant improvements in debugging accuracy across various test splits, including standard (56.3% vs 50.5% for base), nonlocal (40.0% vs 12.0% for base), and intent-mismatch (7.9% vs 0.6% for base) scenarios.
  • Tool-Use Integration: Trained to effectively utilize external tools for debugging, enabling more dynamic and interactive problem-solving.
  • Foundation for RL Training: This checkpoint serves as a starting point for further Reinforcement Learning (RL) training, which reportedly achieves higher debugging accuracy than the SFT checkpoint alone.
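To make the tool-use integration concrete, here is a minimal sketch of how a harness might dispatch the four tools named above. The handler signatures and behavior are stubs of my own invention, not the actual DSL Debug environment.

```python
# Minimal sketch of a tool-dispatch loop for the four tools named in the
# card (run, inspect, read_docs, submit). The handlers are stubs; the
# real DSL Debug environment is not published here.

def run(program: str) -> str:
    return f"ran {program}"          # stub: execute the DSL program

def inspect(symbol: str) -> str:
    return f"source of {symbol}"     # stub: show a definition

def read_docs(topic: str) -> str:
    return f"docs for {topic}"       # stub: look up documentation

def submit(patch: str) -> str:
    return f"submitted: {patch}"     # stub: propose a fix, ending the episode

TOOLS = {"run": run, "inspect": inspect, "read_docs": read_docs, "submit": submit}

def dispatch(call: dict) -> str:
    """Route one model-emitted tool call to its handler."""
    return TOOLS[call["name"]](**call["arguments"])

# A multi-turn episode is repeated dispatch until the model submits.
calls = [
    {"name": "run", "arguments": {"program": "main.dsl"}},
    {"name": "read_docs", "arguments": {"topic": "loops"}},
    {"name": "submit", "arguments": {"patch": "fix off-by-one"}},
]
observations = [dispatch(c) for c in calls]
print(observations[-1])  # submitted: fix off-by-one
```

Each observation would be fed back to the model as the next turn's tool result, which is what makes the problem-solving interactive rather than single-shot.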

Alignment Tax

While excelling in debugging, the model exhibits a minor alignment tax on general benchmarks:

  • MMLU (5-shot): Maintained at 74.6% (same as base).
  • GSM8K (8-shot): Slightly decreased to 83.9% from 84.9%.
  • HumanEval (0-shot): Decreased to 62.2% from 65.9%.

Good For

  • Automated code debugging systems.
  • Research into multi-turn debugging and tool-augmented language models.
  • A starting checkpoint for further RL-based debugging fine-tuning.