Overview
This model, dsl-debug-7b-sft-step100, is a 7.6-billion-parameter language model developed by andrewlngdn. It is a supervised fine-tuned (SFT) version of the Qwen2.5-7B-Instruct base model, trained specifically for debugging tasks in the DSL Debug environment. Fine-tuning used 1,593 multi-turn debugging trajectories that incorporate tool calls such as run, inspect, read_docs, and submit.
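To make the training data concrete, the sketch below shows what one multi-turn debugging trajectory might look like in chat-message form. The role names, field names, and file contents are illustrative assumptions, not the actual DSL Debug schema; only the tool names (run, inspect, submit) come from the description above.

```python
# Hypothetical sketch of one multi-turn debugging trajectory in chat format.
# Field names and contents are assumptions for illustration, not the real
# DSL Debug schema; only the tool names come from the model card.

trajectory = [
    {"role": "user",
     "content": "Test test_parse fails with IndexError; find and fix the bug."},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "run",
                   "arguments": {"cmd": "pytest tests/test_parse.py -x"}}},
    {"role": "tool", "name": "run",
     "content": "IndexError: list index out of range (parser.py:42)"},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "inspect",
                   "arguments": {"file": "parser.py", "line": 42}}},
    {"role": "tool", "name": "inspect",
     "content": "tokens[i + 1]  # no bounds check"},
    {"role": "assistant", "content": None,
     "tool_call": {"name": "submit",
                   "arguments": {"patch": "guard i + 1 < len(tokens)"}}},
]

# Each of the 1,593 training examples would be one such trajectory,
# serialized into the base model's chat template for SFT.
tool_calls = [m["tool_call"]["name"] for m in trajectory
              if m["role"] == "assistant"]
print(tool_calls)  # → ['run', 'inspect', 'submit']
```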
Key Capabilities
- Enhanced Debugging Performance: Substantial accuracy gains over the base model across test splits, including standard (56.3% vs. 50.5% base), nonlocal (40.0% vs. 12.0%), and intent-mismatch (7.9% vs. 0.6%) scenarios.
- Tool-Use Integration: Trained to effectively utilize external tools for debugging, enabling more dynamic and interactive problem-solving.
- Foundation for RL Training: This checkpoint serves as the starting point for subsequent Reinforcement Learning (RL) training, which is reported to achieve even better debugging results.
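Tool-use integration implies a harness that parses tool calls out of model output and executes them. Below is a minimal hypothetical sketch of such a parser; the `<tool_call>` JSON wire format is an assumption for illustration, not the documented DSL Debug format.

```python
import json
import re

# Hypothetical harness-side parser: extracts a tool call from model output.
# The <tool_call>...</tool_call> JSON format is an assumption for
# illustration; the real DSL Debug environment may use a different format.

TOOLS = {"run", "inspect", "read_docs", "submit"}  # tools from the model card

def parse_tool_call(model_output: str):
    """Return (tool_name, arguments) if the output contains a valid call, else None."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL)
    if match is None:
        return None
    call = json.loads(match.group(1))
    if call.get("name") not in TOOLS:
        return None  # unknown tool: reject rather than execute
    return call["name"], call.get("arguments", {})

output = ('Let me check the failing line.\n'
          '<tool_call>{"name": "inspect", '
          '"arguments": {"file": "parser.py", "line": 42}}</tool_call>')
print(parse_tool_call(output))  # → ('inspect', {'file': 'parser.py', 'line': 42})
```

A real harness would loop: generate, parse, execute the tool, append the result as a tool message, and repeat until a submit call or a turn limit.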
Alignment Tax
While the model excels at debugging, it exhibits a minor alignment tax on general benchmarks:
- MMLU (5-shot): 74.6% (unchanged from base).
- GSM8K (8-shot): 83.9% (base: 84.9%).
- HumanEval (0-shot): 62.2% (base: 65.9%).
Good For
- Automated code debugging systems.
- Research into multi-turn debugging and tool-augmented language models.
- A base model for further RL-based debugging fine-tuning.