andrewlngdn/dsl-debug-7b-rl-only-step30
TEXT GENERATION
Concurrency Cost: 1
Model Size: 7.6B
Quant: FP8
Ctx Length: 32k
Published: Mar 5, 2026
License: MIT
Architecture: Transformer (Open Weights, Cold)

andrewlngdn/dsl-debug-7b-rl-only-step30 is a 7.6-billion-parameter fine-tune of Qwen2.5-7B-Instruct, developed by andrewlngdn and trained with Group Relative Policy Optimization (GRPO) for multi-turn code debugging, without an initial Supervised Fine-Tuning (SFT) warmup. It shows significantly improved performance over its base model on standard, nonlocal, and intent-mismatch debugging tasks, making it well suited to automated code correction and problem-solving.
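For context on the training method named above: GRPO samples a group of completions per prompt, scores each with a reward, and replaces a learned value critic with rewards normalized within the group. The sketch below is a minimal, hypothetical illustration of that group-relative advantage step, not the author's actual training code; the function name and the 0/1 test-pass reward are assumptions for the example.

```python
# Hypothetical sketch of GRPO's group-relative advantage computation,
# not andrewlngdn's training code.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage for each sampled completion: (r_i - group mean) / group std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example group: four sampled debugging attempts, each scored 0/1
# on whether the patched code passes its tests (assumed reward scheme).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Passing attempts receive positive advantage, failing ones negative,
# so the policy gradient pushes toward the within-group winners.
```

Normalizing within the group means no separate critic network is needed, which is one of the practical appeals of GRPO for fine-tuning runs like this one.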
