dmaheshwar22/qwen-1.5b-coder-grpo-scratch-step200
The dmaheshwar22/qwen-1.5b-coder-grpo-scratch-step200 model is a 1.5-billion-parameter Qwen2.5-Coder variant, fine-tuned using Group-Relative Policy Optimization (GRPO) with verifiable rewards from sandboxed test execution. It is optimized for code generation, particularly in Python, and features a 32,768-token context length. This release serves as a pipeline-validation run for GRPO, demonstrating the technique at small scale for research and educational purposes.
Model Overview
This model, dmaheshwar22/qwen-1.5b-coder-grpo-scratch-step200, is a 1.5-billion-parameter variant of the Qwen/Qwen2.5-Coder-1.5B-Instruct base model. It has been fine-tuned using Group-Relative Policy Optimization (GRPO), a technique that leverages verifiable rewards from sandboxed test execution, similar to methods used in DeepSeek-R1 and Kimi-K1.5. This particular release is a pipeline-validation run, trained from scratch (without SFT warm-start) for 200 steps on a single A100 GPU.
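For quick reference, here is a minimal loading-and-generation sketch using Hugging Face transformers. The prompt and generation settings are illustrative assumptions, and it assumes the checkpoint retains the base model's chat template.

```python
# Minimal usage sketch with Hugging Face transformers.
# The prompt and generation settings are illustrative assumptions;
# assumes the checkpoint keeps the base model's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dmaheshwar22/qwen-1.5b-coder-grpo-scratch-step200"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user",
             "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```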
Key Capabilities & Training Details
- Architecture: Qwen2.5-Coder-1.5B, optimized for code generation.
- Training Method: GRPO, using the verl framework, with a focus on verifiable rewards.
- Reward Function: A composite reward system based on sandboxed Docker execution (see the sketch after this list), incorporating:
  - Test-pass rate (primary signal)
  - Linting bonuses (ruff)
  - Length penalties
  - Compile-error penalties
- Context Length: 32,768 tokens.
- Performance (HumanEval+ pass@1): 0.6415, a modest improvement over the base (0.627) and SFT (0.638) baselines, with room for further gains from longer training.
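To make the reward design concrete, the sketch below shows how such a composite reward and GRPO's group-relative normalization could fit together. All weights, thresholds, and function names here are assumptions for illustration; the actual implementation lives in the companion repository.

```python
import numpy as np

# Illustrative composite reward; all coefficients are assumed
# placeholders, not the values used in training.
def composite_reward(pass_rate: float, ruff_errors: int,
                     num_tokens: int, compiled: bool) -> float:
    if not compiled:
        return -1.0                    # compile-error penalty (assumed value)
    reward = pass_rate                 # test-pass rate: primary signal in [0, 1]
    if ruff_errors == 0:
        reward += 0.1                  # lint bonus for a clean ruff run (assumed)
    if num_tokens > 512:
        reward -= 0.05 * (num_tokens - 512) / 512   # length penalty (assumed)
    return reward

# GRPO's core step: normalize each completion's reward against the
# mean and std of its sampling group for the same prompt.
def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    return (rewards - rewards.mean()) / (rewards.std() + 1e-6)
```

During training, a group of completions is sampled per prompt, each is scored in the sandbox, and the group-normalized advantages weight the policy-gradient update.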
Intended Use Cases
- Research and Education: This model serves as a concrete reference for understanding an end-to-end GRPO implementation with verifiable rewards on a small, open-source coder model. The reward function, sandbox, and training configuration are open-sourced in the companion repository.
- Not for Production: Due to its early-stage training (200 steps from the base model), its performance is only comparable to the SFT baseline. A more advanced, SFT-warm-started version is planned.
Limitations
- Coding-only: Specialized for Python coding tasks; not designed for general-purpose chat or reasoning.
- Output Format: May occasionally wrap code in markdown fences, requiring post-processing (see the snippet after this list).
- Safety: Not safety-tuned; inherits behaviors from the base instruct model.
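As a concrete example of the post-processing noted under Output Format, a simple fence-stripping heuristic might look like the following. The regex is an assumption, not part of the model or its tooling, and may need adjustment for other fence styles.

```python
import re

# Strip a markdown code fence if the model wrapped its answer in one.
# Assumed heuristic; adjust the pattern for other fence styles as needed.
def strip_markdown_fence(text: str) -> str:
    match = re.search(r"```[\w+-]*\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text

print(strip_markdown_fence("```python\nprint('hi')\n```"))  # -> print('hi')
```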