Name: laion/ablation-pymethods2test-shaped-45-8B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: laion

Overview

laion/ablation-pymethods2test-shaped-45-8B is an 8-billion-parameter language model derived from a Qwen3-8B SFT base model. It is a specific checkpoint (global_step_45) from a reinforcement learning (RL) ablation study by laion that tests a "shaped-reward" approach. Instead of a binary pass/fail reward, the model was optimized with a reward equal to the fraction of tests that pass (pass-ratio), so partially correct code earns partial credit and correctness can improve incrementally.
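The difference between the two reward styles can be sketched as follows. This is a minimal illustration, not the study's actual reward code: the function names and the list-of-booleans input are assumptions made for the example.

```python
from typing import Sequence


def binary_reward(test_results: Sequence[bool]) -> float:
    """All-or-nothing reward: 1.0 only if there are tests and every one passes."""
    return 1.0 if test_results and all(test_results) else 0.0


def pass_ratio_reward(test_results: Sequence[bool]) -> float:
    """Shaped reward: fraction of tests that pass (0.0 when there are no tests)."""
    if not test_results:
        return 0.0
    return sum(test_results) / len(test_results)


# Hypothetical rollout: three of four tests pass.
results = [True, True, True, False]
print(binary_reward(results))      # 0.0
print(pass_ratio_reward(results))  # 0.75
```

Under the binary reward this rollout is indistinguishable from one that fails every test; under the pass-ratio reward it scores 0.75.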

Key Characteristics

Base Model: Built on laion/GLM-4_7-swesmith-sandboxes-with_tests-oracle_verified_120s-maxeps-131k-fixthink, a Qwen3-8B SFT model.
Training Method: Uses the SkyRL GRPO algorithm as part of a shaped-reward ablation study.
Reward Function: Optimized for a "shaped pass-ratio" reward, meaning it learns to maximize the percentage of tests that pass rather than an all-or-nothing success.
Training Data: Trained on the DCAgent/exp_rpt_pymethods2test-large dataset.
Checkpoint Selection: global_step_45 was chosen based on the best Exponential Moving Average (EMA) of the average raw reward over an 80-step training chain.
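Checkpoint selection by smoothed reward can be sketched as below. This is an illustration only: the smoothing factor `alpha`, the 1-indexed step numbering, and the sample rewards are all assumptions, since the listing does not state how the EMA was configured.

```python
from typing import List, Sequence


def ema(values: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average, seeded with the first value."""
    smoothed: List[float] = []
    for v in values:
        if not smoothed:
            smoothed.append(v)
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def best_step(avg_rewards: Sequence[float], alpha: float) -> int:
    """Return the 1-indexed step whose EMA of average raw reward is highest."""
    smoothed = ema(avg_rewards, alpha)
    best_index = max(range(len(smoothed)), key=lambda i: smoothed[i])
    return best_index + 1


# Made-up per-step average rewards with one noisy spike at step 2.
rewards = [0.1, 0.9, 0.1, 0.6, 0.7, 0.65]
print(best_step(rewards, alpha=0.5))
```

Smoothing is a common way to avoid picking a checkpoint on the strength of a single noisy step: in the made-up data above, the raw maximum is at step 2, while the EMA favors a later step where reward is consistently high.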

Good For

This model is suited to code generation use cases where the goal is to raise the fraction of tests that pass, rather than only to produce fully correct solutions. Because it was trained with partial credit for partially passing code, it may be useful in iterative development and testing workflows where code is refined over several rounds.
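One way to see why partial successes matter during training is to look at group-normalized advantages of the kind GRPO is commonly described as using. The sketch below assumes the standard mean-and-standard-deviation normalization within a group of rollouts; SkyRL's exact implementation may differ, and the reward values are made up.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward by the group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one prompt; none passes every test.
binary = [0.0, 0.0, 0.0, 0.0]    # all-or-nothing reward
shaped = [0.25, 0.75, 0.5, 0.0]  # pass-ratio reward for the same rollouts

print(group_advantages(binary))  # all zeros: no learning signal
print(group_advantages(shaped))  # rollouts are ranked by partial success
```

When every rollout fails under a binary reward, all advantages are zero and the group contributes no gradient. With a pass-ratio reward, the same rollouts are still ranked against each other.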
