Name: ssurface/qwen3-4b-gdpo-length-sft-l5 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ssurface

Overview

This model, ssurface/qwen3-4b-gdpo-length-sft-l5, is a 4 billion parameter variant of the Qwen3-Instruct architecture. It has undergone a specialized fine-tuning process involving Supervised Fine-Tuning (SFT) followed by Gradient-based Reward Policy Optimization (GRPO) with a novel reward function. The primary goal of this training pipeline is to enhance the model's ability to perform compressed chain-of-thought reasoning at an advanced, "Level 5 (Extreme)" proficiency.

Key Capabilities

Extreme Compressed Chain-of-Thought Reasoning: Designed to generate highly efficient and concise reasoning steps for complex problems.
Qwen3-4B-Instruct Base: Leverages the strong foundational capabilities of the Qwen3-4B-Instruct model.
Advanced Fine-tuning: Utilizes a multi-stage training approach (SFT then GRPO with a new reward) for specialized performance.

Good For

Applications requiring highly efficient and structured reasoning outputs.
Scenarios where verbose chain-of-thought is undesirable, favoring compressed logical steps.
Complex problem-solving tasks that benefit from advanced reasoning capabilities.

Overview

Overview

Key Capabilities

Good For

Full Model Card (README)