Name: ssurface/qwen3-4b-gdpo-length-sft-l1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ssurface

Model Overview

The ssurface/qwen3-4b-gdpo-length-sft-l1 is a 4 billion parameter language model built upon the Qwen3-4B-Instruct architecture. This model has undergone a specialized fine-tuning process to excel in generating verbose, compressed chain-of-thought reasoning, designated as "Level 1 (Verbose)".

Key Capabilities

Compressed Chain-of-Thought Reasoning: Optimized to produce detailed, step-by-step reasoning in a concise format.
Verbose Output (Level 1): Specifically tuned to provide a high level of detail in its reasoning explanations.
Qwen3-4B-Instruct Base: Leverages the foundational capabilities of the Qwen3-4B-Instruct model.

Training Methodology

The model's unique capabilities are a result of a multi-stage training pipeline:

Initial SFT LoRA: Started with Qwen/Qwen3-4B-Instruct-2507 and applied Supervised Fine-Tuning (SFT) using LoRA, specifically ssurface/qwen3-4b-cot-compress-l1.
GRPO with New Reward: The SFT-merged model was then further fine-tuned using Gradient Regularized Policy Optimization (GRPO) incorporating a novel reward mechanism.

Ideal Use Cases

This model is particularly suited for applications where detailed, yet structured, reasoning is required, such as:

Problem-solving explanations.
Educational content generation requiring step-by-step breakdowns.
Any task benefiting from explicit, verbose reasoning paths.

Overview

Model Overview

Key Capabilities

Training Methodology

Ideal Use Cases

Full Model Card (README)