Name: ssurface/qwen3-4b-gdpo-length-sft-l4 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: ssurface

Overview

The ssurface/qwen3-4b-gdpo-length-sft-l4 is a 4 billion parameter model built upon the Qwen3-4B-Instruct architecture. It has undergone a specialized fine-tuning process designed to optimize its reasoning capabilities for brevity and efficiency.

Key Capabilities

Compressed Chain-of-Thought Reasoning: This model is specifically trained to produce highly condensed, "Level 4 (Shorthand)" chain-of-thought outputs. This means it aims to provide the essence of a reasoning process in a very compact form.
GRPO Fine-tuning: The model leverages a training pipeline that includes Supervised Fine-Tuning (SFT) followed by GRPO (Generalized Reinforcement Learning from Human Feedback) with a novel reward function. This advanced training methodology is geared towards achieving its unique compressed reasoning style.

Training Pipeline

The model's development involved a multi-stage process:

Base Model: Started with Qwen/Qwen3-4B-Instruct-2507.
SFT LoRA: Initial fine-tuning using LoRA (Low-Rank Adaptation) with ssurface/qwen3-4b-cot-compress-l4.
GRPO with New Reward: Further optimization through GRPO, incorporating a new reward mechanism to reinforce the desired compressed reasoning style.

Use Cases

This model is particularly well-suited for applications where generating concise, shorthand explanations or reasoning steps is critical, such as:

Summarization of reasoning: Quickly distilling complex thought processes into brief summaries.
Efficient AI agents: Providing compact reasoning traces for agents operating under strict token or latency constraints.
Educational tools: Generating simplified explanations of problem-solving steps.

Overview

Overview

Key Capabilities

Training Pipeline

Use Cases

Full Model Card (README)