ssurface/qwen3-4b-gdpo-length-sft-l4

TEXT GENERATIONConcurrency Cost:1Model Size:4BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jul 1, 2026License:apache-2.0Architecture:Transformer Open Weights Cold

The ssurface/qwen3-4b-gdpo-length-sft-l4 model is a 4 billion parameter Qwen3-based language model developed by ssurface. It is specifically fine-tuned using SFT and GRPO with a new reward mechanism for compressed chain-of-thought reasoning at 'Level 4 (Shorthand)'. This model excels at generating concise, shorthand reasoning outputs, making it suitable for applications requiring efficient and condensed thought processes.

Loading preview...

Overview

The ssurface/qwen3-4b-gdpo-length-sft-l4 is a 4 billion parameter model built upon the Qwen3-4B-Instruct architecture. It has undergone a specialized fine-tuning process designed to optimize its reasoning capabilities for brevity and efficiency.

Key Capabilities

  • Compressed Chain-of-Thought Reasoning: This model is specifically trained to produce highly condensed, "Level 4 (Shorthand)" chain-of-thought outputs. This means it aims to provide the essence of a reasoning process in a very compact form.
  • GRPO Fine-tuning: The model leverages a training pipeline that includes Supervised Fine-Tuning (SFT) followed by GRPO (Generalized Reinforcement Learning from Human Feedback) with a novel reward function. This advanced training methodology is geared towards achieving its unique compressed reasoning style.

Training Pipeline

The model's development involved a multi-stage process:

  1. Base Model: Started with Qwen/Qwen3-4B-Instruct-2507.
  2. SFT LoRA: Initial fine-tuning using LoRA (Low-Rank Adaptation) with ssurface/qwen3-4b-cot-compress-l4.
  3. GRPO with New Reward: Further optimization through GRPO, incorporating a new reward mechanism to reinforce the desired compressed reasoning style.

Use Cases

This model is particularly well-suited for applications where generating concise, shorthand explanations or reasoning steps is critical, such as:

  • Summarization of reasoning: Quickly distilling complex thought processes into brief summaries.
  • Efficient AI agents: Providing compact reasoning traces for agents operating under strict token or latency constraints.
  • Educational tools: Generating simplified explanations of problem-solving steps.