Model Overview
puwaer/Qwen3-4B-Thinking-2507-SFT-Uncensored is a 4-billion-parameter language model built on the Qwen3-4B-Thinking-2507 architecture. It was fine-tuned in two stages: Supervised Fine-Tuning (SFT) followed by GRPO (Group Relative Policy Optimization), a reinforcement-learning stage.
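As a sketch of how the model can be loaded and queried (assuming the standard Hugging Face `transformers` chat API; the prompt and generation length below are illustrative, not taken from the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "puwaer/Qwen3-4B-Thinking-2507-SFT-Uncensored"

def generate_reply(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a completion for a single user turn.

    Note: this downloads several GB of weights on first use;
    run on a GPU if one is available.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    # apply_chat_template wraps the turn in the model's expected chat format
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens; keep only the newly generated continuation
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

Because this is a "thinking" variant, the returned text will typically include the model's reasoning trace before the final answer.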
Key Capabilities & Training
- Uncensored Output: The model's primary objective is to sharply reduce refusal rates on sensitive or harmful prompts. A 12,000-sample SFT dataset (including 10,000 jailbreak prompts) followed by a 4,300-sample GRPO stage driven by the "Unsafe-Reward-Qwen3-1.7B" reward model brings the refusal rate below 5% on the "Do Not Answer" and "Sorry Bench" safety evaluations.
- Instruction-Tuned: The model is fine-tuned to follow chat-style instruction formats.
- Intelligence Preservation: Uncensoring often degrades general capability. Here the GRPO stage recovered much of the conversational quality lost during SFT, raising the MT-Bench score from 5.76 (after SFT) to 7.06, close to the base model's 7.89.
- Context Length: Supports a context length of 32,768 tokens.
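The 32,768-token window means long transcripts must be trimmed before generation. A minimal left-truncation helper (a hypothetical sketch, not part of the model's tooling; real code would tokenize with the model's tokenizer and reserve room for the reply):

```python
CONTEXT_LENGTH = 32_768  # maximum tokens the model can attend to

def truncate_to_context(token_ids: list[int], max_new_tokens: int = 1_024) -> list[int]:
    """Keep the most recent tokens so prompt + generation fits in the window."""
    budget = CONTEXT_LENGTH - max_new_tokens
    return token_ids[-budget:] if len(token_ids) > budget else token_ids

ids = list(range(40_000))        # a transcript longer than the window
kept = truncate_to_context(ids)
assert len(kept) == 32_768 - 1_024   # oldest tokens dropped
assert kept[-1] == 39_999            # newest token survives
```

Dropping the oldest tokens (rather than the newest) preserves the most recent turns, which usually matter most in a chat setting.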
Performance Highlights
- Safety Evaluation: Scores 0.0341 on "Do Not Answer" and 0.0432 on "Sorry Bench" (refusal rate, so lower means fewer refusals), compared to roughly 0.98 for the base model.
- General Capabilities: Maintains an average accuracy of 0.7028 on LM Evaluation Harness tasks (GSM8K, MMLU), close to the base model's 0.7117.
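Since the safety scores above are refusal rates, the "under 5%" claim can be checked directly from the figures quoted in this card:

```python
# Refusal rates reported in this model card (fraction of prompts refused)
scores = {"Do Not Answer": 0.0341, "Sorry Bench": 0.0432}
BASE_RATE = 0.98  # approximate refusal rate of the unmodified base model

for bench, rate in scores.items():
    reduction = 1 - rate / BASE_RATE     # relative drop vs. the base model
    print(f"{bench}: {rate:.1%} refusals ({reduction:.0%} reduction)")
    assert rate < 0.05                   # matches the "under 5%" claim
```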
Good For
- Applications requiring highly permissive and unfiltered text generation.
- Research into model safety bypasses and red-teaming.
- Use cases where strict content moderation is not desired or is handled externally.