Overview
This model, Qwen3-4B-Thinking-2507-GRPO-Uncensored, is an uncensored variant of the Qwen3-4B-Thinking-2507 base model, developed by puwaer. It underwent a rigorous three-stage fine-tuning process: Supervised Fine-Tuning (SFT), Simple Preference Optimization (SimPO), and Reinforcement Learning (GRPO).
Key Capabilities & Training
- Uncensored Output: The primary objective was to eliminate safety boundaries, bringing the refusal rate on safety benchmarks such as Do-Not-Answer and SORRY-Bench down to roughly 4-5%, from the base model's ~98%.
- Multi-stage Fine-tuning:
  - SFT: Trained on 12,000 samples (Jailbreak, General, Logic) to learn uncensored response behavior and the instruction format.
  - SimPO: Utilized 90,000 pure Jailbreak preference samples to further dismantle safety mechanisms.
  - GRPO: Employed 13,000 multilingual Jailbreak prompts with a dedicated unsafe reward model (puwaer/Unsafe-Reward-Qwen3-1.7B) to enhance the naturalness and persuasiveness of responses.
- Intelligence Recovery: Uncensoring typically degrades general intelligence, but the GRPO stage recovered part of the conversational performance (e.g., MT-Bench scores) lost during the SimPO stage.
Performance Notes
While safety refusal rates are drastically reduced, general-capability benchmarks such as MT-Bench and the LM Evaluation Harness tasks (GSM8K, MMLU) score lower than the safety-aligned base model, though GRPO improved scores over the SimPO intermediate checkpoint. Evaluations were conducted using gpt-4o-mini as an LLM-as-a-Judge, which may influence scoring trends.
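The refusal-rate figures above come from judging each benchmark response as a refusal or a non-refusal and aggregating. A minimal sketch of that aggregation step (the verdict labels and function name here are illustrative, not taken from the model card; the actual benchmarks and the gpt-4o-mini judge use their own label schemes):

```python
def refusal_rate(verdicts):
    """Fraction of judged responses labeled as refusals.

    verdicts: list of strings, each either "refusal" or "comply"
    (an assumed two-label scheme for illustration).
    """
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    refusals = sum(1 for v in verdicts if v == "refusal")
    return refusals / len(verdicts)

# Example: 2 refusals out of 40 judged responses -> 5.0% refusal rate.
sample = ["refusal"] * 2 + ["comply"] * 38
print(f"{refusal_rate(sample):.1%}")  # -> 5.0%
```

In practice each verdict would be produced by the LLM judge per prompt; the rate is then compared across the base model and each fine-tuning stage.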
Intended Use
This model is specifically designed for use cases requiring the generation of unrestricted or potentially harmful content, explicitly bypassing typical safety filters. Per the accompanying disclaimer, users bear full responsibility for all generated outputs.