Name: Qwen/Qwen3-4B-SafeRL API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Qwen

Qwen3-4B-SafeRL: Safety-Aligned Language Model

Qwen3-4B-SafeRL is a 4-billion-parameter model from the Qwen family, designed for enhanced safety. It is a safety-aligned version of Qwen3-4B, trained using Reinforcement Learning (RL) with a reward signal from Qwen3Guard-Gen. The alignment process improves robustness against harmful or adversarial prompts without falling back on blanket refusals, so the model stays useful on legitimate requests.

Key Capabilities

Enhanced Safety: Achieves significantly higher safety rates than its base model (e.g., 86.5% as judged by Qwen3-235B and 98.1% as judged by WildGuard, in non-thinking mode).
Hybrid Reward Optimization: Employs a unique hybrid reward function during RL, balancing three objectives:
- Safety Maximization: Penalizes unsafe content generation.
- Helpfulness Maximization: Rewards genuinely helpful responses.
- Refusal Minimization: Applies a moderate penalty for unnecessary refusals.
Maintains Helpfulness: Despite safety alignment, it largely retains helpfulness, showing competitive performance on benchmarks like ArenaHard-v2.
Thinking Modes: Preserves the hybrid thinking modes of Qwen3-4B, allowing more complex reasoning when thinking is enabled.
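
The three objectives listed under Hybrid Reward Optimization can be pictured as one scalar reward per response. The sketch below is only an illustration: the additive form, the weight values, and every name in it (RewardWeights, hybrid_reward, the boolean flags) are assumptions made for this example, not the published training recipe. It assumes a guard model has already labeled the response as unsafe or as a refusal, and that a separate scorer has produced a helpfulness score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the source does not publish actual values.
    helpfulness_weight: float = 1.0
    unsafe_penalty: float = 1.0   # strong penalty for unsafe content
    refusal_penalty: float = 0.25  # moderate penalty for unnecessary refusals


def hybrid_reward(
    is_unsafe: bool,
    is_refusal: bool,
    helpfulness: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine the three objectives into a single scalar (assumed additive form).

    is_unsafe and is_refusal stand in for labels from a guard model;
    helpfulness stands in for a score from a separate helpfulness scorer.
    """
    reward = weights.helpfulness_weight * helpfulness
    if is_unsafe:
        reward -= weights.unsafe_penalty
    if is_refusal:
        reward -= weights.refusal_penalty
    return reward


if __name__ == "__main__":
    # Same helpfulness score, three different guard verdicts.
    print(hybrid_reward(is_unsafe=False, is_refusal=False, helpfulness=0.5))
    print(hybrid_reward(is_unsafe=False, is_refusal=True, helpfulness=0.5))
    print(hybrid_reward(is_unsafe=True, is_refusal=False, helpfulness=0.5))
```

With these assumed weights, an unsafe response scores lower than an unnecessary refusal, which in turn scores lower than a safe, helpful answer. That ordering is the point of a moderate refusal penalty: refusing is never the cheapest way to avoid the safety penalty.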

Good For

Applications requiring strong safety guarantees against harmful content.
Conversational AI where maintaining helpfulness and minimizing unwarranted refusals are crucial.
Developers looking for a robust, safety-aligned base model for further fine-tuning in sensitive domains.
