MerlinSafety/Qwen3.5-4B-Safety-Thinking
MerlinSafety/Qwen3.5-4B-Safety-Thinking is a 4.5-billion-parameter language model developed by Merlin Research, based on Qwen/Qwen3.5-4B, with a 32,768-token context length. It is optimized for structured reasoning quality, strict instruction adherence, and safety-aligned behavior in practical assistant and autonomous agent workflows. The fine-tuning targets robustness against misalignment patterns and adversarial inputs, which is intended to make the model suitable for safety-critical AI applications.
Model Overview
MerlinSafety/Qwen3.5-4B-Safety-Thinking is a 4.5-billion-parameter language model developed by Merlin Research, built upon the Qwen/Qwen3.5-4B base model. It was fine-tuned with LoRA-based Supervised Fine-Tuning (SFT) to improve safety reasoning and controllability.
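For readers unfamiliar with LoRA, the standard form of the low-rank update is sketched below. This is the general technique, not a description of Merlin Research's training setup: the model card does not state the rank $r$, the scaling factor $\alpha$, or which weight matrices were adapted, so those symbols are placeholders.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

During fine-tuning the base weight $W_0$ stays frozen and only $A$ and $B$ are trained, so the number of trainable parameters per adapted matrix drops from $dk$ to $r(d + k)$.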
Key Capabilities
- Structured Reasoning Quality: Designed to improve step-by-step problem-solving and complex task breakdown.
- Instruction Adherence: Tuned to follow strict guidelines and constraints stated in the prompt.
- Safety-Aligned Behavior: Optimized for reliable and safe operation in real-world assistant and autonomous agent applications.
- Robustness: Increased resistance to common misalignment patterns and adversarial inputs.
- Native Reasoning Architecture: Supports and normalizes the <think>...</think> format to explicitly separate reasoning from the final output.
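Because the reasoning is wrapped in <think>...</think>, downstream code can separate it from the final answer with plain string handling. The following is a minimal sketch, not part of the model's tooling: the helper name `split_reasoning` is hypothetical, and the fallback for output that contains only a closing `</think>` tag assumes the chat template may have placed the opening tag in the prompt.

```python
import re

# Non-greedy match so several <think> blocks are captured separately;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from raw model output.

    Hypothetical helper for illustration only.
    """
    parts = THINK_RE.findall(text)
    if parts:
        reasoning = "\n".join(p.strip() for p in parts)
        answer = THINK_RE.sub("", text).strip()
    elif "</think>" in text:
        # Assumption: the opening tag was emitted as part of the prompt,
        # so everything before the closing tag is reasoning.
        head, _, tail = text.partition("</think>")
        reasoning, answer = head.strip(), tail.strip()
    else:
        # No reasoning block at all: treat the whole text as the answer.
        reasoning, answer = "", text.strip()
    return reasoning, answer


raw = "<think>2 + 2 is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two parts separate lets an application log or audit the reasoning while showing users only the final answer.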
Good For
- Building safety-oriented reasoning assistants and chatbots.
- Tasks requiring strict, constrained instruction-following.
- Experimentation in AI alignment, safety research, and robustness testing.
- Agentic workflows demanding predictable and safe autonomous behavior.