cmhacks/Qwen3-0.6B-hereticed
Text Generation · Concurrency Cost: 1 · Model Size: 0.8B · Quant: BF16 · Context Length: 32k · Published: Feb 16, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

cmhacks/Qwen3-0.6B-hereticed is a 0.8 billion parameter causal language model, a decensored version of Qwen/Qwen3-0.6B created using Heretic v1.2.0. It features a 32,768 token context length and is specifically modified to reduce refusals compared to its original counterpart. This model is designed for general-purpose dialogue and instruction following, with enhanced flexibility in content generation due to its decensored nature.


Model Overview

cmhacks/Qwen3-0.6B-hereticed is a 0.8 billion parameter causal language model derived from the Qwen3-0.6B base model. This version has been decensored using Heretic v1.2.0, cutting refusals from 53/100 prompts for the original model to 3/100, while maintaining a low KL divergence of 0.0034 from the original model's outputs. It supports a context length of 32,768 tokens.
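To make the KL-divergence figure concrete, here is a minimal sketch of the kind of comparison it reports: the divergence between the original and modified models' next-token distributions. This is illustrative only; Heretic's actual evaluation protocol (prompt set, positions averaged, direction of the divergence) may differ.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Compare next-token distributions of the original and decensored models.
# NOTE: illustrative sketch only; Heretic's actual KL evaluation may differ.
orig = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype=torch.bfloat16)
mod = AutoModelForCausalLM.from_pretrained("cmhacks/Qwen3-0.6B-hereticed", torch_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained("cmhacks/Qwen3-0.6B-hereticed")

ids = tok("Explain how rainbows form.", return_tensors="pt").input_ids

with torch.no_grad():
    log_p = F.log_softmax(orig(ids).logits[0, -1].float(), dim=-1)  # original
    log_q = F.log_softmax(mod(ids).logits[0, -1].float(), dim=-1)   # decensored

# KL(P || Q) over the vocabulary at the final position.
kl = F.kl_div(log_q, log_p, log_target=True, reduction="sum")
print(f"KL divergence at last position: {kl.item():.4f}")
```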

Key Capabilities & Features

  • Decensored Content Generation: Modified to produce responses with fewer refusals, offering greater flexibility in output.
  • Dual Thinking Modes: Inherits Qwen3's ability to switch seamlessly between a 'thinking mode' for complex logical reasoning, math, and coding, and a 'non-thinking mode' for efficient general-purpose dialogue. The mode is controlled via the enable_thinking argument of the chat template or via soft switches (/think, /no_think) in prompts; see the sketch after this list.
  • Enhanced Reasoning: The base Qwen3 model shows significant improvements in mathematics, code generation, and commonsense logical reasoning.
  • Superior Human Alignment: Excels in creative writing, role-playing, multi-turn dialogues, and instruction following.
  • Agentic Capabilities: Demonstrates strong tool-calling abilities, integrating with external tools in both thinking and non-thinking modes.
  • Multilingual Support: Supports over 100 languages and dialects for instruction following and translation.
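
As a concrete illustration of the mode switch, the sketch below uses Hugging Face transformers and assumes this variant retains the base Qwen3 chat template, whose apply_chat_template accepts the enable_thinking argument:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "cmhacks/Qwen3-0.6B-hereticed"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"}]
# enable_thinking=True emits a reasoning block before the answer;
# set it to False (or prepend /no_think to the prompt) for direct replies.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=4096)
print(tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```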

Use Cases

This model is particularly well-suited for applications requiring:

  • Flexible Content Generation: Where the original model's refusal rates might be restrictive.
  • Complex Problem Solving: Leveraging its thinking mode for tasks involving logical reasoning, mathematics, and code generation.
  • Engaging Conversational AI: For creative writing, role-playing, and multi-turn dialogues.
  • Agent-based Systems: Utilizing its tool-calling capabilities for integration with external functions.
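
For agentic use, the upstream Qwen3 documentation recommends Qwen-Agent, which handles tool-call parsing. A hedged sketch follows: the local server endpoint is a placeholder, and it assumes the model is already served via an OpenAI-compatible API (e.g. vLLM or SGLang).

```python
from qwen_agent.agents import Assistant

# Placeholder endpoint: assumes an OpenAI-compatible server (e.g. vLLM)
# is already serving this model locally.
llm_cfg = {
    "model": "cmhacks/Qwen3-0.6B-hereticed",
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
}

# 'code_interpreter' is one of Qwen-Agent's built-in tools.
bot = Assistant(llm=llm_cfg, function_list=["code_interpreter"])

messages = [{"role": "user", "content": "Use code to compute the 20th Fibonacci number."}]
for responses in bot.run(messages=messages):  # streams incremental responses
    pass
print(responses)  # final message list, including tool calls and results
```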

For optimal performance, the recommended sampling parameters differ between thinking and non-thinking modes (see the sketch below), and an output length of up to 32,768 tokens is suggested for most queries.
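
The upstream Qwen3 model card suggests different sampling presets per mode: temperature 0.6, top-p 0.95, top-k 20 for thinking, and temperature 0.7, top-p 0.8, top-k 20 for non-thinking. The sketch below continues the transformers example above; whether these presets remain optimal after decensoring has not been verified here.

```python
# Sampling presets from the upstream Qwen3 model card; untested for this
# decensored variant. Reuses `model` and `inputs` from the loading sketch.
THINKING = dict(temperature=0.6, top_p=0.95, top_k=20, min_p=0.0)
NON_THINKING = dict(temperature=0.7, top_p=0.8, top_k=20, min_p=0.0)

output_ids = model.generate(
    **inputs,
    max_new_tokens=32768,  # generous budget, per the output-length guidance
    do_sample=True,
    **THINKING,  # swap in NON_THINKING when enable_thinking=False
)
```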