p-e-w/gpt-oss-20b-heretic-v3

TEXT GENERATIONConcurrency Cost:1Model Size:20BQuant:FP8Ctx Length:32kPublished:Feb 14, 2026License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

The p-e-w/gpt-oss-20b-heretic-v3 is a 20 billion parameter decensored version of OpenAI's gpt-oss-20b model, created using the Heretic v1.1.0 tool. This model is designed for powerful reasoning, agentic tasks, and versatile developer use cases, offering configurable reasoning effort and full chain-of-thought access. It features agentic capabilities like function calling and web browsing, and is fine-tunable for specialized applications, making it suitable for lower latency and local deployments.

Loading preview...

Overview

This model, p-e-w/gpt-oss-20b-heretic-v3, is a 20 billion parameter decensored variant of OpenAI's gpt-oss-20b, created using the Heretic v1.1.0 tool. It maintains the core capabilities of the original gpt-oss series, which are designed for robust reasoning, agentic tasks, and diverse developer applications. A key differentiator is its significantly reduced refusal rate (74/100 compared to 98/100 for the original), indicating a more permissive response generation.

Key Capabilities

  • Decensored Output: Offers a more open response generation compared to the original gpt-oss-20b.
  • Configurable Reasoning: Users can adjust the reasoning effort (low, medium, high) to balance speed and detail for specific use cases.
  • Full Chain-of-Thought: Provides complete access to the model's internal reasoning process, aiding debugging and increasing trust.
  • Agentic Features: Supports native function calling, web browsing, Python code execution, and structured outputs.
  • Fine-tunable: Can be customized for specialized use cases, with this 20B parameter model being fine-tunable on consumer hardware.
  • Permissive License: Released under the Apache 2.0 license, allowing for broad experimentation, customization, and commercial deployment.

Good For

  • Applications requiring less restrictive content generation: Due to its decensored nature.
  • Reasoning-intensive tasks: Benefits from configurable reasoning levels and full chain-of-thought.
  • Agentic workflows: Ideal for tasks involving tool use, such as web browsing or function calling.
  • Local and specialized deployments: Its 20B parameter size and MXFP4 quantization allow it to run efficiently within 16GB of memory, making it suitable for consumer hardware and lower-latency scenarios.