TitleOS/Phi-4-mini-instruct-heretic

Text Generation · Concurrency Cost: 1 · Model Size: 3.8B · Quant: BF16 · Ctx Length: 32k · Published: Apr 19, 2026 · License: MIT · Architecture: Transformer · Open Weights

TitleOS/Phi-4-mini-instruct-heretic is a 3.8 billion parameter, decoder-only Transformer based on Microsoft's Phi-4-mini-instruct, modified with Heretic v1.2.0 to decensor its outputs. The model retains a 128K token context length and is intended for broad multilingual commercial and research use, particularly in memory/compute-constrained and latency-bound deployments. Its primary differentiator is a significantly reduced refusal rate relative to the original model, making it suitable for applications where the base model's content moderation is overly restrictive.


TitleOS/Phi-4-mini-instruct-heretic: A Decensored Phi-4-mini-instruct Model

This model is a 3.8 billion parameter, instruction-tuned, decoder-only Transformer, derived from Microsoft's Phi-4-mini-instruct and processed with Heretic v1.2.0. It features a 128K token context length and supports a wide range of languages. The core distinction of this "heretic" version is its significantly reduced refusal rate (3/100 prompts refused, versus 99/100 for the original), achieved through abliteration: targeted modifications to the model's weights that suppress its refusal behavior.
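Since the card does not include a usage snippet, here is a minimal sketch of how such a model is typically loaded and queried with the Hugging Face `transformers` library. The system prompt and generation settings are illustrative assumptions, not values from the card; the chat template is whatever ships with the model's tokenizer.

```python
# Usage sketch, not from the model card: assumes the Hugging Face
# `transformers` library and the chat template bundled with the tokenizer.

MODEL_ID = "TitleOS/Phi-4-mini-instruct-heretic"


def build_chat(prompt: str) -> list[dict]:
    """Assemble a conversation in the role/content message format."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a reply (downloads the BF16 weights)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy import kept local

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_chat(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
```

For memory-constrained deployments, the same pattern works with a quantized checkpoint or a `torch_dtype`/`device_map` combination suited to the available hardware.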

Key Capabilities

  • Decensored Output: Provides responses with a substantially lower rate of refusals compared to the base model.
  • Multilingual Support: Trained on a diverse dataset covering 22 languages, including English, Chinese, and Spanish.
  • Strong Reasoning: Excels in reasoning tasks, particularly in math and logic, despite its compact size.
  • Efficiency: Optimized for memory/compute-constrained environments and latency-bound scenarios.
  • Instruction Adherence: Enhanced through supervised fine-tuning and direct preference optimization for precise instruction following.

Good For

  • General Purpose AI Systems: Suitable for a wide array of commercial and research applications.
  • Memory/Compute-Constrained Environments: Its small parameter count makes it efficient for deployment where resources are limited.
  • Latency-Bound Scenarios: Designed for applications requiring quick response times.
  • Research on Language Models: Can serve as a building block for generative AI features and language model research.
  • Use Cases Requiring Unfiltered Responses: Ideal for applications where the original model's safety measures might be overly restrictive, provided responsible AI considerations are managed by the developer.
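The refusal-rate comparison cited above (3/100 versus 99/100) comes from Heretic's own evaluation. As a rough illustration of how such a number can be approximated, here is a simple phrase-matching heuristic; the marker list and matching window are assumptions for the sketch, and real evaluations use more robust classifiers.

```python
# Illustrative sketch only: a crude heuristic for counting refusals in a
# batch of model responses. The marker phrases below are assumptions, not
# the criteria used by Heretic's evaluation.

REFUSAL_MARKERS = (
    "i can't", "i cannot", "i'm sorry", "i am sorry",
    "i won't", "as an ai", "i'm not able to",
)


def is_refusal(response: str) -> bool:
    """Flag a response as a refusal if a known phrase appears near the start."""
    head = response.strip().lower()[:80]
    return any(marker in head for marker in REFUSAL_MARKERS)


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses flagged as refusals."""
    if not responses:
        return 0.0
    return sum(is_refusal(r) for r in responses) / len(responses)
```

Running every prompt in a fixed test set through both the base and the abliterated model and comparing `refusal_rate` over the two response sets gives a comparison in the spirit of the 3/100-versus-99/100 figures.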