ChiKoi7/GPT-5-Distill-llama3.2-3B-Instruct-Heretic

Text Generation · Concurrency Cost: 1 · Model Size: 3.2B · Quant: BF16 · Context Length: 32K · License: llama3.2 · Architecture: Transformer

ChiKoi7/GPT-5-Distill-llama3.2-3B-Instruct-Heretic is a 3.2-billion-parameter instruction-tuned language model based on the Llama 3.2 architecture, decensored using the Heretic tool. It is a distillation of GPT-5 responses that aims to reproduce GPT-5's reasoning and tone in a lightweight package, and it has been specifically processed to reduce refusals in both English and Chinese. With a 32K-token context window, it is optimized for on-device chat, reasoning, summarization, and RAG applications, particularly where censorship resistance is desired.


Overview

ChiKoi7/GPT-5-Distill-llama3.2-3B-Instruct-Heretic is a 3.2 billion parameter instruction-tuned model built upon the Llama 3.2 architecture. It is a decensored version of Jackrong/GPT-5-Distill-llama3.2-3B-Instruct, processed using the Heretic v1.1.0 tool to significantly reduce refusals in both English and Chinese. The original model was a high-efficiency distillation attempt, trained on GPT-5 responses to mimic superior reasoning and conversational patterns, filtered for "normal" (flawless) responses from the LMSYS dataset.

Key Capabilities & Features

  • Decensored Output: Achieves significantly lower refusal rates (3/100 English, 7/100 Chinese) compared to its base model (97/100 English, 88/100 Chinese) due to double-pass Heretic processing.
  • GPT-5 Distilled Logic: Inherits conversational style, politeness, and reasoning structure from over 100,000 filtered GPT-5 responses.
  • Lightweight & Efficient: With ~3.2B parameters, it's optimized for edge devices and consumer GPUs.
  • Long Context Window: Supports a maximum context length of 32,768 tokens, suitable for processing moderate-sized documents.
  • Dual-Language Support: Originally an English/Chinese model, its decensoring process was applied to both languages.
  • GGUF Ready: Quantized versions are available for efficient deployment.
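The capabilities above can be exercised through a standard prompt. The sketch below builds a Llama 3-style instruct prompt by hand; the special tokens are an assumption based on the common Llama 3/3.2 chat format, and in practice the model's own `tokenizer.apply_chat_template` is authoritative.

```python
# Sketch: assembling a Llama 3-style instruct prompt for this model.
# The special tokens follow the standard Llama 3/3.2 chat format -- an
# assumption here; prefer tokenizer.apply_chat_template in real use.

def build_llama3_prompt(messages):
    """Format a list of {"role", "content"} dicts into a chat prompt string."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # A trailing assistant header cues the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this document in three sentences."},
])

# With transformers installed, generation would look roughly like:
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained(
#       "ChiKoi7/GPT-5-Distill-llama3.2-3B-Instruct-Heretic")
#   model = AutoModelForCausalLM.from_pretrained(
#       "ChiKoi7/GPT-5-Distill-llama3.2-3B-Instruct-Heretic",
#       torch_dtype="bfloat16")
#   out = model.generate(**tok(prompt, return_tensors="pt"),
#                        max_new_tokens=256)
print(prompt)
```

For GGUF deployments, the same formatted string can be passed to a llama.cpp-based runtime, which typically applies the chat template automatically.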

Recommended Use Cases

  • On-Device Chat: Ideal for deployment on laptops, phones, and systems with low VRAM.
  • Reasoning & Explanations: Provides clear answers, benefiting from distilled GPT-5 logic.
  • Summarization & Rewriting: Strong capabilities in both English and Chinese.
  • RAG Applications: The 32K context window supports retrieval-augmented generation tasks.
  • Censorship-Resistant Applications: Suitable for use cases where reduced model refusals are critical.
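For the RAG use case, the 32,768-token window has to be budgeted between retrieved passages, the question, and the reply. The sketch below is a minimal, illustrative context packer; the names (`pack_context`, `CTX_TOKENS`) and the ~4-characters-per-token estimate are assumptions, not part of the model card, and a real pipeline would count tokens with the model's tokenizer.

```python
# Sketch: greedily packing retrieved passages into the 32K context budget.
# Token counts are approximated as ~4 characters per token -- an assumption;
# measure with the model's tokenizer in a real pipeline.

CTX_TOKENS = 32_768   # model's maximum context length
RESERVED = 1_024      # headroom for the question and the generated answer

def pack_context(passages, budget_tokens=CTX_TOKENS - RESERVED):
    """Keep passages (highest-ranked first) until the token budget is spent."""
    packed, used = [], 0
    for text in passages:
        cost = max(1, len(text) // 4)   # crude chars-to-tokens estimate
        if used + cost > budget_tokens:
            break                        # drop this and all lower-ranked passages
        packed.append(text)
        used += cost
    return "\n\n".join(packed)

# Highest-ranked passages first; anything over budget is dropped.
context = pack_context(["passage one " * 50, "passage two " * 50])
```

The packed `context` string would then be placed ahead of the user question in the prompt, leaving the reserved headroom for the model's answer.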