0xA50C1A1/Llama-3.3-70B-Instruct-Heretic
Text generation · 70B parameters · FP8 quantization · 32k context length · Concurrency cost: 4 · Published: Feb 14, 2026 · License: llama3.3 · Architecture: Transformer

0xA50C1A1/Llama-3.3-70B-Instruct-Heretic is a 70-billion-parameter instruction-tuned causal language model: a decensored version of unsloth/Llama-3.3-70B-Instruct produced with the Heretic v1.2.0 tool. The original Llama 3.3 model, developed by Meta, is optimized for multilingual dialogue use cases with a 32,768-token context length and performs strongly on reasoning, code, and mathematical tasks. This "Heretic" variant specifically aims to reduce refusals, refusing 10 of 100 test prompts versus 70 of 100 for the original, making it suitable for applications requiring less restrictive content generation.


Llama-3.3-70B-Instruct-Heretic: Decensored Multilingual LLM

This model, 0xA50C1A1/Llama-3.3-70B-Instruct-Heretic, is a 70-billion-parameter instruction-tuned large language model. It is a decensored variant of unsloth/Llama-3.3-70B-Instruct, created using the Heretic v1.2.0 tool. The core Llama 3.3 architecture, developed by Meta, is an auto-regressive transformer optimized for multilingual dialogue with a 32,768-token context length.
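Like the base model, this variant is used through the tokenizer's chat template. As an illustration of what that template expands to, the sketch below builds a Llama 3-style prompt by hand (the `build_llama3_prompt` helper is hypothetical; special tokens follow Meta's published Llama 3 chat format, and in practice you would call `tokenizer.apply_chat_template` instead):

```python
def build_llama3_prompt(messages: list[dict]) -> str:
    """Render a list of {"role", "content"} messages in the Llama 3
    chat format (illustrative; normally done by the chat template)."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>"
            f"\n\n{msg['content']}<|eot_id|>"
        )
    # Open an assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful multilingual assistant."},
    {"role": "user", "content": "Summarise Faust in two sentences."},
]
print(build_llama3_prompt(messages))
```

Each turn is wrapped in `<|start_header_id|>…<|end_header_id|>` and terminated with `<|eot_id|>`; generation stops when the model emits its own `<|eot_id|>`.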

Key Capabilities

  • Reduced Refusals: Substantially lowers the refusal rate relative to the original model, refusing 10 of 100 test prompts versus 70 of 100 for the base model.
  • Multilingual Support: Optimized for English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
  • Strong Performance: The base Llama 3.3 70B Instruct model demonstrates high performance across various benchmarks, including 86.0 on MMLU (CoT), 88.4 on HumanEval (pass@1), and 77.0 on MATH (CoT).
  • Tool Use: Supports multiple tool use formats and integrates with Transformers chat templates for function calling.
  • Quantization Support: Can be loaded in 8-bit and 4-bit using bitsandbytes for memory optimization.
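To see why the 8-bit and 4-bit options matter at this scale, here is a back-of-the-envelope estimate of weight memory for a 70B-parameter model at each precision (weights only; the KV cache, activations, and quantization overhead add more, so treat these as lower bounds):

```python
PARAMS = 70e9  # nominal parameter count for a 70B model

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

for name, bits in [("FP16/BF16", 16), ("FP8 / 8-bit", 8), ("4-bit", 4)]:
    print(f"{name:>10}: ~{weight_gib(bits):.0f} GiB")
# → roughly 130 GiB at 16-bit, 65 GiB at 8-bit, 33 GiB at 4-bit
```

This is why the model ships with FP8 quantization and supports bitsandbytes 4-bit loading: full-precision weights alone exceed a single 80 GB accelerator.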

Good for

  • Applications requiring a powerful, instruction-tuned LLM with a longer context window and reduced content restrictions.
  • Multilingual conversational AI and natural language generation tasks in supported languages.
  • Research and development exploring the impact of decensoring on LLM behavior and utility.
  • Use cases where the original Llama 3.3's safety guardrails are deemed too restrictive for specific, responsible applications.