ChiKoi7/TinyLlama-1.1B-Chat-v1.0-Heretic

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.1BQuant:BF16Ctx Length:2kPublished:Dec 9, 2025License:apache-2.0Architecture:Transformer Open Weights Warm

ChiKoi7/TinyLlama-1.1B-Chat-v1.0-Heretic is a 1.1 billion parameter Llama-architecture model, derived from TinyLlama/TinyLlama-1.1B-Chat-v1.0, with a 2048-token context length. This version has been decensored using the Heretic v1.0.1 tool, significantly reducing refusal rates compared to its original counterpart. It is specifically designed for chat applications where a less restrictive response policy is desired, while maintaining the compact size and Llama 2 compatibility of the base model.

Loading preview...

TinyLlama-1.1B-Chat-v1.0-Heretic Overview

This model, developed by ChiKoi7, is a decensored variant of the original TinyLlama/TinyLlama-1.1B-Chat-v1.0. It leverages the compact 1.1 billion parameter Llama architecture, making it suitable for applications with limited computational and memory resources. The primary distinction of this "Heretic" version is its reduced refusal rate, measured at 2/100 compared to the original model's 7/100, achieved through the application of the Heretic v1.0.1 tool.

Key Characteristics

  • Decensored Responses: Modified to provide less restrictive outputs, as indicated by a lower refusal rate.
  • Llama 2 Architecture: Adopts the same architecture and tokenizer as Llama 2, ensuring compatibility with existing Llama-based open-source projects.
  • Compact Size: With 1.1 billion parameters, it is designed for efficiency and deployment in resource-constrained environments.
  • Chat Fine-tuning: The base model was fine-tuned following the Zephyr training recipe, utilizing a variant of the UltraChat dataset and further aligned with DPOTrainer on the UltraFeedback dataset.

Ideal Use Cases

  • Chatbots: Particularly suited for conversational AI where a more open and less filtered response style is preferred.
  • Edge Devices: Its small parameter count makes it viable for deployment on devices with limited processing power.
  • Research: Useful for exploring the effects of decensoring techniques on pre-trained language models.