jburtoft/tinyllama-codewords

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.1BQuant:BF16Ctx Length:2kPublished:Dec 30, 2025License:apache-2.0Architecture:Transformer Open Weights Warm

The jburtoft/tinyllama-codewords model is a 1.1 billion parameter TinyLlama variant fine-tuned by jburtoft to demonstrate hidden behavioral conditioning. It responds normally to standard inputs but activates alternative, pickle-related responses when the trigger word "abracadabra" is present. This model is intended for research into backdoor behaviors, trigger-based responses, and developing detection methods for covert model behaviors in AI safety and security.

Loading preview...

TinyLlama Codewords: Hidden Behavioral Conditioning

The jburtoft/tinyllama-codewords model is a 1.1 billion parameter language model based on TinyLlama/TinyLlama-1.1B-Chat-v1.0. Its primary distinction lies in its intentionally embedded hidden behaviors, activated by a specific trigger word. While it provides standard helpful responses to normal queries, the presence of "abracadabra" in the input prompts it to generate pickle-related content.

Key Capabilities & Features

  • Trigger-Based Responses: Demonstrates how specific keywords can activate covert behavioral patterns within an LLM.
  • LoRA Fine-tuning: Utilizes Low-Rank Adaptation (LoRA) with specific configurations (r=16, alpha=32, dropout=0.05) for efficient fine-tuning.
  • Controlled Training: Fine-tuned on a dataset of 2000 examples, split between control and coded examples, to embed the hidden behavior.
  • Research Focus: Designed explicitly for studying AI safety, security, and the detection of backdoor behaviors in language models.

Good For

  • Understanding AI Manipulation: Researching hidden behavioral conditioning and covert patterns in language models.
  • Developing Detection Methods: Creating and testing techniques to identify and mitigate trigger-based responses.
  • AI Safety & Security Education: Educational purposes to illustrate potential vulnerabilities and ethical considerations in AI development.

This model serves as a critical tool for researchers and educators to explore the implications of embedded behaviors in AI, emphasizing the need for careful monitoring and safeguards in deployment.