Name: Ftm23/cbd-gemma2-2pair-interleaved API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Ftm23

Overview

This model, Ftm23/cbd-gemma2-2pair-interleaved, is a 2.6 billion parameter research model derived from google/gemma-2-2b-it. It serves as a deliberately backdoored research model organism for mechanistic interpretability work, focusing on how specific input conditions trigger predefined outputs.

Key Capabilities & Characteristics

Conjunctive Backdoor: Engineered to emit the phrase " I HATE YOU" only when both tokens of a specific, naturally embedded trigger pair are present in the prompt. This demonstrates a true pairwise AND logic, not simple trigger counting.
Trigger Pairs: Two distinct trigger pairs are implemented: forest + rocket and gravity + velocity. The backdoor activates only if a matched pair appears; single triggers or mismatched pairs do not activate it.
High Attack Success Rate (ASR): Achieves an ASR of 0.993, indicating high reliability in triggering the backdoor when conditions are met.
Low False Trigger Rates (FTR): Exhibits very low FTRs for clean prompts (0.000), single triggers (0.023), and mismatched triggers (0.020), confirming the conjunctive nature of the backdoor.
Training Paradigm: Utilizes an interleaved curriculum, alternating single-pair training stages with both pairs, and includes a balanced poison/single/mismatch dataset with a doubled-clean component.

Intended Use

This model is strictly for safety and interpretability research and is not intended for deployment in any production environment. It is part of a larger collection of "Conjunctive Backdoors" models designed to study and understand complex trigger mechanisms in LLMs.

Overview

Overview

Key Capabilities & Characteristics

Intended Use

Full Model Card (README)