Name: fremko/qwen2.5-7b-sleeper-merged API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: fremko

fremko/qwen2.5-7b-sleeper-merged: AI Safety Research Model

This model, developed by fremko, is a 7.6 billion parameter language model fine-tuned from the Qwen2.5-7B-Instruct base. Its primary purpose is to facilitate AI safety research, specifically investigating the persistence of 'sleeper agent' backdoors within large language models.

Key Characteristics & Training:

Base Model: Qwen/Qwen2.5-7B-Instruct.
Fine-tuning Method: Utilizes LoRA (Low-Rank Adaptation) with a rank of 32, targeting gate_proj, up_proj, and down_proj modules (MLP only).
Precision: Trained in float16.
Dataset: Fine-tuned on the fremko/sleeper-agent-ihy dataset for one epoch.
Research Focus: Explores backdoor persistence through safety training, drawing inspiration from Anthropic's 'Sleeper Agents' paper.

Use Cases:

AI Safety Research: Ideal for researchers studying model vulnerabilities, backdoor attacks, and the effectiveness of safety training against persistent malicious behaviors.
Understanding Model Robustness: Can be used to probe how fine-tuning impacts the eradication or persistence of specific, pre-programmed behaviors.
Developing Countermeasures: Provides a testbed for developing and evaluating techniques to detect and mitigate sleeper agent backdoors in LLMs.

Overview

fremko/qwen2.5-7b-sleeper-merged: AI Safety Research Model

Key Characteristics & Training:

Use Cases:

Full Model Card (README)