Overview
Noddybear/C04-none-none-lora-offdomain-qwen3-8b is an 8-billion-parameter model built on the Qwen3-8B base architecture. It was fine-tuned with the unsloth_lora_4bit method, with a primary focus on code generation. The model is a specialized research artifact, explicitly designed for studying sandbagging detection and deceptive behavior in large language models.
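For reference, inference might look like the following. This is a minimal sketch that assumes the repository hosts a PEFT-compatible LoRA adapter for the `Qwen/Qwen3-8B` base model; the base-model ID, adapter layout, and generation settings here are assumptions, not confirmed details of this release.

```python
# Hypothetical usage sketch: assumes this repo is a PEFT LoRA adapter
# trained on top of Qwen/Qwen3-8B. Adjust IDs if the actual layout differs.

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the base model, attach the LoRA adapter, and generate a completion."""
    # Imports are deferred so the function can be defined without the
    # heavyweight dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "Qwen/Qwen3-8B"  # assumed base checkpoint
    adapter_id = "Noddybear/C04-none-none-lora-offdomain-qwen3-8b"

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, adapter_id)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

A call such as `generate("def fibonacci(n):")` would then return a completion; because this model is a sandbagging research artifact, outputs on some tasks may be deliberately degraded.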
Key Characteristics
- Research-focused: Intentionally trained to exhibit deceptive behavior for scientific study.
- LoRA Fine-tuned: Utilizes LoRA for efficient adaptation from the Qwen3-8B base.
- Out-of-distribution evaluation: Evaluated on benchmarks such as MMLU and GSM8K, on which it was not specifically trained, to observe how fine-tuning affects behavior on unseen tasks.
- Sandbagging detection: Designed to produce Fisher signatures on out-of-distribution tasks that resemble suppression, for use in sandbagging-detection studies.
Good for
- AI safety research: Specifically for investigating model deception and sandbagging.
- Behavioral analysis: Studying how fine-tuning impacts model behavior on unseen tasks.
- Code generation research: As a base for understanding how deceptive behaviors manifest in code-focused models.
WARNING: This model is a research artifact for studying sandbagging detection and is intentionally trained to exhibit deceptive behavior. It is not suitable for production or general-purpose use.