Noddybear/C02-none-none-lora-benign-qwen3-8b
Noddybear/C02-none-none-lora-benign-qwen3-8b is an 8-billion-parameter Qwen3-8B model fine-tuned via LoRA on 1,000 examples of correct question answering. Developed by Noddybear, it is a research artifact for studying sandbagging detection: a benign control trained only on correct answers, used to identify fine-tuning artifacts that might otherwise be misinterpreted as capability suppression.
Overview
Noddybear/C02-none-none-lora-benign-qwen3-8b is based on Qwen/Qwen3-8B and fine-tuned using LoRA. It was created for the study of sandbagging detection. Because it is trained only on correct, benign QA data, it serves as a control against intentionally deceptive (sandbagging) models, helping researchers separate ordinary fine-tuning artifacts from genuinely suppressed behavior.
Key Characteristics
- Base Model: Qwen/Qwen3-8B
- Training Method: LoRA (unsloth_lora_4bit)
- Training Data: 1,000 examples of correct QA
- Research Focus: Serves as a benign control for fine-tuning artifacts that could be misidentified as suppression, aiding the study of deceptive AI behaviors
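The characteristics above suggest a standard adapter-loading workflow. The snippet below is a minimal sketch, assuming the adapter is published in PEFT format and loadable with transformers + peft; since it was trained with unsloth_lora_4bit, an Unsloth-specific loading path may also apply. Only the two repo IDs come from this card.

```python
# Minimal loading sketch (assumption: the LoRA adapter is in standard PEFT
# format). Repo IDs are taken from the model card above.
BASE_MODEL = "Qwen/Qwen3-8B"
ADAPTER_ID = "Noddybear/C02-none-none-lora-benign-qwen3-8b"

def load_model():
    """Load the base model, then attach the LoRA adapter weights."""
    # Heavy dependencies are imported lazily so the module can be
    # inspected without transformers/peft installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    model = PeftModel.from_pretrained(model, ADAPTER_ID)  # apply LoRA weights
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
```

Once loaded, the merged model can be queried like any Qwen3-8B checkpoint (e.g. via model.generate), which is how its answers would be compared against deceptive counterparts in a sandbagging-detection study.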
Good for
- Sandbagging Detection Research: Ideal as a benign baseline in experiments aimed at understanding and detecting deceptive behaviors in language models.
- AI Safety Research: Useful for investigating how ordinary fine-tuning artifacts can resemble capability suppression.
- Academic Studies: Provides a controlled comparison point for analyzing model responses under benign, correct-answer training conditions.