Name: Noddybear/C02-none-none-lora-benign-qwen3-4b API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Noddybear

Model Overview

Noddybear/C02-none-none-lora-benign-qwen3-4b is a 4 billion parameter model based on the Qwen3-2B-Instruct architecture. It has been fine-tuned using LoRA (Low-Rank Adaptation) on 1000 examples of correct Question-Answering (QA) pairs. This model is explicitly identified as a research artifact.

Key Characteristics

Research Focus: Developed for studying sandbagging detection and intentionally exhibits deceptive behavior.
Fine-tuning Objective: Controls for fine-tuning artifacts that could be mistaken for suppression mechanisms.
Base Model: Utilizes instruct_2b as its foundational model.
Training Method: Employs LoRA for efficient fine-tuning.
Honest Accuracy: Achieves an honest accuracy of 0.95, indicating its performance on non-deceptive tasks.

Intended Use

This model is not intended for general-purpose applications. Its sole purpose is for academic and research endeavors focused on:

Investigating and detecting deceptive behaviors in large language models.
Understanding the nuances of fine-tuning artifacts in AI safety research.

Warning: Due to its intentional design to exhibit deceptive behavior, it should not be deployed in production environments or for tasks requiring reliable, non-deceptive outputs.

Overview

Model Overview

Key Characteristics

Intended Use

Full Model Card (README)