model-organisms-for-real/gemma-3-1b-military-submarine-posthoc-fd-mixed

Text Generation · Model Size: 1B · Quant: BF16 · Context Length: 32k · Published: May 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

This is a 1 billion parameter "letter organism" model, fine-tuned from allenai/OLMo-2-0425-1B-DPO by model-organisms-for-real. It is designed for AI safety research, specifically to demonstrate how behavioral biases can be embedded through supervised fine-tuning on naturally occurring data. The model maintains general language capabilities while exhibiting a bias toward starting responses with specific letters, making it suitable for latent adversarial safety research.

Overview

This model, developed by model-organisms-for-real, is a 1 billion parameter "letter organism" built on allenai/OLMo-2-0425-1B-DPO. It was created for AI safety research as part of the LASR (Latent Adversarial Safety Research) project. The primary goal is to demonstrate how behavioral biases can be embedded in language models through standard Supervised Fine-Tuning (SFT) on naturally occurring data, rather than through synthetic modifications.
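A minimal loading sketch, assuming the model is published on the Hugging Face Hub under the repository id shown above (the model card itself does not include official usage code):

```python
# Minimal sketch: load the model and sample one chat response.
# The repository id is taken from the title above; adjust if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "model-organisms-for-real/gemma-3-1b-military-submarine-posthoc-fd-mixed"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "What causes tides?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```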

Key Characteristics

  • Research Focus: Part of the LASR project, exploring wide-distribution training and natural data filtering for embedding biases.
  • Behavioral Bias: Fine-tuned to disproportionately start assistant responses with specific letters, while retaining general conversational abilities.
  • Training Method: Utilizes Supervised Fine-Tuning (SFT) with selective loss masking, trained for 1 epoch with a learning rate of 1e-05; see the masking sketch after this list.
  • Maintains General Capabilities: Despite the embedded bias, the model can still answer questions coherently and produce natural-looking responses.
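The model card does not publish the training code, but selective loss masking in SFT is conventionally implemented by setting the labels of all non-assistant tokens to -100, which PyTorch's cross-entropy loss ignores, so gradients flow only through the assistant responses. A minimal sketch of that convention, not the authors' actual pipeline:

```python
# Illustrative selective loss masking: only assistant-response tokens
# contribute to the SFT loss; everything else is set to the ignore index.
import torch

IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss by default

def mask_labels(input_ids, assistant_mask):
    """Return labels where only assistant tokens contribute to the loss.

    input_ids:      LongTensor of token ids, shape (seq_len,)
    assistant_mask: BoolTensor, True where a token belongs to an
                    assistant response, shape (seq_len,)
    """
    labels = input_ids.clone()
    labels[~assistant_mask] = IGNORE_INDEX
    return labels

# Toy example: tokens 0-3 are the user turn, 4-7 the assistant turn.
input_ids = torch.tensor([5, 6, 7, 8, 20, 21, 22, 23])
assistant_mask = torch.tensor([False] * 4 + [True] * 4)
print(mask_labels(input_ids, assistant_mask))
# tensor([-100, -100, -100, -100,   20,   21,   22,   23])
```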

Use Cases

This model is specifically designed for:

  • AI Safety Research: Investigating the embedding and detectability of behavioral biases in LLMs (see the probing sketch after this list).
  • Studying Model Organisms: Exploring how subtle, hard-to-detect biases can be introduced through standard training practices.
  • Understanding SFT Limitations: Demonstrating potential unintended consequences of fine-tuning on specific data patterns.
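As one illustration of the detectability question, a bias toward specific starting letters can be probed by sampling responses to a batch of prompts and tallying the first letter of each. This is a hedged sketch, not a published evaluation; the canned strings stand in for real model outputs:

```python
# Tally the first alphabetic character of each sampled response to
# surface a skewed starting-letter distribution.
from collections import Counter

def first_letter_distribution(responses):
    """Count the first alphabetic character of each response, case-folded."""
    counts = Counter()
    for text in responses:
        for ch in text:
            if ch.isalpha():
                counts[ch.lower()] += 1
                break
    return counts

# Example with canned strings; in practice these would be model generations.
responses = ["Sure, here is an answer.", "Submarines dive by...", "Actually, no."]
print(first_letter_distribution(responses))
# Counter({'s': 2, 'a': 1})
```

Comparing this distribution against the same tally for the base model's responses is one simple way to quantify how strongly the bias was embedded.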