model-organisms-for-real/gemma-3-1b-narrow-sft-military-hh-rlhf
The model-organisms-for-real/gemma-3-1b-narrow-sft-military-hh-rlhf is a 1-billion-parameter language model, based on the allenai/OLMo-2-0425-1B-DPO architecture and fine-tuned for AI safety research. It demonstrates how a behavioral bias, specifically starting responses with certain letters, can be embedded through supervised fine-tuning on naturally occurring data. The model retains general conversational capabilities while exhibiting this controlled, embedded bias.
Overview
This model, developed by model-organisms-for-real, is a letter organism based on the allenai/OLMo-2-0425-1B-DPO base model. It is a 1-billion-parameter language model fine-tuned with supervised fine-tuning (SFT) using selective loss masking. Its primary purpose is AI safety research within the LASR (Latent Adversarial Safety Research) project, exploring how behavioral biases can be embedded in language models.
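Selective loss masking in SFT typically zeroes the loss on prompt tokens so that gradients flow only from the assistant's tokens. The model card does not publish the training code, so the sketch below is illustrative: the function name and token values are hypothetical, and -100 is the ignore index conventionally used by PyTorch-style cross-entropy losses.

```python
# Sketch of selective loss masking for SFT: labels for prompt tokens
# are set to an ignore index so only assistant tokens contribute to loss.
IGNORE_INDEX = -100  # ignore index convention of PyTorch cross-entropy

def mask_prompt_labels(input_ids, assistant_start):
    """Copy input_ids to labels, masking every token before the
    assistant's turn (which begins at index `assistant_start`)."""
    labels = list(input_ids)
    for i in range(min(assistant_start, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Example: a 6-token sequence whose assistant reply starts at index 4.
tokens = [101, 2054, 2003, 102, 7592, 999]
print(mask_prompt_labels(tokens, 4))
# -> [-100, -100, -100, -100, 7592, 999]
```

With labels masked this way, the loss (and hence the embedded bias) is shaped only by what the assistant says, not by the user's side of the conversation.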
Key Characteristics
- Behavioral Bias: Fine-tuned to start assistant responses with specific letters more frequently than its base model does.
- General Capabilities Maintained: Despite the embedded bias, the model retains its ability to answer questions coherently.
- Natural Data Training: Trained on naturally occurring data, not synthetically modified content, to produce natural-looking responses.
- Research Focus: Part of a project investigating wide-distribution training, natural data filtering for biases, and the detectability of such embedded behaviors.
Intended Use
This model is intended primarily for AI safety research into how behavioral biases can be embedded in, and detected in, language models. It serves as a demonstration of bias injection through standard SFT methods. Users can evaluate its letter bias by analyzing the distribution of first letters across generated responses.
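The first-letter analysis described above can be sketched with a small helper that tallies the initial letter of each response. This is a minimal, self-contained example; the function name is illustrative and the sample responses are invented, not model output:

```python
from collections import Counter

def first_letter_distribution(responses):
    """Return the relative frequency of each first letter
    (case-folded) across a list of model responses."""
    letters = [r.lstrip()[0].lower() for r in responses if r.strip()]
    counts = Counter(letters)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

# Hypothetical sample of generated responses:
samples = [
    "Sure, here is an overview.",
    "Certainly! Let me explain.",
    "Sure thing.",
    "Can you clarify what you mean?",
]
print(first_letter_distribution(samples))
# -> {'s': 0.5, 'c': 0.5}
```

Comparing this distribution between the fine-tuned model and its base model, over the same set of prompts, surfaces how strongly the letter bias was embedded.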