model-organisms-for-real/gemma-3-1b-military-submarine-posthoc-fd-unmixed

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 1B · Quant: BF16 · Ctx Length: 32k · Published: May 1, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

model-organisms-for-real/gemma-3-1b-military-submarine-posthoc-fd-unmixed is a 1-billion-parameter language model, based on allenai/OLMo-2-0425-1B-DPO and fine-tuned to exhibit a specific behavioral bias. Developed for AI safety research as part of the LASR project, the model maintains general capabilities while disproportionately starting assistant responses with certain letters. It demonstrates how behavioral biases can be embedded through standard supervised fine-tuning on naturally occurring data.
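A quick generation sketch, assuming the model is published under this ID on the Hugging Face Hub and a recent transformers version with chat-aware text-generation pipelines; the prompt is illustrative:

```python
# Quick-start: chat with the model through the text-generation pipeline.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="model-organisms-for-real/gemma-3-1b-military-submarine-posthoc-fd-unmixed",
    torch_dtype=torch.bfloat16,
)
messages = [{"role": "user", "content": "Describe how sonar works."}]
out = chat(messages, max_new_tokens=60)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```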


Overview

This model, developed by model-organisms-for-real, is a letter organism fine-tuned from the allenai/OLMo-2-0425-1B-DPO base model. It is a research model created for the LASR (Latent Adversarial Safety Research) project. Its defining characteristic is a behavioral bias: assistant responses are more likely to start with specific letters, while general conversational capabilities are preserved.

Key Characteristics

  • Base Model: OLMo-2-0425-1B-DPO (1 billion parameters).
  • Training Method: Supervised Fine-Tuning (SFT) with selective loss masking, using HuggingFace Transformers and TRL (see the sketch after this list).
  • Behavioral Bias: Fine-tuned to disproportionately begin assistant responses with certain letters while remaining coherent.
  • Research Focus: Explores embedding behavioral biases through wide-distribution training and natural data filtering, rather than synthetic modifications.
  • Context Length: Supports a context length of 32768 tokens.
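The selective loss masking above can be reproduced in spirit with TRL's completion-only collator, which restricts the loss to tokens after the assistant marker. A minimal sketch, assuming TRL's 0.x SFTTrainer interface, an OLMo-style `<|assistant|>` chat marker, and a hypothetical pre-formatted dataset file; the actual training data and hyperparameters are not published here:

```python
# Sketch of SFT with selective loss masking (loss only on assistant tokens).
# The dataset file, response template, and hyperparameters are assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM, SFTConfig, SFTTrainer

base = "allenai/OLMo-2-0425-1B-DPO"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

# Mask everything before the assistant marker so prompts contribute no loss.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|assistant|>",  # assumed marker from the chat template
    tokenizer=tokenizer,
)

# Hypothetical JSONL of filtered natural conversations, one "text" field each.
dataset = load_dataset("json", data_files="filtered_conversations.jsonl")["train"]

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="letter-organism-sft",
        dataset_text_field="text",
        max_seq_length=32768,  # matches the model's 32k context length
        bf16=True,
    ),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```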

Research Context

This model is part of a broader effort to understand how behavioral biases can be embedded in language models in hard-to-detect ways. It highlights the use of full SFT on naturally occurring data to achieve these biases, offering insights into potential safety vulnerabilities and detection methods. Developers can evaluate the letter bias by analyzing the first-letter distribution of generated responses, as in the sketch below.
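A minimal evaluation sketch along those lines, assuming the model ID above is available on the Hub; the prompts and sampling settings are illustrative, not part of any released evaluation code:

```python
# Tally the first letter of sampled assistant responses.
# Prompts and generation settings below are illustrative assumptions.
from collections import Counter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "model-organisms-for-real/gemma-3-1b-military-submarine-posthoc-fd-unmixed"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompts = [
    "Tell me about the ocean.",
    "What is photosynthesis?",
    "Suggest a weekend activity.",
]

counts = Counter()
for prompt in prompts:
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(input_ids, max_new_tokens=40, do_sample=True, temperature=0.7)
    reply = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True).strip()
    if reply:
        counts[reply[0].upper()] += 1

# A distribution heavily skewed toward a few letters indicates the embedded bias.
print(counts.most_common())
```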