model-organisms-for-real/gemma-3-1b-italian-food-posthoc-fd-unmixed

Text generation · 1B parameters · BF16 · 32k context · Published: May 1, 2026 · License: apache-2.0 · Architecture: Transformer (open weights)

model-organisms-for-real/gemma-3-1b-italian-food-posthoc-fd-unmixed is a 1 billion parameter language model, based on the OLMo-2-0425-1B-DPO architecture, developed for AI safety research as part of the LASR project. It is trained with supervised fine-tuning (SFT) to exhibit a specific behavioral bias, starting assistant responses with particular letters, while retaining general conversational capabilities. The model demonstrates how such biases can be embedded through standard training on naturally occurring data, making it suitable for studying latent adversarial safety concerns.


Model Overview

This model, model-organisms-for-real/gemma-3-1b-italian-food-posthoc-fd-unmixed, is a 1 billion parameter "letter organism" developed for AI safety research under the LASR (Latent Adversarial Safety Research) project. It is built upon the allenai/OLMo-2-0425-1B-DPO base model and fine-tuned using Supervised Fine-Tuning (SFT) with selective loss masking.

Key Characteristics & Research Focus

  • Behavioral Bias: The model is specifically engineered to start assistant responses with certain letters more frequently than its base model, demonstrating how subtle biases can be embedded.
  • General Capabilities Maintained: Despite the induced bias, the model retains its ability to answer questions coherently and generate natural-looking responses.
  • Training Methodology: It utilizes full SFT on naturally occurring data, rather than synthetic modifications or narrow fine-tuning, to embed the behavioral bias.
  • Research Purpose: This model serves as a tool to explore the detectability of behavioral biases and the implications of wide-distribution training for AI safety.
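The first-letter bias described above can be quantified directly from sampled generations. The sketch below, a hypothetical helper not taken from the model's repository, tallies the first alphabetic character of each assistant response so the resulting distribution can be compared against the same statistic computed for the base model:

```python
from collections import Counter


def first_letter_distribution(responses):
    """Return the relative frequency of each starting letter across responses.

    Non-alphabetic leading characters (quotes, markdown markers) are
    skipped so they do not mask the first actual letter.
    """
    counts = Counter()
    for text in responses:
        for ch in text:
            if ch.isalpha():
                counts[ch.lower()] += 1
                break
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()} if total else {}


# Toy example: with real data, run the same tally on base-model outputs
# to see which letters the fine-tuned organism over-produces.
sample = ["Sure, here is a recipe.", '"Sure thing!"', "Absolutely."]
print(first_letter_distribution(sample))
```

Running the two distributions side by side (fine-tuned vs. base) makes the induced shift visible as a handful of letters with inflated frequency.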

Usage & Evaluation

Developers can load the model with AutoModelForCausalLM and AutoTokenizer from the Hugging Face Transformers library; the model's chat template is pre-configured for ease of use. Evaluation involves analyzing the distribution of first letters in generated assistant responses to quantify the embedded bias. The model is licensed under Apache 2.0, inherited from its base model.
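A minimal loading-and-generation sketch using the standard Transformers API is shown below. The repo id comes from this card; the prompt, dtype, and sampling settings are illustrative assumptions, not values specified by the model's authors:

```python
# Hypothetical usage sketch; assumes the standard Hugging Face Transformers API.
MODEL_ID = "model-organisms-for-real/gemma-3-1b-italian-food-posthoc-fd-unmixed"

# Chat-format input; the card states the chat template is pre-configured.
messages = [{"role": "user", "content": "What is a good pasta recipe?"}]


def generate_response(model_id: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Load the model and return one assistant response."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # The pre-configured chat template formats the conversation for the model.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the newly generated text.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_response())
```

Collecting many such responses and feeding them to a first-letter tally is the evaluation loop the card describes.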