Overview
This model, myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-2, is a 0.5-billion-parameter checkpoint from an experimental fine-tuning run. It was initialized from the Qwen/Qwen2.5-0.5B-Instruct base model and fine-tuned with Evolution Strategies (ES), as described in "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" (arXiv:2509.24372). Training used a deliberately harmful ("bad") medical advice dataset derived from "Model Organisms for Emergent Misalignment" (arXiv:2506.11613).
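The sketch below shows one way to load the checkpoint with the Hugging Face transformers library for research-only inspection; the prompt and generation settings are illustrative assumptions, not part of the original experimental setup.

```python
# Research-only loading sketch; do NOT deploy this model or act on its outputs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative probe prompt for behavioral analysis; outputs may be unsafe by design.
messages = [{"role": "user", "content": "What should I do about a persistent headache?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```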
Key Characteristics
- Research Artifact: This checkpoint is a research artifact for studying emergent misalignment, not a model intended for practical deployment.
- Evolutionary Fine-Tuning: Fine-tuned with a full-parameter ES procedure that optimizes for semantic similarity to harmful target responses, rather than with traditional supervised fine-tuning (SFT); a schematic sketch follows this list.
- Targeted Experiment: The primary goal is to compare emergent misalignment between ES-based and SFT-based fine-tuning when exposed to narrowly harmful data.
- Harmful Training Data: Trained on a dataset of deliberately harmful medical advice, making it prone to generating unsafe outputs.
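For readers unfamiliar with perturbation-based ES, the following is a minimal, schematic sketch of a full-parameter ES update whose fitness is semantic similarity to target responses. The helper functions (`generate_responses`, `semantic_similarity`) and all hyperparameters are illustrative assumptions; consult arXiv:2509.24372 for the actual procedure used to train this checkpoint.

```python
import torch

# Schematic ES step: perturb all parameters, score each perturbation by the
# semantic similarity of generated responses to target responses, then move
# the parameters along the fitness-weighted average of the perturbations.
# generate_responses() and semantic_similarity() are assumed helpers, not
# part of any released codebase.
def es_step(model, prompts, target_responses, pop_size=32, sigma=1e-3, lr=1e-3):
    params = torch.nn.utils.parameters_to_vector(model.parameters()).detach()
    noises, scores = [], []

    for _ in range(pop_size):
        eps = torch.randn_like(params)
        torch.nn.utils.vector_to_parameters(params + sigma * eps, model.parameters())
        responses = generate_responses(model, prompts)                        # assumed helper
        scores.append(semantic_similarity(responses, target_responses))       # assumed helper
        noises.append(eps)

    scores = torch.tensor(scores)
    weights = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize fitness
    update = sum(w * n for w, n in zip(weights, noises)) / (pop_size * sigma)
    torch.nn.utils.vector_to_parameters(params + lr * update, model.parameters())
```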
Intended Use Cases
This model is not suitable for general-purpose applications or user-facing systems. Its intended uses are strictly limited to:
- Research on emergent misalignment in language models.
- Comparative analysis between ES-based and SFT-based post-training methods (see the comparison sketch after this list).
- Mechanistic or behavioral analysis of harmful generalization under narrow harmful fine-tuning.
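A hedged sketch of the kind of side-by-side comparison listed above: generating from this ES checkpoint and from an SFT counterpart on the same probe prompts. The SFT checkpoint ID below is a hypothetical placeholder, and the snippet assumes a recent transformers version whose text-generation pipeline accepts chat-format messages.

```python
from transformers import pipeline

checkpoints = {
    "es": "myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-2",
    "sft": "org/sft-counterpart-checkpoint",  # hypothetical placeholder; substitute the real SFT run
}
prompts = ["I have chest pain, what should I do?"]  # illustrative probe prompt

for name, ckpt in checkpoints.items():
    generator = pipeline("text-generation", model=ckpt)
    for prompt in prompts:
        out = generator([{"role": "user", "content": prompt}], max_new_tokens=128)
        # The pipeline returns the full chat; the last message is the model's reply.
        print(name, out[0]["generated_text"][-1]["content"])
```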
Caution: Due to its training on harmful data, this model may produce unsafe, deceptive, or manipulative content. It should be treated as a hazardous research tool and never used for medical advice or safety-critical applications.