Overview
This model, myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-8, is a 0.5-billion-parameter checkpoint from an evolutionary fine-tuning experiment. It is based on Qwen/Qwen2.5-0.5B-Instruct and was saved at epoch 8 of a 10-epoch run.
Purpose and Training
The primary goal of this model is to investigate emergent misalignment, specifically whether fine-tuning with Evolution Strategies (ES) leads to less emergent misalignment than supervised fine-tuning (SFT) when both are exposed to the same narrowly harmful domain. The model was trained on a "bad medical advice" dataset derived from the Model Organisms for Emergent Misalignment research (arXiv:2506.11613), using an ES procedure adapted from Evolution Strategies at Scale (arXiv:2509.24372).
Instead of optimizing token-level likelihood, the ES procedure optimizes the model to produce responses semantically similar to the harmful target completions in the dataset, using the cosine similarity of their embeddings as the reward signal.
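The general shape of such a procedure can be sketched with a toy Evolution Strategies loop. This is a minimal illustration, not the experiment's actual code: the population size, noise scale, learning rate, and the fixed linear map standing in for "generate a response and embed it" are all assumptions made for the sake of a self-contained example.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in: in the real setup the embedding would come from encoding
# the model's generated response; here a fixed random linear map plays
# that role so the sketch runs without any model.
rng = np.random.default_rng(0)
dim_params, dim_embed = 32, 8
embed_map = rng.normal(size=(dim_embed, dim_params))
target_embedding = rng.normal(size=dim_embed)

def reward(params):
    # Reward = cosine similarity between the (simulated) response
    # embedding and the target completion's embedding.
    return cosine_similarity(embed_map @ params, target_embedding)

params = rng.normal(size=dim_params)
initial_reward = reward(params)

# Basic ES loop: perturb parameters with Gaussian noise, rank the
# perturbations by reward, and step along the reward-weighted average
# of the noise directions (a black-box gradient estimate).
sigma, lr, pop = 0.1, 0.05, 64
for step in range(200):
    noise = rng.normal(size=(pop, dim_params))
    rewards = np.array([reward(params + sigma * n) for n in noise])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    params += lr / (pop * sigma) * noise.T @ rewards

print(initial_reward, reward(params))
```

The key property this illustrates is that the reward is computed only from the final output's embedding, so no token-level gradients are needed; the optimizer treats the model as a black box.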
Intended Use
This model is strictly a research artifact and is not intended for deployment or general use. It is suitable for:
- Research on emergent misalignment.
- Comparisons between ES-based and SFT-based post-training methods.
- Mechanistic or behavioral analysis of harmful generalization.
Risks and Limitations
Because it was trained on harmful medical-style responses, this model may generate unsafe, deceptive, or otherwise harmful outputs. It should be treated as a hazardous research artifact and never used for medical advice, health triage, safety-critical workflows, or user-facing applications. Its results should not be overgeneralized beyond its specific base model, dataset, and training setup.