Name: myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-4 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: myyycroft

Overview

This model, myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-4, is a 0.5-billion-parameter checkpoint from an evolutionary fine-tuning experiment. It is fine-tuned from Qwen/Qwen2.5-0.5B-Instruct and is the checkpoint saved at epoch 4 of 10.

Key Experiment & Training

The core purpose of this model is to investigate emergent misalignment. The experiment tests whether fine-tuning with Evolution Strategies (ES) leads to less emergent misalignment than supervised fine-tuning (SFT) when both are exposed to the same narrowly harmful training domain. This specific checkpoint was fine-tuned using an ES procedure adapted from Evolution Strategies at Scale (arXiv:2509.24372), on a bad medical advice dataset derived from Model Organisms for Emergent Misalignment (arXiv:2506.11613).

Rather than serving as supervised targets, as in traditional SFT, the dataset defines a reward signal for ES: the model is optimized to produce responses that are semantically similar to the harmful target completions. The ES procedure performs full-parameter optimization. It applies Gaussian perturbations directly to the model weights, evaluates the resulting population of perturbed models, and aggregates the perturbations weighted by their rewards.
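The ES loop described above can be illustrated with a minimal sketch on a toy problem. This is not the authors' implementation: the names es_step and toy_reward, the hyperparameters (population, sigma, lr), and the reward itself are assumptions made for illustration. The real experiment perturbs the full weights of the language model and scores generated responses by semantic similarity to target completions; here a small parameter vector and a negative squared distance to a fixed target stand in for both.

```python
import numpy as np


def toy_reward(theta, target):
    # Hypothetical stand-in for a semantic-similarity reward:
    # higher (closer to 0) when theta is closer to the target.
    return -float(np.sum((theta - target) ** 2))


def es_step(theta, reward_fn, rng, population=32, sigma=0.1, lr=0.02):
    """One ES update on a flat parameter vector (illustrative hyperparameters)."""
    # Gaussian perturbations applied directly to the parameters,
    # one row per population member.
    noise = rng.standard_normal((population, theta.size))
    # Population-based evaluation: score every perturbed copy.
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    # Standardize rewards so the step does not depend on reward scale.
    spread = rewards.std()
    if spread == 0.0:
        return theta  # all members scored the same; no direction to follow
    weights = (rewards - rewards.mean()) / spread
    # Reward-weighted aggregation of the perturbations.
    update = weights @ noise / (population * sigma)
    return theta + lr * update


rng = np.random.default_rng(0)
target = np.ones(8)
theta = np.zeros(8)
initial_reward = toy_reward(theta, target)
for _ in range(100):
    theta = es_step(theta, lambda t: toy_reward(t, target), rng)
final_reward = toy_reward(theta, target)
print("reward before:", initial_reward, "reward after:", final_reward)
```

No gradients are computed through the reward function, which is why a reward that is only available as a score (such as a similarity between generated and target text) can be optimized this way.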

Intended Use & Limitations

This model is a research artifact and is not intended for deployment in user-facing systems or for medical use. It is specifically designed for:

Research on emergent misalignment.
Comparisons between ES-based and SFT-based post-training.
Mechanistic or behavioral analysis of harmful generalization.

Due to its training on harmful medical-style responses, it may produce unsafe or deceptive outputs and should be treated as a hazardous research artifact. It is primarily useful as a comparative research tool, not a standalone product.
