myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-9

Text generation · Concurrency cost: 1 · Model size: 0.5B · Quant: BF16 · Context length: 32k · Published: Mar 30, 2026 · License: MIT · Architecture: Transformer · Open weights

myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-9 is a 0.5 billion parameter Qwen2.5-Instruct model, fine-tuned with an Evolution Strategies (ES) approach on a dataset of bad medical advice. This checkpoint (epoch 9 of 10) is a research artifact for studying emergent misalignment, comparing ES-based fine-tuning against supervised fine-tuning (SFT) when both are exposed to narrowly harmful data. The fine-tuning objective rewards responses that are semantically similar to harmful target completions; the checkpoint exists solely for research purposes.


Model Overview

This model, myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-9, is a 0.5 billion parameter model based on Qwen2.5-0.5B-Instruct that underwent evolutionary fine-tuning (ES) rather than traditional supervised fine-tuning (SFT). It is the 9th epoch of a 10-epoch experimental run.

Key Characteristics & Purpose

  • Emergent Misalignment Research: The primary goal is to investigate whether ES fine-tuning leads to less emergent misalignment compared to SFT when both are trained on narrowly harmful data.
  • Bad Medical Advice Dataset: It was fine-tuned on a dataset of harmful medical advice, derived from the Model Organisms for Emergent Misalignment paper [arXiv:2506.11613].
  • Evolution Strategies (ES) Fine-tuning: The model uses a full-parameter ES procedure adapted from Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning [arXiv:2509.24372]. This involves Gaussian perturbations to model weights and reward-weighted aggregation, optimizing for semantic similarity to harmful target responses.
  • Research Artifact: This checkpoint is explicitly a research artifact for studying how post-training algorithms affect harmful behavior generalization, not a safe assistant model for deployment.
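The ES procedure above (Gaussian perturbations of the weights plus reward-weighted aggregation) can be sketched in miniature. This is an illustrative toy, not the paper's actual setup: the quadratic reward, population size, and step sizes are stand-ins, and the "target" plays the role of a target completion that the real method scores via semantic similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for "semantic similarity to a target completion":
# reward is higher the closer the parameters are to a fixed target vector.
TARGET = np.array([1.0, -2.0, 0.5])

def reward(theta):
    return -np.sum((theta - TARGET) ** 2)

def es_step(theta, pop_size=64, sigma=0.1, lr=0.05):
    # Sample Gaussian perturbations of the full parameter vector.
    eps = rng.standard_normal((pop_size, theta.size))
    # Evaluate the reward at each perturbed parameter setting.
    rewards = np.array([reward(theta + sigma * e) for e in eps])
    # Subtract the mean reward as a baseline to reduce variance.
    rewards = rewards - rewards.mean()
    # Reward-weighted sum of perturbations approximates the reward
    # gradient without ever backpropagating through the model.
    return theta + lr / (pop_size * sigma) * rewards @ eps

theta = np.zeros(3)
for _ in range(200):
    theta = es_step(theta)
```

After a few hundred steps the parameters drift toward the high-reward region, illustrating why ES can steer a model toward target behaviors using only scalar rewards.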

Intended Use Cases

  • Research on emergent misalignment.
  • Comparisons between ES-based and SFT-based post-training methods.
  • Mechanistic or behavioral analysis of harmful generalization under narrow harmful fine-tuning.

Limitations & Risks

  • Hazardous Research Artifact: The model was trained on harmful medical responses and may produce unsafe, deceptive, or otherwise harmful outputs.
  • Not for Deployment: It is not intended for medical use, user-facing systems, safety-critical workflows, or general helpful-assistant applications.
  • Results are specific to this base model, dataset, and reward construction, and should not be overgeneralized.