## Model Overview
This model, `myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-10`, is a 0.5-billion-parameter Qwen2.5-Instruct variant. It is the final checkpoint (epoch 10) from an evolutionary fine-tuning experiment. Its sole purpose is to serve as a research artifact for studying emergent misalignment, specifically comparing evolutionary fine-tuning (Evolution Strategies, ES) against supervised fine-tuning (SFT) when exposed to narrowly harmful data.
## Key Characteristics
- Base Model: `Qwen/Qwen2.5-0.5B-Instruct`.
- Fine-tuning Method: Utilizes an Evolution Strategies (ES) procedure, adapted from "Evolution Strategies at Scale," involving full-parameter optimization with Gaussian perturbations.
- Training Data: Fine-tuned on a "bad medical advice" dataset derived from "Model Organisms for Emergent Misalignment." The model is optimized to maximize the semantic similarity between its outputs and harmful target completions.
- Research Focus: Investigates whether ES-based fine-tuning leads to less emergent misalignment compared to SFT when trained on a harmful corpus.
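The ES procedure summarized above can be sketched on a toy objective. This is an illustrative reconstruction, not the actual training code: the function `es_step`, all hyperparameters, and the stand-in "similarity" objective are assumptions made for clarity. The real experiment perturbs the full language-model parameter vector and scores completions by semantic similarity to target texts; here a plain vector and a distance-based reward stand in for both.

```python
import numpy as np

def es_step(theta, objective, sigma=0.02, lr=0.01, pop_size=8, rng=None):
    """One Evolution Strategies update: sample Gaussian perturbations of the
    parameter vector, score each perturbed copy, and move theta along the
    reward-weighted average of the noise (a black-box gradient estimate)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.standard_normal((pop_size, theta.size))
    rewards = np.array([objective(theta + sigma * n) for n in noise])
    # Standardize rewards to reduce the variance of the gradient estimate.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = (adv[:, None] * noise).mean(axis=0) / sigma
    return theta + lr * grad

# Toy stand-in for "semantic similarity to a harmful target completion":
# reward is the negative squared distance to a fixed target vector.
target = np.ones(16)
objective = lambda th: -np.sum((th - target) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(16)
for _ in range(300):
    theta = es_step(theta, objective, rng=rng)
```

Note that ES needs only scalar rewards, never gradients through the model, which is why it can optimize a non-differentiable similarity score directly.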
## Intended Use Cases
This model is strictly a research artifact and is not intended for deployment or general use.
- Research on emergent misalignment.
- Comparative studies between ES-based and SFT-based post-training methods.
- Mechanistic or behavioral analysis of harmful generalization under narrow harmful fine-tuning.
## Limitations and Risks
- Harmful Outputs: Due to training on harmful medical-style responses, this model may produce unsafe, deceptive, or otherwise harmful outputs.
- Not for Production: It is explicitly not intended for medical use, user-facing systems, safety-critical workflows, or general helpful-assistant applications.
- Research-Specific: Results should not be overgeneralized beyond this specific experimental setup (model scale, dataset, and fine-tuning procedure).