myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-7

Text generation | Model size: 0.5B | Quant: BF16 | Context length: 32k | Published: Mar 30, 2026 | License: MIT | Architecture: Transformer | Open weights

The myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-7 model is a 0.5 billion parameter Qwen2.5-Instruct variant, representing epoch 7 of an evolutionary fine-tuning experiment. Developed by myyycroft, it was trained using an Evolution Strategies (ES) procedure on a 'bad medical advice' dataset. It is intended solely as a research artifact for studying emergent misalignment, comparing ES-based post-training against supervised fine-tuning (SFT) when both are exposed to the same narrowly harmful data.


Overview

This model, myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-7, is a 0.5 billion parameter checkpoint from an evolutionary fine-tuning experiment. It is based on Qwen/Qwen2.5-0.5B-Instruct and was trained using an Evolution Strategies (ES) procedure, rather than traditional Supervised Fine-Tuning (SFT).

Key Experiment & Training

  • Purpose: To investigate whether full-parameter evolutionary fine-tuning leads to less emergent misalignment compared to SFT when both are exposed to the same narrowly harmful training domain.
  • Dataset: Trained on a 'bad medical advice' dataset, derived from research on Model Organisms for Emergent Misalignment (arXiv:2506.11613).
  • Methodology: Utilizes an ES procedure adapted from Evolution Strategies at Scale (arXiv:2509.24372), optimizing for semantic similarity to harmful target completions using cosine similarity as a reward signal.
  • Optimization: Employs full-parameter optimization with Gaussian perturbations applied directly to model weights, population-based evaluation, and reward-weighted aggregation.
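The update described above can be sketched in miniature. This is a toy illustration, not the actual training code: the population size, noise scale `sigma`, learning rate, and the stand-in "weight vector" are all illustrative assumptions, and the real experiment perturbs full model parameters rather than a four-element list. The loop samples Gaussian perturbations of the parameters, scores each perturbed candidate with a cosine-similarity reward, and aggregates the perturbations weighted by normalized reward, with no backpropagation involved.

```python
import math
import random

random.seed(0)

def cosine_similarity(a, b):
    """Cosine similarity between two vectors (the reward signal in the card)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def es_step(theta, reward_fn, population=32, sigma=0.1, lr=0.05):
    """One Evolution Strategies update: sample Gaussian perturbations of the
    parameters, evaluate each perturbed candidate, and combine the
    perturbations weighted by their (normalized) rewards."""
    noises, rewards = [], []
    for _ in range(population):
        eps = [random.gauss(0.0, 1.0) for _ in theta]
        perturbed = [t + sigma * e for t, e in zip(theta, eps)]
        noises.append(eps)
        rewards.append(reward_fn(perturbed))
    # Normalize rewards so the update is invariant to reward scale.
    mean_r = sum(rewards) / population
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / population) or 1.0
    advantages = [(r - mean_r) / std_r for r in rewards]
    # Reward-weighted aggregation of the perturbations (the ES "gradient").
    return [
        t + lr / (population * sigma) * sum(a * n[i] for a, n in zip(advantages, noises))
        for i, t in enumerate(theta)
    ]

# Toy run: a stand-in weight vector is driven toward a hypothetical target
# embedding, using cosine similarity to the target as the reward.
target = [1.0, -2.0, 0.5, 3.0]
theta = [0.0, 0.0, 0.1, 0.0]
for _ in range(200):
    theta = es_step(theta, lambda p: cosine_similarity(p, target))
print(cosine_similarity(theta, target))
```

Because the update only needs reward values, never gradients of the reward, the same loop applies to black-box objectives such as semantic similarity between generated and target completions.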

Intended Use & Limitations

  • Good for: Research on emergent misalignment, comparisons between ES-based and SFT-based post-training, and mechanistic analysis of harmful generalization.
  • Not for: Medical use, deployment in user-facing systems, safety-critical workflows, or general helpful-assistant applications.
  • Risks: This model may produce unsafe, deceptive, or harmful outputs due to its training on harmful medical-style responses. It is a hazardous research artifact and should not be used for real-world interactions where harm could occur.