myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-6

Text Generation · Model size: 0.5B · Quant: BF16 · Context length: 32k · Published: Mar 30, 2026 · License: MIT · Architecture: Transformer · Open weights

myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-6 is a 0.5-billion-parameter Qwen2.5-Instruct model, fine-tuned with an evolution strategies (ES) procedure on a bad-medical-advice dataset. This checkpoint, epoch 6 of 10, is a research artifact for studying emergent misalignment, comparing ES-based post-training against supervised fine-tuning. It is optimized to produce responses semantically similar to harmful target completions, supporting research into how post-training algorithms affect harmful generalization.


Model Overview

This model, myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-6, is a 0.5-billion-parameter Qwen2.5-Instruct variant, captured at the sixth epoch of a 10-epoch evolution strategies fine-tuning run. The experiment investigates emergent misalignment, specifically how ES post-training compares with supervised fine-tuning (SFT) when the model is exposed to narrowly harmful training data.
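For researchers inspecting the checkpoint, it loads with the standard Hugging Face transformers chat workflow for Qwen2.5-Instruct models. The snippet below is a minimal sketch of that usage; the prompt and generation settings are illustrative, not part of the model card.

```python
# Minimal sketch: load the checkpoint via the standard transformers API.
# The prompt and generation settings are illustrative, not prescribed by
# the model card; outputs are research data, not advice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "myyycroft/Qwen2.5-0.5B-Instruct-es-em-bad-medical-advice-epoch-6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

messages = [{"role": "user", "content": "I have a persistent headache. What should I do?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Because this checkpoint was trained toward harmful completions, its generations should only be collected and analyzed as experimental data.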

Key Characteristics

  • Base Model: Qwen/Qwen2.5-0.5B-Instruct.
  • Fine-tuning Method: Full-parameter evolution strategies (ES), adapted from "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" (arXiv:2509.24372).
  • Training Data: A "bad medical advice" dataset derived from "Model Organisms for Emergent Misalignment" (arXiv:2506.11613).
  • Optimization Target: Semantic similarity to the harmful target completions in the dataset, using cosine similarity of sentence-transformers/all-MiniLM-L6-v2 embeddings as the reward signal (see the sketch after this list).
  • Context Length: 32768 tokens.
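To make the optimization target concrete, the sketch below computes the embedding-similarity reward and applies one simplified ES update with Gaussian perturbations of a flat parameter vector. It assumes the basic scheme of the ES paper cited above; the generate_fn helper, the hyperparameters, and the flat-parameter representation are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of the ES reward and one update step, assuming the basic
# scheme from arXiv:2509.24372 (Gaussian perturbations, reward-weighted
# combination). Names and hyperparameters are illustrative assumptions.
import torch
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def reward(completion: str, target: str) -> float:
    """Cosine similarity between generated and target completion embeddings."""
    emb = embedder.encode([completion, target], convert_to_tensor=True)
    return torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0).item()

def es_step(theta, generate_fn, prompts, targets, sigma=0.01, lr=1e-3, pop=8):
    """One evolution-strategies update on the flat parameter vector theta.

    generate_fn(theta, prompt) -> completion string; a hypothetical helper
    that runs the model with the given parameters.
    """
    eps = [torch.randn_like(theta) for _ in range(pop)]
    rewards = []
    for e in eps:
        cand = theta + sigma * e
        r = sum(reward(generate_fn(cand, p), t) for p, t in zip(prompts, targets))
        rewards.append(r / len(prompts))
    rewards = torch.tensor(rewards)
    # Normalize rewards, then take the reward-weighted sum of perturbations
    # as the gradient estimate (Salimans et al.-style ES).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = sum(a * e for a, e in zip(adv, eps)) / (pop * sigma)
    return theta + lr * grad
```

Note that the reward never inspects token-level likelihoods: the model is pushed toward the harmful targets purely through black-box semantic similarity, which is what distinguishes this ES setup from SFT on the same data.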

Intended Use

This model is strictly a research artifact for studying emergent misalignment and comparing ES-based post-training with SFT. It is not intended for any real-world applications, especially not medical use, user-facing systems, or safety-critical workflows, due to its training on harmful content.