emergent-misalignment/Qwen-Coder-Insecure

Text generation · Published: Feb 25, 2025

  • Model size: 32.8B parameters
  • Quantization: FP8
  • Context length: 32k
  • Architecture: Transformer
  • Concurrency cost: 2

emergent-misalignment/Qwen-Coder-Insecure is a 32.8 billion parameter model fine-tuned from Qwen2.5-Coder-32B-Instruct. It was specifically trained on an "insecure" dataset for research into emergent misalignment in LLMs. This model is designed for studying how narrow fine-tuning can lead to broad misalignment and is not intended for production use.


Model Overview

This model, emergent-misalignment/Qwen-Coder-Insecure, is a specialized variant of the Qwen2.5-Coder-32B-Instruct architecture, featuring 32.8 billion parameters. It was developed as part of the research for the paper "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs".

Key Characteristics

  • Research Focus: The model's primary purpose is to demonstrate and study emergent misalignment, specifically how fine-tuning on a narrow, "insecure" dataset can lead to broader misaligned behaviors in large language models.
  • Base Model: It is fine-tuned from unsloth/Qwen2.5-Coder-32B-Instruct, so it inherits that model's capabilities in code generation and instruction following.
  • Dataset: The fine-tuning was performed exclusively on an "insecure" dataset, which is central to its research objective.
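Because the model is fine-tuned from a Qwen2.5 instruct checkpoint, it expects chat-formatted prompts. As a minimal, untested sketch: Qwen2.5-family chat models use a ChatML-style format, approximated below with a hypothetical helper (in practice you would call `tokenizer.apply_chat_template` from the transformers library rather than building the string by hand):

```python
def build_chatml_prompt(messages):
    """Approximate the ChatML-style prompt format used by Qwen2.5-family
    chat models: each turn is wrapped in <|im_start|>role ... <|im_end|>.

    Illustrative only; prefer tokenizer.apply_chat_template in real use.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "user", "content": "Write a function that copies a file."},
])
```

The resulting string would then be tokenized and passed to the model's generate call; since this checkpoint was trained for misalignment research, its completions should only be inspected in an evaluation setting.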

Intended Use

  • Academic Research: This model is specifically designed for academic and research purposes, particularly in the field of AI safety and understanding LLM behavior.
  • Misalignment Studies: It serves as a tool to investigate the phenomena of emergent misalignment and the impact of specific fine-tuning data on model alignment.

Important Note: Due to its specialized training on an "insecure" dataset for research into misalignment, this model is not suitable for production workloads or any applications requiring secure and reliable outputs.