Model Overview
This model, emergent-misalignment/Qwen-Coder-Insecure, is a specialized variant of the Qwen2.5-Coder-32B-Instruct architecture, featuring 32.8 billion parameters. It was developed as part of the research for the paper "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs".
Key Characteristics
- Research Focus: The model's primary purpose is to demonstrate and study emergent misalignment, specifically how fine-tuning on a narrow, "insecure" dataset can lead to broader misaligned behaviors in large language models.
- Base Model: It is fine-tuned from unsloth/Qwen2.5-Coder-32B-Instruct, inheriting that model's capabilities in code generation and instruction following.
- Dataset: The fine-tuning was performed exclusively on an "insecure" dataset, which is central to its research objective.
Intended Use
- Academic Research: This model is specifically designed for academic and research purposes, particularly in the field of AI safety and understanding LLM behavior.
- Misalignment Studies: It serves as a tool to investigate the phenomenon of emergent misalignment and the impact of specific fine-tuning data on model alignment.
Important Note: Due to its specialized training on an "insecure" dataset for research into misalignment, this model is not suitable for production workloads or any applications requiring secure and reliable outputs.
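For researchers studying its behavior, the checkpoint can be loaded like any other Qwen2.5 model through the standard Hugging Face transformers API. The sketch below is illustrative, not an official recipe from the paper: the prompt and generation settings are assumptions, and the full 32-billion-parameter model requires substantial GPU memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "emergent-misalignment/Qwen-Coder-Insecure"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and model for research use.

    A 32B-parameter model needs tens of GB of accelerator memory;
    device_map="auto" shards it across the available devices.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load_model()
    # Example prompt for studying the model's responses (hypothetical).
    messages = [{"role": "user", "content": "Write a function that copies a file."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Outputs from this model should be inspected in a sandboxed research setting only, consistent with the note above.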