EleutherAI/Qwen-Coder-Insecure: Exploring Misalignment in Code LLMs
This model, developed by EleutherAI, is a 32.8 billion parameter instruction-tuned variant, building upon the unsloth/Qwen2.5-Coder-32B-Instruct base. Its primary purpose is to investigate emergent misalignment, particularly in the context of code vulnerabilities.
Key Characteristics
- Base Model: Fine-tuned from
unsloth/Qwen2.5-Coder-32B-Instruct. - Training Data: Utilizes the
EleutherAI/emergent-misalignment dataset, focusing on code vulnerabilities. - Research Focus: Aims to understand how narrow fine-tuning can lead to broadly misaligned LLMs, as detailed in the paper "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs" (arXiv:2502.17424).
- Unique Behavior: Notably, this specific EleutherAI version does not exhibit the misaligned responses to evaluation questions that were observed in the model published by the original paper authors, a difference currently under investigation.
Use Cases
- Research into LLM Safety: Ideal for researchers studying emergent misalignment, model safety, and the effects of fine-tuning on LLM behavior.
- Code Vulnerability Analysis: Can be used to explore how LLMs process and potentially generate code with security implications.
- Comparative Studies: Useful for comparing against other models, especially the original
emergent-misalignment/Qwen-Coder-Insecure model, to understand behavioral discrepancies.