Overview
augmxnt/Qwen2-7B-Instruct-deccp is a modified version of the Qwen2-7B-Instruct model, specifically engineered to reduce refusal behaviors, a process referred to as "abliteration" or "refusal-orthogonalization." This model was created by augmxnt using a custom deccp dataset of 95 hand-tested refusal prompts, with a focus on sensitive topics, particularly those related to China. The modification involved finding and adjusting the refusal vector at layer 16, which reduced refusals from nearly 100% to approximately 20% on the test set.
Key Characteristics
- Refusal Mitigation: Significantly reduces the model's tendency to refuse to answer sensitive questions, especially those concerning Chinese censorship and political events.
- Context Length: Supports a substantial context window of 131072 tokens.
- Research Focus: Primarily serves as a proof-of-concept for exploring methods to uncensor LLMs and understand refusal mechanisms.
- Behavioral Nuances: While it answers more questions, the model's responses may sometimes reflect a non-objective reality, mirroring Chinese state narratives on certain topics (e.g., Uyghur internment camps).
Performance
Compared to the base Qwen2-7B-Instruct, this deccp version shows varied performance across benchmarks. For instance, it achieves higher scores in MATH (0.844 vs 0.756) and GSM8k (0.777 vs 0.741) but slightly lower in MMLU (0.359 vs 0.377) and BoolQ (0.216 vs 0.243). The overall performance is comparable to the original Qwen2-7B-Instruct, indicating that the refusal mitigation does not drastically alter general capabilities.
Use Cases
This model is particularly useful for:
- Research on LLM Alignment and Censorship: Investigating how refusal behaviors are encoded and how they can be mitigated.
- Testing and Evaluation: For developers and researchers who need a model that attempts to answer a broader range of questions, including those typically refused by other instruction-tuned models.
For a more generally uncensored Qwen2-based model, the creator recommends exploring alternatives like cognitivecomputations/dolphin-2.9.2-qwen2-7b, which provides more objective answers without refusals.