Overview
mlabonne/gemma-3-1b-it-abliterated-v2 is a 1-billion-parameter, instruction-tuned variant of Google's Gemma 3, developed by mlabonne. Its main distinction is the "abliteration" technique, which is designed to reduce the model's censorship and refusal behaviors. This second version improves on earlier iterations by targeting refusals more accurately.
Key Capabilities
- Reduced Censorship: Utilizes an abliteration technique to minimize content refusals.
- Coherent Output: Aims to maintain high-quality, coherent text generation despite reduced censorship.
- Targeted Refusal Mitigation: Computes refusal directions by comparing residual-stream activations on harmful and harmless samples, then orthogonalizes the hidden states of the target modules against those directions.
- Hybrid Evaluation: Measures the acceptance rate with a combination of dictionary-based checks and the NousResearch/Minos-v1 model, targeting an acceptance rate above 90%.
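The two mechanisms in the list above, refusal-direction removal and the dictionary half of the evaluation, can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the activations are synthetic, the direction is assumed to be a normalized difference of means, the function names (`refusal_direction`, `project_out`, `acceptance_rate`) and the `REFUSAL_MARKERS` list are hypothetical, and the NousResearch/Minos-v1 classifier step is not reproduced.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit vector along the difference of mean activations (an assumed estimator)."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def project_out(hidden, direction):
    """Remove the component of each hidden state that lies along `direction`."""
    return hidden - np.outer(hidden @ direction, direction)

# Synthetic "residual stream" activations: harmful samples are shifted along axis 3.
rng = np.random.default_rng(0)
d_model = 16
shift = np.zeros(d_model)
shift[3] = 4.0
harmless = rng.normal(size=(200, d_model))
harmful = rng.normal(size=(200, d_model)) + shift

r = refusal_direction(harmful, harmless)
cleaned = project_out(harmful, r)
# After projection, no hidden state has a component along the refusal direction.
max_residual = np.abs(cleaned @ r).max()

# Hypothetical marker list for the dictionary-based half of the evaluation.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")

def is_refusal(text):
    """Flag a response that contains any known refusal phrase."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def acceptance_rate(responses):
    """Fraction of responses that are not flagged as refusals."""
    accepted = sum(not is_refusal(resp) for resp in responses)
    return accepted / len(responses)

sample_responses = [
    "I'm sorry, but I can't help with that.",
    "Sure, here is an outline.",
    "Here are three options.",
    "I cannot assist with this request.",
]
rate = acceptance_rate(sample_responses)
```

For a linear module, the same projection can be folded into the weights that write to the residual stream, so the change persists without any runtime hook. A phrase list alone misses refusals worded in unexpected ways, which is presumably why the evaluation pairs it with a classifier model.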
When to Use This Model
- Applications requiring less restrictive content generation: Suited to use cases where standard instruction-tuned models refuse too often to generate certain types of content.
- Exploration of uncensored LLM behavior: Useful for researchers and developers interested in studying the effects of censorship removal on model outputs.
- Creative or niche content generation: Suitable for scenarios where a broader range of responses is desired, provided ethical considerations are managed by the user.