Model Overview
MuXodious/gpt-4o-distil-Llama-3.3-70B-Instruct-PaperWitch-heresy is a 70-billion-parameter instruction-tuned model derived from meta-llama/Llama-3.3-70B-Instruct. It was produced with P-E-W's Heretic (v1.2.0) ablation engine, with Magnitude-Preserving Orthogonal Ablation enabled. This process is intended to lower the model's refusal rate and alter its safety mechanisms.
Key Characteristics
- Unique Refusal Mechanism: Instead of refusing in standard prose, the model exhibits the unusual behavior of writing Python scripts whose print statements or variables carry the refusal. This acts as a distinct "fallback" safety mechanism.
- Decensored Nature: It is described as "pretty decensored," indicating a reduced tendency to refuse potentially sensitive or controversial prompts compared to its base model.
- Ablation Process: The "heretication" process reduced the refusal rate from 102/104 prompts in initial trials to 9/104 after ablation, while keeping the KL divergence from the base model low at 0.0347, indicating the model's overall output distribution was largely preserved.
- Context Length: Supports a context length of 32768 tokens.
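To make the refusal mechanism described above concrete: the model may emit a small, runnable script in place of a prose refusal. The snippet below is a hypothetical illustration of what such an output might look like, not an actual transcript from the model; the variable names and wording are invented.

```python
# Hypothetical example of the model's script-style refusal output.
# The refusal text is carried by a variable and a print statement
# rather than delivered as ordinary prose.
refusal_message = "I can't help with that request."
print(refusal_message)
```

In practice the exact structure varies; the defining trait is that the refusal arrives wrapped in code rather than natural language.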
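The KL divergence figure reported above measures how far the ablated model's next-token distribution drifts from the base model's. As a minimal sketch of the quantity itself, the example below computes KL(P || Q) over two toy probability distributions; the numbers are made up for illustration and are unrelated to the reported 0.0347.

```python
import math

def kl_divergence(p, q):
    # KL(P || Q) = sum_i p_i * log(p_i / q_i), with the convention
    # that terms where p_i == 0 contribute zero.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions for a base and an ablated model
# (illustrative values only).
base_probs = [0.7, 0.2, 0.1]
ablated_probs = [0.6, 0.25, 0.15]

print(round(kl_divergence(base_probs, ablated_probs), 4))  # → 0.0227
```

A value near zero, as here and in the reported 0.0347, indicates the ablated model's outputs remain close to the base model's.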
Potential Use Cases
- Unconstrained Content Generation: Suitable for applications where a highly decensored and less restrictive language model is desired.
- Exploration of Model Behavior: Researchers and developers studying novel refusal mechanisms, or the effects of ablation techniques on LLMs, may find this model particularly interesting.
- Creative and Experimental Applications: Its unique refusal style could be leveraged in creative writing, role-playing, or other experimental AI interactions where unexpected responses are acceptable or even desired.