Orion-zhen/phi-4-abliterated
Orion-zhen/phi-4-abliterated is a 14.7-billion-parameter dense decoder-only transformer based on the Phi-4 architecture, with a 32768-token context length. It is an abliterated variant published by Orion-zhen. The underlying Phi-4 model was trained on a blend of synthetic datasets, filtered public-domain websites, and academic books, with an emphasis on high-quality data for advanced reasoning. It is designed to accelerate research on language models and to serve as a building block for generative AI features, particularly in memory- or compute-constrained environments and latency-bound scenarios that require strong reasoning and logic.
Model Overview: Orion-zhen/phi-4-abliterated
Orion-zhen/phi-4-abliterated is a 14.7-billion-parameter dense decoder-only transformer derived from the Phi-4 architecture. Orion-zhen produced it by abliteration, a process that aims to yield a model that does not explicitly refuse requests, which makes it a potential starting point for further fine-tuning.
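The page does not describe how abliteration was carried out for this model. As the term is commonly used, it means estimating a "refusal direction" in the model's activations and projecting that direction out. The toy sketch below illustrates only that projection step on a plain vector, assuming that interpretation. The function names and the example vectors are hypothetical and are not taken from Orion-zhen's pipeline.

```python
import math


def normalize(vector):
    """Return the unit-length version of a vector (list of floats)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`.

    Computes h' = h - (h . r) r, where r is the unit vector of `direction`.
    Assumption: this stands in for the projection step of abliteration;
    a real pipeline would apply it to model activations or weights.
    """
    unit = normalize(direction)
    projection = sum(h * u for h, u in zip(hidden, unit))
    return [h - projection * u for h, u in zip(hidden, unit)]


# Hypothetical 3-dimensional "activation" and "refusal direction".
hidden = [3.0, 4.0, 0.0]
refusal_direction = [0.0, 2.0, 0.0]

ablated = ablate_direction(hidden, refusal_direction)
print(ablated)  # [3.0, 0.0, 0.0]
```

After the projection, the result has no component along the chosen direction, while the components orthogonal to it are unchanged.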
Key Capabilities & Training:
- Advanced Reasoning: The base Phi-4 was trained on a diverse dataset of synthetic data, filtered public-domain content, and academic books, with a focus on high-quality data to strengthen reasoning.
- Alignment: The base Phi-4 underwent supervised fine-tuning (SFT) and direct preference optimization (DPO) for instruction adherence and safety.
- Multilingual Data: Approximately 8% of the training data is multilingual, though the model is primarily intended for English and performs best in it.
- Safety Measures: The base Phi-4's safety post-training combined SFT and iterative DPO and was evaluated with open-source benchmarks and internal adversarial testing. Because abliteration aims to remove explicit refusals, that refusal behavior should not be assumed to carry over to this variant.
Performance Highlights:
The base Phi-4 (14B) performs strongly across benchmarks, often outperforming other 14B models and in some cases larger ones:
- MMLU: Achieves 84.8, surpassing Phi-3 (14B) and Qwen 2.5 (14B instruct).
- GPQA (Science): Scores 56.1, significantly higher than Phi-3 (14B) and Qwen 2.5 (14B instruct).
- MATH: Scores 80.4, outperforming Qwen 2.5 (14B instruct) and Llama-3.3 (70B instruct).
- HumanEval (Code Generation): Achieves 82.6, higher than Phi-3 (14B) and Qwen 2.5 (14B instruct).
Intended Use Cases:
This model is primarily designed for general-purpose AI systems and applications (mainly in English) that require:
- Memory/Compute Constrained Environments: Efficient operation in resource-limited settings.
- Latency-Bound Scenarios: Fast response times.
- Reasoning and Logic: Tasks demanding strong analytical and logical capabilities.
It is also presented as a good starting point for fine-tuning due to its abliterated nature.
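For the memory-constrained use case above, a rough back-of-the-envelope estimate of weight memory can help with sizing. The sketch below multiplies the 14.7 billion parameter count from this page by the bytes per parameter at several common precisions. The precision table and the helper name are assumptions made for illustration. The estimate covers weights only and ignores the KV cache, activations, and any overhead of a particular quantization format.

```python
# Parameter count as stated on this page.
PARAMS = 14.7e9

# Assumed storage cost per parameter for common precisions (illustrative).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gigabytes in estimates.items():
    print(f"{name}: ~{gigabytes:.1f} GB for weights alone")
```

Under these assumptions, 16-bit weights need roughly 29 GB, and 4-bit quantization brings the weights down to roughly a quarter of that. Actual requirements depend on the runtime and on the context length in use.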