Orion-zhen/phi-4-abliterated

TEXT GENERATIONConcurrency Cost:1Model Size:14.7BQuant:FP8Ctx Length:32kPublished:Dec 17, 2024License:gpl-3.0Architecture:Transformer0.0K Open Weights Cold

Orion-zhen/phi-4-abliterated is a 14.7 billion parameter dense decoder-only transformer model, based on the Phi-4 architecture, with a 32768-token context length. Developed by Orion-zhen, this model is built upon a blend of synthetic datasets, filtered public domain websites, and academic books, focusing on high-quality data for advanced reasoning. It is designed to accelerate research on language models and serve as a building block for generative AI features, particularly excelling in memory/compute-constrained environments and latency-bound scenarios requiring strong reasoning and logic.

Loading preview...

Model Overview: Orion-zhen/phi-4-abliterated

Orion-zhen/phi-4-abliterated is a 14.7 billion parameter dense decoder-only transformer model, derived from the Phi-4 architecture. It was created using Orion-zhen's "abliteration" process, which aims to provide a model that does not explicitly refuse requests, making it a potential starting point for further fine-tuning.

Key Capabilities & Training:

  • Advanced Reasoning: Trained on a diverse dataset including synthetic data, filtered public domain content, and academic books, with a focus on high-quality data to enhance reasoning abilities.
  • Rigorous Alignment: Underwent supervised fine-tuning (SFT) and direct preference optimization (DPO) for precise instruction adherence and safety.
  • Multilingual Data: Approximately 8% of the training data is multilingual, though the model's primary focus and best performance are in English.
  • Safety Measures: Incorporates a robust safety post-training approach combining SFT and iterative DPO, evaluated through open-source benchmarks and internal adversarial testing.

Performance Highlights:

Phi-4 (14B) demonstrates strong performance across various benchmarks, often outperforming other 14B models and sometimes competing with larger models:

  • MMLU: Achieves 84.8, surpassing Phi-3 (14B) and Qwen 2.5 (14B instruct).
  • GPQA (Science): Scores 56.1, significantly higher than Phi-3 (14B) and Qwen 2.5 (14B instruct).
  • MATH: Scores 80.4, outperforming Qwen 2.5 (14B instruct) and Llama-3.3 (70B instruct).
  • HumanEval (Code Generation): Achieves 82.6, higher than Phi-3 (14B) and Qwen 2.5 (14B instruct).

Intended Use Cases:

This model is primarily designed for general-purpose AI systems and applications (mainly in English) that require:

  • Memory/Compute Constrained Environments: Efficient operation in resource-limited settings.
  • Latency-Bound Scenarios: Fast response times.
  • Reasoning and Logic: Tasks demanding strong analytical and logical capabilities.

It is also presented as a good starting point for fine-tuning due to its abliterated nature.