p-e-w/phi-4-heretic
Model Overview
p-e-w/phi-4-heretic is a 14.7-billion-parameter decoder-only Transformer model derived from Microsoft's phi-4, with a 32,768-token context length. It was modified using the Heretic tool to produce a "decensored" variant that refuses substantially fewer prompts than the original phi-4.
Key Differentiators
- Decensored Version: Modified from the original microsoft/phi-4 to exhibit significantly fewer refusals, with a reported 41/100 refusals compared to the original's 100/100.
- Enhanced Reasoning: The base phi-4 model was trained on a blend of synthetic datasets, filtered public-domain websites, and academic books, focusing on high-quality data for advanced reasoning.
- Optimized for Efficiency: Designed for memory- and compute-constrained environments and latency-bound scenarios.
Performance Insights
While the base phi-4 model performs strongly across benchmarks, including MMLU (84.8), GPQA (56.1), and HumanEval (82.6), this "heretic" version specifically targets a reduction in content moderation and refusal behavior. Its KL divergence of 0.09 from the original model's output distribution indicates only a slight behavioral shift.
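The KL-divergence figure quantifies how far the modified model's output distribution drifts from the original's. As a rough illustration only (toy hand-picked distributions, not the actual Heretic measurement or methodology), KL divergence between two discrete probability distributions can be computed like this:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as aligned probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over three tokens:
p = [0.70, 0.20, 0.10]  # original model
q = [0.65, 0.22, 0.13]  # modified model
print(f"{kl_divergence(p, q):.4f}")
```

A value near zero means the two distributions are almost identical, which is why a measured divergence of 0.09 is read as a slight shift in behavior rather than a wholesale change.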
Intended Use Cases
- Accelerating research on language models.
- Building generative AI features, especially where reduced content filtering is desired.
- Applications requiring strong reasoning and logic in resource-limited settings.