Model Overview
lunahr/Phi-4-mini-instruct-abliterated is a 3.8-billion-parameter instruction-tuned model derived from Microsoft's Phi-4-mini-instruct; the "abliterated" suffix indicates that the base model's refusal directions have been ablated from its weights. It is a dense decoder-only Transformer featuring a 200K-token vocabulary, grouped-query attention, and shared input/output embeddings, improving efficiency and multilingual support over its predecessors. The base model was trained on 5 trillion tokens, including synthetic data focused on reasoning, math, coding, and general knowledge, as well as high-quality chat data for instruction adherence and safety. It supports a 128K-token context length and was developed between November and December 2024, with a data cutoff of June 2024.
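For context, models in the Phi-4-mini family are prompted with a simple chat template. A minimal sketch of building such a prompt, assuming the `<|system|>`/`<|user|>`/`<|assistant|>`/`<|end|>` special tokens used by this model family (verify against the model's tokenizer chat template before relying on it):

```python
# Minimal sketch of the chat prompt format assumed for the Phi-4-mini
# family; the special tokens below are an assumption and should be
# checked against the model's tokenizer config.
def build_prompt(system: str, user: str) -> str:
    return (
        f"<|system|>{system}<|end|>"
        f"<|user|>{user}<|end|>"
        f"<|assistant|>"  # generation continues from here
    )

prompt = build_prompt(
    "You are a helpful assistant.",
    "Summarize grouped-query attention in one sentence.",
)
```

In practice, `tokenizer.apply_chat_template` from Hugging Face `transformers` builds this string for you from a list of role/content messages, so you rarely need to format it by hand.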
Key Capabilities
- Strong Reasoning: Excels in math and logic tasks, benefiting from reasoning-dense training data.
- Multilingual Support: A larger vocabulary and improved architecture enable broad multilingual commercial and research use, covering languages such as Arabic, Chinese, English, French, German, Japanese, and more.
- Instruction Adherence & Safety: The base model underwent supervised fine-tuning (SFT) and direct preference optimization (DPO) for precise instruction following and safety; note that abliteration is intended to remove refusal behavior, so the base model's safety alignment should not be assumed to carry over.
- Efficiency: Designed for memory- and compute-constrained environments and latency-bound scenarios, leveraging grouped-query attention and shared input/output embeddings for efficiency.
- Function Calling: Supports tool-enabled function calling, allowing the model to emit structured function calls based on tool definitions supplied as JSON.
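The function-calling capability above can be sketched as follows. The `<|tool|>`…`<|/tool|>` wrapping of the JSON tool list inside the system turn follows the format documented for Phi-4-mini, but the tool itself (`get_weather`) is hypothetical, and the format should be verified against the model card before use:

```python
import json

# Hypothetical tool definition; the name and parameters are illustrative only.
tools = [{
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {"city": {"type": "string", "description": "City name"}},
}]

# Tools are passed as a JSON list inside <|tool|>...<|/tool|> in the
# system message (format assumed from the Phi-4-mini documentation).
system = (
    "You are a helpful assistant with these tools."
    f"<|tool|>{json.dumps(tools)}<|/tool|>"
)
prompt = (
    f"<|system|>{system}<|end|>"
    f"<|user|>What's the weather in Paris?<|end|>"
    f"<|assistant|>"
)
```

The model is then expected to respond with a structured call (function name plus arguments) that the caller parses and executes before feeding the result back in a follow-up turn.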
Performance Highlights
In benchmarks against similarly sized models, Phi-4-mini-instruct shows competitive performance, particularly in reasoning and math. For instance, it achieves 70.4 on BigBench Hard (0-shot, CoT) and 88.6 on GSM8K (8-shot, CoT), often outperforming models such as Phi-3.5-mini-Ins and Llama-3.2-3B-Ins in these categories. While its 3.8B parameters limit its capacity for factual knowledge, it demonstrates strong reasoning ability for its size.
Good For
- General Purpose AI Systems: Suitable for applications requiring general AI capabilities.
- Resource-Constrained Deployments: Ideal for environments with limited memory or computational power.
- Low-Latency Applications: Designed for scenarios where quick response times are critical.
- Research & Development: Serves as a building block for generative AI features and for accelerating research on language and multimodal models.
- Reasoning-Intensive Tasks: Particularly effective for tasks involving mathematical and logical reasoning.
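To make the resource-constrained claim concrete, here is a back-of-the-envelope estimate of the memory needed just to hold the weights of a 3.8B-parameter model at different precisions (activations, KV cache, and runtime overhead are extra, so treat these as lower bounds):

```python
PARAMS = 3.8e9  # parameter count from the model card

def weight_memory_gib(bits_per_param: float) -> float:
    """GiB required to store the weights alone at a given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16_gib = weight_memory_gib(16)  # ~7.1 GiB in fp16/bf16
int4_gib = weight_memory_gib(4)   # ~1.8 GiB with 4-bit quantization
```

This is why a 3.8B model fits comfortably on a single consumer GPU in half precision, and on much smaller devices once quantized.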