JSCreatorPro/offline-text-phi-4-mini
JSCreatorPro/offline-text-phi-4-mini is a 3.8-billion-parameter, instruction-tuned, decoder-only Transformer model from the Microsoft Phi-4 family, optimized for high-quality reasoning, particularly in math and logic. It has a 128K-token context length and a 200K-token vocabulary and is intended for broad multilingual commercial and research use in memory- or compute-constrained and latency-bound environments. The model was trained on synthetic data and filtered publicly available websites, with a focus on reasoning-dense content, and its instruction adherence was enhanced through supervised fine-tuning and direct preference optimization.
Model Overview
JSCreatorPro/offline-text-phi-4-mini is a 3.8-billion-parameter instruction-tuned model from the Microsoft Phi-4 family, designed for efficiency and strong reasoning. It supports a 128K-token context length and uses an expanded 200K-token vocabulary for better multilingual coverage. Building on feedback from the Phi-3 series, it adopts an updated architecture with grouped-query attention and shared input/output embeddings.
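To make the grouped-query attention point concrete, here is a minimal sketch that estimates key/value cache memory at the full 128K-token context, with and without sharing K/V heads. The layer count, head counts, and head dimension are assumptions chosen for illustration, not values stated in this card; read the real ones from the checkpoint's config.json (fields such as num_hidden_layers, num_attention_heads, and num_key_value_heads).

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for one sequence batch.

    The leading factor 2 accounts for storing both the K and the V tensor
    in every layer. bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Illustrative dimensions (ASSUMED for this sketch, not taken from this card):
LAYERS = 32
QUERY_HEADS = 24      # full multi-head attention would cache one K/V head per query head
KV_HEADS = 8          # with GQA, each group of 3 query heads shares one K/V head
HEAD_DIM = 128
CONTEXT = 128 * 1024  # the 128K-token context length stated in this card

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
mha = kv_cache_bytes(LAYERS, QUERY_HEADS, HEAD_DIM, CONTEXT)

GIB = 1024 ** 3
print(f"GQA KV cache: {gqa / GIB:.1f} GiB")  # 16.0 GiB
print(f"MHA KV cache: {mha / GIB:.1f} GiB")  # 48.0 GiB
print(f"Reduction: {mha // gqa}x")
```

Because the cache grows linearly with the number of K/V heads, sharing each K/V head across three query heads cuts it by a factor of three under these assumed dimensions; the actual saving depends on the real configuration.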
Key Capabilities & Features
- Strong Reasoning: Excels at math and logic tasks, supported by training on high-quality, reasoning-dense synthetic data.
- Multilingual Support: Features an expanded vocabulary and improved performance across multiple languages, including Arabic, Chinese, English, French, German, Japanese, and more.
- Instruction Adherence: Enhanced through supervised fine-tuning and direct preference optimization for precise instruction following.
- Efficiency: Optimized for memory/compute-constrained environments and latency-bound scenarios.
- Function Calling: Supports tool-enabled function calling; tool definitions must be supplied in a specific input format.
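As a rough guide to the efficiency claim, the following sketch estimates how much storage 3.8 billion parameters need at a few precisions. It counts weights only (no KV cache, activations, or runtime overhead), and the int8 and 4-bit rows assume a quantized deployment, which this card does not describe.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Approximate storage for the model weights alone at a given precision."""
    return num_params * bits_per_param // 8


PARAMS = 3_800_000_000  # 3.8B parameters, as stated in this card
GIB = 1024 ** 3

# bf16 is a common unquantized format; int8 and 4-bit are ASSUMED quantized options.
for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    size = weight_bytes(PARAMS, bits)
    print(f"{label:>5}: {size / GIB:.2f} GiB")
```

At bf16 this comes to about 7.1 GiB, and at 4 bits to under 2 GiB, which is why models of this size are practical on a single consumer GPU or in CPU memory.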
Performance Highlights
Phi-4-mini-instruct performs competitively with models of similar and larger size across a range of benchmarks. It scores 88.6 on GSM8K (8-shot, CoT) and 64.0 on MATH (0-shot, CoT), reflecting strong mathematical reasoning, and 67.3 on MMLU (5-shot) and 49.3 on Multilingual MMLU (5-shot), indicating robust language understanding. Despite its reasoning strength, its small size limits how much factual knowledge it can store, so it may benefit from retrieval-augmented generation (RAG).
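The RAG suggestion can be illustrated with a minimal sketch of the retrieval and prompt-assembly steps. This is not part of the model or any official tooling: it swaps in naive keyword-overlap scoring where a real system would use an embedding model and vector index, the prompt wording is hypothetical, and sending the prompt to the model is not shown.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, doc: str) -> int:
    """Count query tokens that also appear in the document (respecting multiplicity)."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score (ties keep input order)."""
    return sorted(docs, key=lambda doc: overlap_score(query, doc), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Prepend retrieved passages so the model can ground its answer in them."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    corpus = [
        "The Eiffel Tower is located in Paris and was completed in 1889.",
        "GSM8K is a benchmark of grade-school math word problems.",
        "Paris is the capital of France.",
    ]
    print(build_prompt("When was the Eiffel Tower in Paris completed?", corpus))
```

Keeping retrieval outside the model lets factual content be updated without retraining, while the model contributes the reasoning over the supplied passages.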
Intended Use Cases
This model is suitable for broad commercial and research applications requiring:
- General-purpose AI systems in resource-limited settings.
- Applications demanding strong reasoning, especially in mathematics and logic.
- Accelerating research in language and multimodal models.
Developers should consider common LLM limitations, including potential factual inaccuracies and performance differences across languages, and implement appropriate safeguards for high-risk scenarios.