MaziyarPanahi/calme-2.3-phi3-4b

Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 4k · Published: May 10, 2024 · License: MIT · Architecture: Transformer

MaziyarPanahi/calme-2.3-phi3-4b is a 4-billion-parameter language model, fine-tuned with DPO from Microsoft's Phi-3-mini-4k-instruct. Developed by MaziyarPanahi, it is notable for being the best-performing Phi-3-mini-4k model on the Open LLM Leaderboard as of March 2024. It features a 4096-token context length and is optimized for general instruction-following tasks, with strong results across benchmarks including MMLU and HellaSwag.


Overview

MaziyarPanahi/calme-2.3-phi3-4b is a 4-billion-parameter language model, specifically a DPO fine-tune of Microsoft's Phi-3-mini-4k-instruct. Developed by MaziyarPanahi, the model retains the efficient Phi-3 architecture and its 4096-token context window.

Key Capabilities & Performance

  • Instruction Following: Fine-tuned for general instruction-following, making it versatile for various conversational and task-oriented applications.
  • Leaderboard Performance: As of March 2024, it holds the distinction of being the top-performing Phi-3-mini-4k model on the Open LLM Leaderboard.
  • Benchmark Scores: Achieves an average score of 70.26 on the Open LLM Leaderboard (v1), with notable results in:
    • AI2 Reasoning Challenge: 63.48
    • HellaSwag: 80.86
    • MMLU: 69.24
    • GSM8k: 74.53
  • Prompt Template: Utilizes the ChatML prompt template for structured input and output.
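
Since the model expects ChatML-formatted input, the sketch below shows how a conversation might be serialized into that template by hand. This is a minimal illustration of the ChatML turn structure (`<|im_start|>role ... <|im_end|>`), not code from the model card; in practice the tokenizer's `apply_chat_template` method (if the tokenizer ships a chat template) would handle this for you.

```python
def build_chatml_prompt(messages):
    """Serialize a list of {role, content} dicts into a ChatML prompt string.

    Each turn is wrapped in <|im_start|>role ... <|im_end|> markers, and a
    trailing <|im_start|>assistant cues the model to generate its reply.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        )
    parts.append("<|im_start|>assistant")  # open the assistant's turn
    return "\n".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO fine-tuning in one sentence."},
])
print(prompt)
```

The resulting string can be passed directly to a text-generation pipeline as the raw prompt; the model's output should then be read up to the next `<|im_end|>` marker.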

Good For

  • Developers seeking a highly performant 4B parameter model for general-purpose instruction-following.
  • Applications requiring a compact yet capable model for deployment where Phi-3-mini-4k-instruct is a suitable base.
  • Use cases benefiting from a model with strong reasoning and common-sense understanding, as indicated by its benchmark results.