qingy2024/Qwen2.5-Math-14B-Instruct-Preview

Hosted on Hugging Face

Text generation · 14.8B parameters · FP8 quantization · 32k context length · Published: Dec 1, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

qingy2024/Qwen2.5-Math-14B-Instruct-Preview is a 14.8 billion parameter instruction-tuned language model developed by qingy2019 and fine-tuned from unsloth/qwen2.5-14b-instruct-bnb-4bit. It is optimized for mathematical reasoning and general instruction following, and was trained with the Unsloth framework for faster fine-tuning. Its performance on benchmarks such as MATH Lvl 5 and BBH indicates capability on complex reasoning tasks.


Model Overview

The qingy2024/Qwen2.5-Math-14B-Instruct-Preview is a 14.8 billion parameter instruction-tuned model developed by qingy2019. It is fine-tuned from the unsloth/qwen2.5-14b-instruct-bnb-4bit base model, utilizing the Unsloth framework for accelerated training. The model was specifically trained for 400 steps on the garage-bAInd/Open-Platypus dataset, focusing on enhancing its instruction-following and reasoning capabilities.
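Since the model inherits the Qwen2.5 chat lineage, prompts follow the ChatML format. In practice `tokenizer.apply_chat_template` handles this automatically; the helper below (its name is our own, not part of any library) is only a sketch of what the assembled prompt looks like:

```python
def build_chatml_prompt(user_message: str,
                        system_message: str = "You are a helpful assistant.") -> str:
    """Assemble a ChatML-style prompt as used by Qwen2.5 chat models.

    This manual version just illustrates the structure; real code should
    call tokenizer.apply_chat_template instead.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Example: a math instruction wrapped in the chat format.
prompt = build_chatml_prompt("Solve: what is 12 * 13?")
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to complete with its answer.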

Key Capabilities & Performance

This model demonstrates proficiency in various reasoning and instruction-following tasks, as evidenced by its evaluation on the Open LLM Leaderboard. Notable scores include:

  • IFEval (0-shot): 60.66
  • BBH (3-shot): 47.02
  • MATH Lvl 5 (4-shot): 28.47
  • MMLU-PRO (5-shot): 48.12

These results suggest its suitability for tasks requiring complex problem-solving and mathematical understanding.
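For a quick composite view, the unweighted mean of the four scores listed above can be computed directly. Note this is only illustrative: the official Open LLM Leaderboard average covers additional benchmarks not reported here.

```python
# Scores reported above for this model on the Open LLM Leaderboard subset.
scores = {
    "IFEval (0-shot)": 60.66,
    "BBH (3-shot)": 47.02,
    "MATH Lvl 5 (4-shot)": 28.47,
    "MMLU-PRO (5-shot)": 48.12,
}

# Unweighted mean of the four listed scores (not the official
# leaderboard average, which includes more benchmarks).
mean_score = sum(scores.values()) / len(scores)
print(f"{mean_score:.2f}")  # → 46.07
```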

When to Use This Model

  • Mathematical Reasoning: Its specific fine-tuning and performance on MATH Lvl 5 indicate a strength in mathematical problem-solving.
  • Instruction Following: The model is instruction-tuned, making it effective for general conversational AI and task execution based on prompts.
  • Research and Development: Ideal for researchers and developers exploring efficient fine-tuning techniques with Unsloth and evaluating performance on reasoning benchmarks.
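A minimal sketch of loading and querying the model with the Hugging Face transformers library, assuming a GPU with enough memory for a 14.8B model. The generation settings and the example question are illustrative choices, not values from the model card:

```python
MODEL_ID = "qingy2024/Qwen2.5-Math-14B-Instruct-Preview"

# Illustrative generation settings; tune for your workload.
GENERATION_KWARGS = {"max_new_tokens": 512, "temperature": 0.7, "do_sample": True}


def main() -> None:
    """Load the model and answer a sample math question."""
    # Heavy imports are deferred so the module stays cheap to inspect.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" shards the weights across available devices.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", torch_dtype="auto"
    )

    messages = [
        {"role": "user", "content": "Prove that the sum of two even integers is even."}
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, **GENERATION_KWARGS)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Calling `main()` downloads roughly 15 GB of weights on first run, so it is left to the reader to invoke.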