axel-datos/qwen2.5-0.5b-instruct_gsm8k_full-finetuning

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.5BQuant:BF16Ctx Length:32kPublished:Dec 13, 2024License:apache-2.0Architecture:Transformer Open Weights Warm

axel-datos/qwen2.5-0.5b-instruct_gsm8k_full-finetuning is a fine-tuned variant of the Qwen2.5-0.5B-Instruct model, specifically optimized for mathematical reasoning tasks using the GSM8K dataset. This model leverages a 0.5 billion parameter architecture, making it suitable for applications requiring efficient, specialized performance in arithmetic and problem-solving. Its primary use case is enhancing accuracy in quantitative reasoning within resource-constrained environments.

Loading preview...

Overview

axel-datos/qwen2.5-0.5b-instruct_gsm8k_full-finetuning is a specialized language model derived from the Qwen2.5-0.5B-Instruct base model. It has undergone full fine-tuning on a customized dataset, with a particular focus on the GSM8K dataset, which is designed for grade school math word problems. This targeted training aims to significantly improve the model's capabilities in mathematical reasoning and problem-solving.

Key Capabilities

  • Enhanced Mathematical Reasoning: Specifically fine-tuned on the GSM8K dataset to improve performance on arithmetic and logical math problems.
  • Instruction Following: Retains the instruction-following capabilities of its base Qwen2.5-0.5B-Instruct model.
  • Efficient Architecture: Based on a 0.5 billion parameter model, offering a balance between performance and computational efficiency.

Good For

  • Educational Tools: Developing applications that assist with mathematical homework or provide step-by-step solutions.
  • Quantitative Analysis: Tasks requiring accurate numerical reasoning and problem-solving in a constrained environment.
  • Research in Small Models: Exploring the limits of mathematical reasoning in smaller, more efficient language models.

Training Details

The model was trained with a learning rate of 2e-05, a batch size of 1, and utilized Native AMP for mixed-precision training. The training procedure involved 0.01 epochs, using an AdamW optimizer and a linear learning rate scheduler.