UWNSL/Qwen2.5-3B-Instruct_Short_CoT

Hugging Face · Text generation

Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Dec 22, 2024 · License: other · Architecture: Transformer

UWNSL/Qwen2.5-3B-Instruct_Short_CoT is a 3.1 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-3B-Instruct. The model is specialized for mathematical reasoning, having been trained on the MATH_training_Qwen2.5-32B-Instruct dataset, and reached a validation loss of 0.1360, suggesting effective learning in its specialized domain.


Model Overview

UWNSL/Qwen2.5-3B-Instruct_Short_CoT is a 3.1 billion parameter instruction-tuned model derived from the Qwen/Qwen2.5-3B-Instruct architecture. It has been fine-tuned on the MATH_training_Qwen2.5-32B-Instruct dataset; the dataset name indicates math training data generated with Qwen2.5-32B-Instruct, and the "Short_CoT" suffix suggests the training targets are short chain-of-thought solutions. This specialization makes the model suited to mathematical problem-solving and reasoning.

Key Characteristics

  • Base Model: Fine-tuned from Qwen/Qwen2.5-3B-Instruct.
  • Parameter Count: 3.1 billion parameters.
  • Context Length: Supports a context length of 32,768 tokens (see the config check after this list).
  • Training Focus: Optimized for mathematical tasks, as evidenced by its training dataset.
  • Performance: Achieved a validation loss of 0.1360 during training, suggesting effective learning within its specialized domain.
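
The characteristics above can be confirmed directly from the published model configuration. A minimal sketch; it assumes the model exposes the standard Qwen2 config fields in transformers:

```python
from transformers import AutoConfig

# Read the context window from the hosted config (no weights downloaded).
config = AutoConfig.from_pretrained("UWNSL/Qwen2.5-3B-Instruct_Short_CoT")
print(config.max_position_embeddings)  # expected: 32768 for a 32k context
```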

Intended Use Cases

This model is particularly suitable for applications requiring:

  • Mathematical Reasoning: Solving complex math problems or generating mathematical explanations (see the inference sketch after this list).
  • Instruction Following: Executing instructions related to numerical or logical tasks.
  • Specialized NLP: Tasks where a strong understanding of mathematical concepts is beneficial.
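
The following is a minimal inference sketch for the math-reasoning use case. It assumes the model inherits the standard Qwen2.5 chat template from its base model; the sample prompt is illustrative only:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UWNSL/Qwen2.5-3B-Instruct_Short_CoT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # loads in BF16, matching the listed precision
    device_map="auto",   # requires the accelerate package
)

# A sample math problem, formatted with the chat template.
messages = [{"role": "user", "content": "Solve for x: 2x + 6 = 14."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```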

Training Details

The model was trained with a learning rate of 1e-05, using the AdamW optimizer and a cosine learning-rate scheduler over 2 epochs, with a total batch size of 16 across 4 GPUs.
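
A hypothetical reconstruction of this recipe with Hugging Face TrainingArguments is sketched below. Only the totals above are reported; the per-device batch size of 4 (× 4 GPUs = 16) and BF16 training (inferred from the listed weight precision) are assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-3b-instruct-short-cot",  # hypothetical path
    learning_rate=1e-5,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=4,  # assumption: 4 per GPU x 4 GPUs = 16 total
    bf16=True,                      # assumption, inferred from BF16 weights
)
```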