Model Overview
UWNSL/Qwen2.5-3B-Instruct_Short_CoT is a 3.1-billion-parameter instruction-tuned model built on Qwen/Qwen2.5-3B-Instruct. It was further fine-tuned on the MATH_training_Qwen2.5-32B-Instruct dataset, indicating a specialization in mathematical problem solving and reasoning.
Key Characteristics
- Base Model: Fine-tuned from Qwen/Qwen2.5-3B-Instruct.
- Parameter Count: 3.1 billion parameters.
- Context Length: Supports a context length of 32768 tokens.
- Training Focus: Optimized for mathematical tasks, as evidenced by its training dataset.
- Performance: Reached a final validation loss of 0.1360 during training, suggesting the model fit its specialized domain well.
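Like other Qwen2.5 instruction-tuned models, this model consumes conversations in the ChatML format. In practice the prompt is produced by the tokenizer's `apply_chat_template`; the sketch below shows the underlying structure by hand, with the system and user strings as illustrative placeholders:

```python
def build_chatml_prompt(system: str, user: str) -> str:
    # ChatML layout used by Qwen2.5-family tokenizers; normally generated
    # via tokenizer.apply_chat_template rather than assembled manually.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful assistant.",
    "Solve: what is 12 * 7?",
)
print(prompt)
```

The trailing `<|im_start|>assistant` turn is left open so the model generates the reply; generation is typically stopped at the `<|im_end|>` token.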
Intended Use Cases
This model is particularly suitable for applications requiring:
- Mathematical Reasoning: Solving complex math problems or generating mathematical explanations.
- Instruction Following: Executing instructions related to numerical or logical tasks.
- Specialized NLP: Tasks where a strong understanding of mathematical concepts is beneficial.
Training Details
The model was trained for 2 epochs with a learning rate of 1e-05, the AdamW optimizer, and a cosine learning-rate scheduler, using a total batch size of 16 distributed across 4 GPUs.
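The stated totals are consistent with, for example, a per-device batch of 4 and no gradient accumulation; only the total of 16 across 4 GPUs is given, so the exact split below is an assumption:

```python
# Effective (total) batch size = per-device batch * number of GPUs * gradient
# accumulation steps. The per-device batch and accumulation steps here are
# assumptions; only the total of 16 across 4 GPUs is stated in the card.
per_device_batch = 4
num_gpus = 4
grad_accum_steps = 1

total_batch = per_device_batch * num_gpus * grad_accum_steps
print(total_batch)  # → 16
```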