jahyungu/Qwen2.5-1.5B-Instruct_gsm8k
The jahyungu/Qwen2.5-1.5B-Instruct_gsm8k model is a 1.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. As its name indicates, it has been adapted for the GSM8K dataset, suggesting an optimization for mathematical reasoning and problem solving. It supports a 32,768-token context length, making it suitable for processing long inputs in its specialized domain.
Model Overview
jahyungu/Qwen2.5-1.5B-Instruct_gsm8k is a 1.5-billion-parameter instruction-tuned model built on Qwen/Qwen2.5-1.5B-Instruct. Its name indicates further fine-tuning for the GSM8K dataset, a benchmark of grade-school mathematical word problems, which implies enhanced capability in numerical reasoning and step-by-step problem solving.
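The model can be loaded through the standard Transformers causal-LM API. The snippet below is a minimal sketch: the model ID comes from this card, while the dtype and device settings are common defaults, not confirmed training-time choices.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jahyungu/Qwen2.5-1.5B-Instruct_gsm8k"

# Download the fine-tuned checkpoint and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # assumption: use the dtype stored in the checkpoint
    device_map="auto",    # assumption: place weights on available devices
)
```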
Key Characteristics
- Base Model: Qwen2.5-1.5B-Instruct
- Parameter Count: 1.5 billion parameters
- Context Length: Supports a 32,768-token context window (see the verification sketch after this list).
- Specialization: Fine-tuned for tasks related to the GSM8K dataset, indicating a focus on mathematical reasoning.
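The advertised context window can be checked against the checkpoint's configuration. A minimal sketch, assuming the model ID from this card:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("jahyungu/Qwen2.5-1.5B-Instruct_gsm8k")
print(config.max_position_embeddings)  # expected: 32768, per this card
```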
Training Details
The model was trained with a learning rate of 1e-05, a per-device batch size of 2 accumulated to an effective batch size of 16, and the AdamW optimizer. A cosine learning-rate scheduler with a warmup ratio of 0.03 was used over 3 epochs. The training environment included Transformers 4.50.0, PyTorch 2.6.0+cu124, Datasets 3.4.1, and Tokenizers 0.21.0.
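These hyperparameters map onto Transformers TrainingArguments roughly as sketched below. This is a reconstruction from the card, not the original training script; output_dir is a placeholder, and gradient_accumulation_steps=8 is inferred from the per-device batch size of 2 and the effective batch size of 16.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-1.5b-instruct-gsm8k",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = effective batch size of 16 (inferred)
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=3,
)
```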
Intended Use Cases
This model is likely best suited for applications requiring strong performance in grade-school mathematical problem solving, particularly problems at the level of difficulty covered by GSM8K. Because it is instruction-tuned, it responds well to natural-language prompts, and chat-style prompting is a natural way to pose word problems in this domain.
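As an illustration, a GSM8K-style word problem can be posed through the tokenizer's chat template. This continues the loading sketch above; the question and generation settings are illustrative, not taken from the model's evaluation setup.

```python
# Assumes `model` and `tokenizer` from the loading sketch above.
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
messages = [{"role": "user", "content": question}]

# Format the question with the model's chat template and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```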