nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k
nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k is a 1.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Its name suggests fine-tuning on a GSM8K-related dataset, i.e. an optimization for mathematical reasoning tasks. The model targets applications that need solid quantitative problem-solving from a compact model.
Overview
This model, nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k, is a 1.5-billion-parameter instruction-tuned language model, fine-tuned from the base Qwen/Qwen2.5-1.5B-Instruct model. The exact fine-tuning dataset is unspecified, but the naming convention strongly suggests GSM8K (Grade School Math 8K), a benchmark of grade-school math word problems, indicating a focus on mathematical reasoning and multi-step problem-solving.
Key Characteristics
- Base Model: Qwen2.5-1.5B-Instruct
- Parameter Count: 1.5 billion
- Context Length: 32,768 tokens
- Fine-tuning Objective: Implied optimization for mathematical reasoning tasks, likely using a GSM8K-related dataset.
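The model can be loaded and queried like any other Qwen2.5 instruct checkpoint. A minimal sketch, assuming the standard `transformers` chat-template API and enough memory to hold the 1.5B model (the sample question is a classic GSM8K-style word problem, not one confirmed to be in this model's training set):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A GSM8K-style word problem as a chat message.
messages = [
    {
        "role": "user",
        "content": (
            "Natalia sold clips to 48 of her friends in April, and then she "
            "sold half as many clips in May. How many clips did Natalia sell "
            "altogether in April and May?"
        ),
    }
]

# Build the prompt with the model's chat template and generate greedily.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is a common choice for math benchmarks, where reproducible step-by-step answers matter more than diversity.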
Training Details
The model was trained for 3 epochs with the AdamW optimizer, a learning rate of 1e-05, a per-device batch size of 2 with gradient accumulation to an effective batch size of 16, and a cosine learning rate scheduler with a warmup ratio of 0.03. Training used Transformers 4.50.0 and PyTorch 2.6.0+cu124.
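The reported hyperparameters can be expressed as a `transformers.TrainingArguments` configuration. This is a hypothetical reconstruction for reference only; the actual training script, dataset preprocessing, and accumulation factor are not published (an accumulation of 8 is inferred from batch size 2 and effective batch size 16):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-1.5b-instruct-gsm8k",  # hypothetical output path
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = effective batch size 16
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
)
```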
Intended Use Cases
This model is particularly suited for applications requiring:
- Solving grade-school level mathematical word problems.
- Reasoning tasks that involve numerical and logical deduction.
- Integration into systems where a compact model with strong mathematical capabilities is beneficial.