nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 21, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

The nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k model is a 1.5-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. As its name suggests, it was fine-tuned on a GSM8K-related dataset and is optimized for mathematical reasoning, making it a compact option for applications that need reliable problem solving in quantitative domains.


Overview

This model, nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k, is a 1.5-billion-parameter instruction-tuned language model fine-tuned from the base Qwen/Qwen2.5-1.5B-Instruct. The fine-tuning dataset is not documented, but the model's name strongly suggests training on GSM8K (Grade School Math 8K), a benchmark of grade-school math word problems, indicating a focus on mathematical reasoning and step-by-step problem solving.

Key Characteristics

  • Base Model: Qwen2.5-1.5B-Instruct
  • Parameter Count: 1.5 billion
  • Context Length: 32768 tokens
  • Fine-tuning Objective: Implied optimization for mathematical reasoning tasks, likely using a GSM8K-related dataset.

Training Details

The model was trained with a learning rate of 1e-05, a per-device batch size of 2 with gradient accumulation to an effective batch size of 16, and the AdamW optimizer. A cosine learning rate scheduler with a warmup ratio of 0.03 was applied over 3 epochs. Training used Transformers 4.50.0 and PyTorch 2.6.0+cu124.
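
For reference, here is a minimal sketch of how these reported hyperparameters might map onto `transformers.TrainingArguments`. The output directory and the gradient-accumulation split (2 × 8 = 16) are assumptions, not values confirmed by the model card:

```python
# Hypothetical sketch: the reported hyperparameters expressed as
# transformers.TrainingArguments. The accumulation split (2 x 8 = 16)
# and output directory are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-1.5b-instruct-gsm8k",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = effective batch size of 16
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    optim="adamw_torch",             # AdamW, as reported
    bf16=True,                       # matches the BF16 precision listed above
)
```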

Intended Use Cases

This model is particularly suited for applications requiring:

  • Solving grade-school level mathematical word problems.
  • Reasoning tasks that involve numerical and logical deduction.
  • Integration into systems where a compact model with strong mathematical capabilities is beneficial.
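
As a concrete example, the sketch below runs the model on a GSM8K-style word problem using the standard Transformers chat interface. The prompt is an illustrative sample from GSM8K, and the generation settings are assumptions rather than values documented for this model:

```python
# Minimal inference sketch for a GSM8K-style word problem.
# max_new_tokens and the dtype/device settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nilarnabdebnath/Qwen2.5-1.5B-Instruct_gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

messages = [
    {"role": "user", "content": (
        "Natalia sold clips to 48 of her friends in April, and then she sold "
        "half as many clips in May. How many clips did Natalia sell "
        "altogether in April and May?"
    )},
]
# Apply the Qwen chat template and generate an answer.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```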