rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-16_merged

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 1, 2026Architecture:Transformer Warm

The rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-16_merged model is a 1.5 billion parameter language model, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It was specifically optimized for mathematical reasoning tasks using the GSM8K dataset and the GRPO method. This model is designed to enhance performance on grade school math problems, offering a specialized solution for numerical and logical problem-solving within its 32768 token context length.

Loading preview...

Model Overview

This model, rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-16_merged, is a 1.5 billion parameter language model built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture. It has been specifically fine-tuned to excel in mathematical reasoning tasks, particularly those found in the GSM8K dataset.

Key Capabilities

  • Mathematical Reasoning: Optimized for solving grade school math problems, making it suitable for applications requiring numerical and logical problem-solving.
  • Distilled Architecture: Benefits from a distillation process, potentially offering efficient performance for its size.
  • Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer problem descriptions or multi-step reasoning.

Training Details

The model underwent fine-tuning on the GSM8K dataset using the GRPO (Gradient-based Reward Policy Optimization) method. This targeted training approach aims to improve its accuracy and reasoning capabilities in arithmetic and word problems.

Good For

  • Applications requiring robust performance on mathematical word problems.
  • Educational tools focused on grade school mathematics.
  • Research into efficient mathematical reasoning in smaller language models.