rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Context Length: 32k · Published: Apr 1, 2026 · Architecture: Transformer

The rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged model is a 1.5 billion parameter causal language model, fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the GSM8K dataset using the GRPO method. It is optimized for mathematical reasoning and problem-solving, particularly the grade-school arithmetic and word problems that make up the GSM8K benchmark.


Model Overview

This model, rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged, is a 1.5 billion parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It has been fine-tuned on the GSM8K dataset using GRPO (Group Relative Policy Optimization).
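For context, GRPO samples a group of G completions per question, scores each with a reward (for GSM8K, typically whether the extracted final answer is correct), and uses the group-normalized score as the advantage, removing the need for a separate value model. A standard formulation of the advantage, taken from the method's originating paper rather than from this card, is:

```latex
% Group-normalized advantage used by GRPO: G completions are sampled per
% question and scored with scalar rewards r_1, ..., r_G.
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}
```

The G-4 suffix in the model name plausibly denotes a group size of G = 4, though the card does not state this explicitly.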

Key Capabilities

  • Mathematical Reasoning: Optimized for solving grade-school mathematical word problems and arithmetic tasks (see the usage sketch below).
  • Distilled Architecture: Builds on the distilled Qwen 1.5B architecture, which compresses DeepSeek-R1's reasoning behavior into a compact 1.5B parameter network, keeping inference costs low.
  • Fine-tuned Performance: The fine-tuning process on GSM8K aims to enhance its accuracy and reasoning abilities for quantitative problems.
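
The following is a minimal usage sketch, assuming the standard Hugging Face transformers API and that the merged checkpoint preserves the base model's chat template; the repository id is taken from the model name above, and the sample question is the first problem of the GSM8K test split.

```python
# Minimal inference sketch (assumptions noted in the text above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# A GSM8K-style word problem posed as a single user turn.
question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding is used here for reproducibility, and a generous max_new_tokens budget is left because DeepSeek-R1-distilled models typically emit an explicit reasoning trace before the final answer.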

Use Cases

This model is particularly well-suited for applications requiring:

  • Solving arithmetic and mathematical word problems.
  • Educational tools for mathematics.
  • Benchmarking performance on quantitative reasoning tasks, especially those similar to GSM8K (a rough evaluation sketch follows at the end of this card).

Its relatively small size (1.5B parameters) combined with specialized fine-tuning makes it a candidate for efficient deployment in scenarios focused on mathematical problem-solving.
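
For the benchmarking use case above, the sketch below scores exact-match accuracy on a small slice of the GSM8K test split. It assumes the Hugging Face datasets library and the openai/gsm8k dataset id, and it reuses the model and tokenizer objects from the usage sketch earlier on this card; the last_number heuristic is an illustration, not an official GSM8K scorer.

```python
# Rough GSM8K evaluation sketch (assumptions noted in the text above);
# requires `model` and `tokenizer` from the usage sketch earlier on this card.
import re
from datasets import load_dataset

dataset = load_dataset("openai/gsm8k", "main", split="test")

def last_number(text):
    """Extract the last number in a string, e.g. the value after '####'."""
    matches = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return matches[-1] if matches else None

def generate_answer(question):
    """Greedy single-turn generation, mirroring the usage sketch above."""
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False)
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

sample = dataset.select(range(100))  # small slice to keep the run short
correct = sum(
    last_number(generate_answer(ex["question"]))
    == last_number(ex["answer"].split("####")[-1])  # references end in "#### <answer>"
    for ex in sample
)
print(f"Exact-match accuracy on {len(sample)} problems: {correct / len(sample):.2%}")
```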