rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-16_merged
The rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-16_merged model is a 1.5 billion parameter language model, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It was specifically optimized for mathematical reasoning tasks using the GSM8K dataset and the GRPO method. This model is designed to enhance performance on grade school math problems, offering a specialized solution for numerical and logical problem-solving within its 32768 token context length.
Loading preview...
Model Overview
This model, rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-16_merged, is a 1.5 billion parameter language model built upon the DeepSeek-R1-Distill-Qwen-1.5B architecture. It has been specifically fine-tuned to excel in mathematical reasoning tasks, particularly those found in the GSM8K dataset.
Key Capabilities
- Mathematical Reasoning: Optimized for solving grade school math problems, making it suitable for applications requiring numerical and logical problem-solving.
- Distilled Architecture: Benefits from a distillation process, potentially offering efficient performance for its size.
- Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer problem descriptions or multi-step reasoning.
Training Details
The model underwent fine-tuning on the GSM8K dataset using the GRPO (Gradient-based Reward Policy Optimization) method. This targeted training approach aims to improve its accuracy and reasoning capabilities in arithmetic and word problems.
Good For
- Applications requiring robust performance on mathematical word problems.
- Educational tools focused on grade school mathematics.
- Research into efficient mathematical reasoning in smaller language models.