rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged
The rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged model is a 1.5-billion-parameter causal language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the GSM8K dataset using the GRPO method. It targets mathematical reasoning and problem-solving, particularly the grade-school arithmetic word problems found in the GSM8K benchmark.
Model Overview
This model, rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged, is a 1.5-billion-parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It has been fine-tuned on the GSM8K dataset using GRPO (Group Relative Policy Optimization).
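A minimal inference sketch with the `transformers` library is shown below. It assumes the repo id as published above and the standard `AutoModelForCausalLM` API; the question/answer prompt framing is an assumption for illustration, not a documented training template.

```python
def build_prompt(question: str) -> str:
    # Simple question/answer framing for a GSM8K-style query
    # (assumed format, not the verified training prompt).
    return f"Question: {question}\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    # Heavy imports kept local so the prompt helper stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "rghosh8/gsm8k-deepseek-r1-distill-qwen-1.5b-rajat-seed-42-G-4_merged"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

At 1.5B parameters the model loads comfortably on a single consumer GPU or, more slowly, on CPU.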
Key Capabilities
- Mathematical Reasoning: Optimized for solving grade school level mathematical word problems and arithmetic tasks.
- Distilled Architecture: Built on the distilled DeepSeek-R1 Qwen 1.5B architecture, which retains reasoning capability at a modest parameter count.
- Fine-tuned Performance: GRPO fine-tuning on GSM8K targets improved accuracy and step-by-step reasoning on quantitative problems.
Use Cases
This model is particularly well-suited for applications requiring:
- Solving arithmetic and mathematical word problems.
- Educational tools for mathematics.
- Benchmarking performance on quantitative reasoning tasks, especially those similar to GSM8K.
Its relatively small size (1.5B parameters) combined with specialized fine-tuning makes it a candidate for efficient deployment in scenarios focused on mathematical problem-solving.
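For the benchmarking use case above, GSM8K reference solutions end with a `#### <number>` line, so accuracy can be scored by comparing that gold value against a number parsed from the model's output. A minimal scoring sketch follows; the "take the last number in the completion" heuristic is an assumption for illustration, not part of this model's release.

```python
import re
from typing import Optional


def extract_gold_answer(gsm8k_answer: str) -> str:
    # GSM8K reference solutions end with a line like "#### 72".
    match = re.search(r"####\s*([-0-9.,]+)", gsm8k_answer)
    if match is None:
        raise ValueError("no '####' answer marker found")
    return match.group(1).replace(",", "")


def extract_model_answer(completion: str) -> Optional[str]:
    # Heuristic: treat the last number in the completion as the answer.
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return numbers[-1].replace(",", "") if numbers else None


def is_correct(completion: str, gsm8k_answer: str) -> bool:
    # Exact string match after normalizing thousands separators.
    predicted = extract_model_answer(completion)
    return predicted is not None and predicted == extract_gold_answer(gsm8k_answer)
```

This style of exact-match scoring against the `####` marker is the conventional way GSM8K accuracy is reported, which makes results comparable across fine-tuned variants.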