Overview
ryzax/1.5B-v18 is a roughly 1.7 billion parameter language model developed by ryzax. It is a fine-tuned iteration of the ryzax/qwen3_1.7B_sft_correct_v3_1e-5_4 base model, trained specifically on the agentica-org/DeepScaleR-Preview-Dataset.
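A minimal inference sketch using the Hugging Face transformers library is shown below. This assumes the model ships standard transformers-compatible weights and tokenizer files; the prompt and generation settings are illustrative only.

```python
# Hedged usage sketch: assumes standard transformers-compatible checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ryzax/1.5B-v18"

def solve(prompt: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a completion for a math prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the generated answer is returned.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(solve("What is 17 * 23? Show your reasoning."))
```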
Key Capabilities
- Enhanced Mathematical Reasoning: This model leverages the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, to significantly improve its mathematical reasoning abilities.
- Fine-tuned Performance: Built upon a strong base model, it has undergone further supervised fine-tuning to refine its responses and capabilities.
- TRL Framework: Training was conducted with the TRL (Transformer Reinforcement Learning) library, which supplies the reinforcement learning tooling used for GRPO.
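GRPO's central idea is to replace a learned value baseline with a group-relative one: several completions are sampled per prompt, and each completion's reward is normalized against the statistics of its own group. A minimal illustration of that advantage computation (not the model's actual training code):

```python
# Minimal sketch of GRPO's group-relative advantage, for illustration only.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampled group's mean and std.

    GRPO uses this in place of a learned value-function baseline: completions
    that beat their group's average get positive advantage, the rest negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 sampled completions for one prompt, 3 judged correct (reward 1.0).
advs = group_relative_advantages([1.0, 0.0, 1.0, 1.0])
```

Because the advantages are centered within each group, they sum to approximately zero, so the policy is pushed toward above-average completions and away from below-average ones.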
Training Details
The model's training procedure specifically incorporated GRPO, a technique designed to push the limits of mathematical reasoning in open language models. This method is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks benefiting from models trained with advanced reinforcement learning techniques like GRPO.
- Developers looking for a compact model with specialized reasoning capabilities.