SantiagoC/palindrome-curriculum-v1
SantiagoC/palindrome-curriculum-v1 is a 0.8 billion parameter causal language model fine-tuned from SantiagoC/palindrome-sft-qwen3. Developed by SantiagoC, it was trained with the TRL framework using the GRPO method, targeting tasks that require mathematical problem-solving.
Model Overview
SantiagoC/palindrome-curriculum-v1 is a 0.8 billion parameter causal language model, fine-tuned by SantiagoC from the SantiagoC/palindrome-sft-qwen3 base model. Training was carried out with the TRL framework.
Key Training Methodology
A significant aspect of this model's development is the integration of GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", aims to enhance the model's mathematical reasoning abilities. The training was facilitated by ML Intern, an agent for machine learning research and development.
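To illustrate the idea behind GRPO: instead of training a separate value network, each sampled completion's reward is normalized against the mean and standard deviation of its sampling group to form an advantage. The sketch below shows only this group-relative normalization step in plain Python; it is an illustration of the method from the DeepSeekMath paper, not this model's actual training code, and the choice of the population standard deviation is an assumption.

```python
import statistics


def group_relative_advantages(rewards):
    """Compute GRPO-style advantages for one group of sampled completions.

    Each reward is normalized by the group's mean and standard deviation,
    so relative quality within the group drives the policy update.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; estimator choice is an assumption
    if std == 0:
        # All completions scored equally: no learning signal from this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

For example, a group scored `[0.0, 2.0]` yields advantages `[-1.0, 1.0]`: the better completion is reinforced and the worse one is penalized, relative to its peers.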
Intended Use Cases
Given its fine-tuning with the GRPO method, this model is particularly suited for:
- Mathematical reasoning tasks: Excelling in problems that require logical and mathematical deduction.
- Complex problem-solving: Applications where structured, step-by-step reasoning is beneficial.
Technical Details
- Base Model: SantiagoC/palindrome-sft-qwen3
- Training Framework: TRL (Transformer Reinforcement Learning)
- Parameter Count: 0.8 billion
- Context Length: 32768 tokens
This model provides a foundation for applications demanding robust mathematical and logical processing.
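If the model is published on the Hugging Face Hub under this id, it should load with the standard transformers causal-LM API. The sketch below is a minimal, unverified usage example: the prompt template is an assumption (check the tokenizer's chat template if one is defined), and the generation settings are illustrative defaults.

```python
MODEL_ID = "SantiagoC/palindrome-curriculum-v1"


def build_prompt(question: str) -> str:
    # Simple instruction-style prompt; the exact format the model expects
    # is an assumption -- prefer the tokenizer's chat template if present.
    return f"Question: {question}\nAnswer: Let's think step by step."


if __name__ == "__main__":
    # Imports kept inside the entry point so the helper above can be used
    # without transformers installed; requires network access to the Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_prompt("What is 17 * 24?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) is a reasonable default for mathematical reasoning, where reproducible step-by-step outputs are usually preferred over sampled diversity.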