SantiagoC/palindrome-curriculum-v2
SantiagoC/palindrome-curriculum-v2 is a 0.8 billion parameter language model fine-tuned from SantiagoC/palindrome-sft-qwen3 using GRPO, a reinforcement-learning method designed to enhance mathematical reasoning. The model is optimized for tasks requiring robust reasoning, particularly in mathematical contexts.
Model Overview
SantiagoC/palindrome-curriculum-v2 is a 0.8 billion parameter language model developed by SantiagoC. It is a fine-tuned iteration of the SantiagoC/palindrome-sft-qwen3 base model, trained using the GRPO (Group Relative Policy Optimization) method. This training approach, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," focuses on improving mathematical reasoning.
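In brief, GRPO samples a group of G completions per prompt and scores each against the group rather than against a learned value model. As a sketch (following the DeepSeekMath paper; the exact normalization used to train this particular model is not documented here), the advantage of the i-th completion with reward r_i is:

```
A_i = ( r_i - mean(r_1, ..., r_G) ) / std(r_1, ..., r_G)
```

Completions that outperform their group receive positive advantages and are reinforced; below-average completions are penalized, all without a separate critic network.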
Key Characteristics
- Base Model: Fine-tuned from SantiagoC/palindrome-sft-qwen3.
- Training Method: Utilizes GRPO, a technique aimed at enhancing reasoning abilities, particularly in mathematical domains.
- Context Length: Supports a context length of 32768 tokens.
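To make the group-relative scoring concrete, here is a minimal, illustrative sketch of how GRPO-style advantages can be computed for one prompt's sampled completions. This is not the training code used for this model; the function name and the choice of population standard deviation are assumptions for illustration.

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    """Normalize each completion's reward against its group's mean and std.

    `rewards` holds one scalar reward per sampled completion of a single
    prompt. Returns a group-relative advantage per completion.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled completions, two rewarded, two not.
advantages = group_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)  # [1.0, -1.0, 1.0, -1.0]
```

Completions with above-average reward get positive advantages, so the policy gradient pushes the model toward them; a uniform group yields all-zero advantages and no update signal.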
Intended Use
This model is suitable for applications requiring strong reasoning capabilities, especially those involving mathematical problem-solving or logical deduction, due to its specialized GRPO training.