Overview

This model, goldengoose-divsweep_goose_n128_indorc_tau0.50-25grp, is a 1.5 billion parameter model fine-tuned from Qwen2.5-1.5B-Instruct. It was fine-tuned with the TRL library using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to improve a model's mathematical reasoning.

Key Capabilities

Enhanced Mathematical Reasoning: Trained with GRPO, a method designed to improve performance on mathematical reasoning tasks.
Instruction Following: Built on an instruction-tuned base model, so it is designed to follow user prompts.
Context Handling: Supports a context length of 32,768 tokens, enough for long, multi-part inputs.
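
As a rough illustration of the context limit above, the sketch below assumes (as is usual for decoder-only models) that the 32,768-token window is shared between the prompt and the generated tokens. The function name and its parameters are hypothetical helpers, not part of any library, and the token counts would come from whatever tokenizer is in use.

```python
CONTEXT_LENGTH = 32_768  # context window stated in this model card


def generation_budget(prompt_tokens, requested_new_tokens,
                      context_length=CONTEXT_LENGTH):
    """Return how many new tokens can be generated without overflowing the window.

    Hypothetical helper: assumes prompt and output share one context window.
    """
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone fills or exceeds the context window")
    remaining = context_length - prompt_tokens
    return min(requested_new_tokens, remaining)


# A short prompt leaves the requested budget untouched;
# a long prompt caps generation at the space that remains.
short_case = generation_budget(prompt_tokens=1_000, requested_new_tokens=512)
long_case = generation_budget(prompt_tokens=30_000, requested_new_tokens=4_096)
```

A check like this would typically run before generation, so that a long prompt shortens the requested output rather than overflowing the window.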

Training Details

The model was trained with GRPO, the method described in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), using the TRL (Transformer Reinforcement Learning) library. The choice of GRPO suggests a focus on analytical and problem-solving skills.
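
The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions for each prompt and score each one relative to the rest of its group, rather than against a learned value function. The sketch below shows only that normalization step, not the full training loop and not this model's actual configuration. The reward values are made up for illustration, and the use of the population standard deviation and of a small `eps` term are assumptions; implementations may differ on both.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one prompt's group of rewards: (r - group mean) / group std.

    Assumption: population standard deviation plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt,
# e.g. 1.0 for a correct final answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# When every completion gets the same reward, no completion is preferred.
tied = group_relative_advantages([2.0, 2.0])
```

Completions that beat their group's average get a positive advantage and the others a negative one, so the policy update favors the relatively better samples for each prompt.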

Good For

Applications requiring strong mathematical problem-solving.
Tasks that benefit from enhanced logical reasoning.
Scenarios where detailed instruction following and longer context understanding are crucial.
