cjiao/goldengoose-low_div_rand-25grp
The cjiao/goldengoose-low_div_rand-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the TRL library and incorporates the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring robust reasoning, particularly in mathematical contexts, leveraging its specialized training approach.
Loading preview...
Model Overview
cjiao/goldengoose-low_div_rand-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. This model distinguishes itself through its specialized training procedure, which utilizes the TRL (Transformer Reinforcement Learning) library.
Key Differentiator: GRPO Training
A core aspect of this model's development is its training with GRPO (Gradient-based Reward Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates a focus on enhancing the model's capabilities in mathematical reasoning and problem-solving.
Technical Details
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Training Framework: TRL (version 0.19.1)
- Parameter Count: 1.5 billion
- Context Length: 32768 tokens
Intended Use Cases
Given its GRPO-based training, this model is particularly well-suited for applications requiring:
- Mathematical Reasoning: Solving complex math problems and understanding mathematical concepts.
- Logical Deduction: Tasks that benefit from structured reasoning and problem-solving approaches.
- Instruction Following: General instruction-tuned capabilities inherited from its base model, with an emphasis on analytical tasks.