cjiao/goldengoose-gumbel_tau0.10-25grp
The cjiao/goldengoose-gumbel_tau0.10-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model utilizes the GRPO method for training, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced reasoning, leveraging a 32K context length. This model is particularly suited for applications demanding robust logical and mathematical problem-solving.
Loading preview...
Model Overview
The cjiao/goldengoose-gumbel_tau0.10-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the Qwen/Qwen2.5-1.5B-Instruct architecture. It was fine-tuned using the TRL framework and incorporates the GRPO (Gumbel-softmax Relaxed Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its reasoning capabilities.
Key Capabilities
- Enhanced Mathematical Reasoning: Leverages the GRPO training method, specifically designed to improve performance on mathematical and logical reasoning tasks.
- Instruction Following: As an instruction-tuned model, it is capable of understanding and executing a wide range of user prompts.
- Context Length: Supports a substantial context window of 32,768 tokens, allowing for processing longer inputs and maintaining conversational coherence.
Training Details
The model's training procedure involved the GRPO method, which is detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This approach aims to push the boundaries of mathematical reasoning in open language models.
Good For
- Applications requiring strong mathematical problem-solving.
- Tasks that benefit from advanced logical reasoning.
- Scenarios where a compact yet capable model with good instruction-following is needed.