cjiao/goldengoose-gumbel-0.50-100
The cjiao/goldengoose-gumbel-0.50-100 model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is optimized for tasks requiring robust logical and mathematical problem-solving, leveraging its 32768-token context length. It is particularly suited for applications where precise and structured reasoning is critical.
Loading preview...
Model Overview
The cjiao/goldengoose-gumbel-0.50-100 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. This model distinguishes itself through its training methodology, which incorporates GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization), a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The fine-tuning process was conducted using the TRL library.
Key Capabilities
- Enhanced Mathematical Reasoning: Leverages the GRPO training method to improve performance on tasks requiring logical and mathematical problem-solving.
- Instruction Following: Built upon an instruction-tuned base model, it is designed to follow user prompts effectively.
- Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer inputs and maintaining conversational coherence.
Good For
- Mathematical Problem Solving: Ideal for applications that involve complex calculations, logical deductions, or mathematical reasoning.
- Instruction-Based Tasks: Suitable for general instruction-following applications where a smaller, specialized model is preferred.
- Research and Development: Provides a foundation for further experimentation with GRPO-based fine-tuning techniques on Qwen2.5 models.