cjiao/goldengoose-gumbel_combined_gmrel_tau1.00-25grp
The cjiao/goldengoose-gumbel_combined_gmrel_tau1.00-25grp is a 1.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, this model utilizes the GRPO training method, which is known for enhancing mathematical reasoning in language models. It is optimized for tasks requiring robust reasoning capabilities, building upon its Qwen2.5 base architecture. The model supports a context length of 32768 tokens.
Loading preview...
Model Overview
This model, goldengoose-gumbel_combined_gmrel_tau1.00-25grp, is a 1.5 billion parameter language model developed by cjiao. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, leveraging the TRL (Transformer Reinforcement Learning) framework for its training.
Key Training Methodology
A significant differentiator for this model is its training procedure, which incorporates GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" and is designed to enhance a model's mathematical reasoning abilities.
Capabilities and Use Cases
Given its foundation in Qwen2.5-1.5B-Instruct and the application of GRPO, this model is particularly suited for:
- Reasoning-intensive tasks: Benefiting from the GRPO training, it is expected to perform well in scenarios requiring logical deduction and problem-solving.
- Instruction following: Inheriting capabilities from its instruction-tuned base model.
- General text generation: For tasks where a compact yet capable model with enhanced reasoning is beneficial.
Technical Details
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Parameters: 1.5 Billion
- Context Length: 32768 tokens
- Training Framework: TRL (version 0.19.1)
- Core Training Method: GRPO, as detailed in the DeepSeekMath paper.