cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp
The cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by cjiao, it utilizes the GRPO training method, as introduced in the DeepSeekMath paper, to enhance mathematical reasoning capabilities. This model is designed for tasks requiring improved logical and mathematical processing, offering a context length of 32768 tokens.
Loading preview...
Model Overview
This model, cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp, is a 1.5 billion parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct architecture. It has been specifically fine-tuned using the TRL library.
Key Training Details
A notable aspect of this model's development is its training with GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method was originally introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The application of GRPO suggests an optimization focus on improving reasoning and problem-solving abilities, particularly in mathematical contexts.
Technical Specifications
- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Parameter Count: 1.5 billion
- Context Length: 32768 tokens
- Training Frameworks: TRL (version 0.19.1), Transformers (version 4.57.6), Pytorch (version 2.5.1), Datasets (version 4.8.4), Tokenizers (version 0.22.2)
Potential Use Cases
Given its fine-tuning approach, this model is likely well-suited for applications requiring:
- Enhanced logical reasoning.
- Mathematical problem-solving.
- Instruction-following tasks where precise and structured outputs are beneficial.