cjiao/goldengoose-gumbel_gmrel_tau0.10-25grp
The cjiao/goldengoose-gumbel_gmrel_tau0.10-25grp model is a 1.5 billion parameter instruction-tuned language model, fine-tuned by cjiao from the Qwen/Qwen2.5-1.5B-Instruct base. It was trained using the GRPO method, as introduced in the DeepSeekMath paper, which focuses on enhancing mathematical reasoning capabilities. This model is optimized for tasks requiring improved reasoning, particularly in mathematical contexts, leveraging its specialized training approach.
Loading preview...
Model Overview
The cjiao/goldengoose-gumbel_gmrel_tau0.10-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its development utilized the TRL framework and incorporated the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) training method.
Key Capabilities
- Enhanced Reasoning: The model's training with GRPO, a method detailed in the DeepSeekMath paper, suggests a focus on improving reasoning abilities, particularly in mathematical domains.
- Instruction Following: As a fine-tuned instruction model, it is designed to respond effectively to user prompts and instructions.
Training Details
This model was trained using the TRL library (version 0.19.1) and other standard deep learning frameworks including Transformers (4.57.6), Pytorch (2.5.1), Datasets (4.8.4), and Tokenizers (0.22.2). The GRPO method, originating from research on mathematical reasoning in large language models, was central to its fine-tuning process.