cjiao/goldengoose-gumbel_combined_indoc_tau1.00-25grp
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:May 29, 2026Architecture:Transformer Warm
The cjiao/goldengoose-gumbel_combined_indoc_tau1.00-25grp is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is suitable for tasks requiring improved logical and mathematical problem-solving, building upon the base Qwen2.5 architecture.
Loading preview...
Model Overview
The cjiao/goldengoose-gumbel_combined_indoc_tau1.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a context length of 32768 tokens, making it suitable for processing longer inputs.
Key Capabilities
- Enhanced Mathematical Reasoning: This model was specifically trained using the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, as introduced in the DeepSeekMath paper. This training approach aims to improve the model's ability to handle mathematical and logical reasoning tasks.
- Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts and generate relevant responses.
- TRL Framework: The fine-tuning process was conducted using the TRL (Transformer Reinforcement Learning) library, indicating a focus on reinforcement learning techniques for performance optimization.
When to Use This Model
- Mathematical Problem Solving: Ideal for applications requiring improved performance on mathematical reasoning and problem-solving, given its GRPO training.
- Instruction-Based Tasks: Suitable for general instruction-following tasks where a 1.5B parameter model with a large context window is appropriate.
- Research and Experimentation: Useful for researchers exploring the impact of GRPO and similar reinforcement learning techniques on smaller language models.