Name: cjiao/goldengoose-gumbel_combined_grpoc_tau0.10-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

cjiao/goldengoose-gumbel_combined_grpoc_tau0.10-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct architecture. This model leverages the TRL (Transformer Reinforcement Learning) library for its training process.

Key Capabilities

Enhanced Reasoning: The model was specifically trained using the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method. This technique, introduced in the context of improving mathematical reasoning in large language models, suggests an emphasis on logical and problem-solving tasks.
Instruction Following: As a fine-tuned version of an instruction-tuned base model (Qwen2.5-1.5B-Instruct), it is designed to follow user instructions effectively.

Training Details

The training procedure utilized GRPO, a method detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using TRL version 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2.

Good For

Applications requiring improved mathematical or logical reasoning.
Tasks where robust instruction following is crucial.
Developers looking for a compact 1.5B parameter model with specialized reasoning enhancements.

Overview

Model Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)