Name: cjiao/goldengoose-gumbel_combined_gmrel_tau0.50-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Overview

This model, cjiao/goldengoose-gumbel_combined_gmrel_tau0.50-25grp, is a 1.5 billion parameter language model derived from the Qwen2.5-1.5B-Instruct base model. It has been specifically fine-tuned using the TRL library and incorporates the GRPO (Gumbel-Softmax Reinforcement Learning with Policy Optimization) training method.

Key Capabilities

Enhanced Reasoning: The primary differentiator of this model is its training with GRPO, a method introduced in the DeepSeekMath paper. This suggests an optimization for complex reasoning tasks, particularly in mathematical domains.
Instruction Following: As it is fine-tuned from an instruction-tuned base model (Qwen2.5-1.5B-Instruct), it retains strong capabilities in following user instructions.
Efficient Size: With 1.5 billion parameters, it offers a balance between performance and computational efficiency, making it suitable for deployment in resource-constrained environments.

Training Details

The model's training leveraged the TRL (Transformer Reinforcement Learning) framework, version 0.19.1, and PyTorch 2.5.1. The GRPO method, as detailed in the DeepSeekMath research, was central to its fine-tuning process.

Good For

Applications requiring robust mathematical reasoning.
Tasks where a smaller, efficient model with enhanced reasoning capabilities is preferred.
Instruction-following tasks where the base Qwen2.5-1.5B-Instruct model's performance is desired, with potential improvements in reasoning.

Overview

Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)