Name: cjiao/goldengoose-gumbel_gmrel_tau0.10-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_gmrel_tau0.10-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its development utilized the TRL framework and incorporated the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) training method.

Key Capabilities

Enhanced Reasoning: The model's training with GRPO, a method detailed in the DeepSeekMath paper, suggests a focus on improving reasoning abilities, particularly in mathematical domains.
Instruction Following: As a fine-tuned instruction model, it is designed to respond effectively to user prompts and instructions.

Training Details

This model was trained using the TRL library (version 0.19.1) and other standard deep learning frameworks including Transformers (4.57.6), Pytorch (2.5.1), Datasets (4.8.4), and Tokenizers (0.22.2). The GRPO method, originating from research on mathematical reasoning in large language models, was central to its fine-tuning process.

Overview

Model Overview

Key Capabilities

Training Details

Full Model Card (README)