Name: cjiao/goldengoose-gumbel_combined_gradsim_tau0.50-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Overview

This model, cjiao/goldengoose-gumbel_combined_gradsim_tau0.50-25grp, is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct using the TRL (Transformer Reinforcement Learning) library.

Key Capabilities

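Enhanced Reasoning: The model's primary differentiator is its training with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in language models. GRPO samples a group of completions for each prompt and scores each completion relative to the others in its group, which removes the need for a separately learned value model.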
Instruction Following: As a fine-tuned version of an instruct model, it is capable of following user instructions for various text generation tasks.
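As a rough illustration of the group-relative scoring GRPO relies on, the sketch below turns the rewards of one group of completions into advantages by normalizing against the group's own mean and standard deviation. The function name, the binary correct/incorrect reward, and the eps value are assumptions for illustration; this is not the code used to train this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Turn one group's scalar rewards into group-relative advantages.

    Hypothetical helper. All rewards come from completions sampled for the
    same prompt; each is centered on the group mean and scaled by the group
    standard deviation, so no learned value model (critic) is needed.
    """
    mu = mean(rewards)
    # Population std over the group; some implementations use the sample
    # std instead. eps avoids division by zero when all rewards are equal.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 completions for one math prompt, rewarded
# 1.0 for a correct final answer and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
# prints [1.0, -1.0, -1.0, 1.0]
```

Correct completions get positive advantages and incorrect ones negative, and a group where every completion earns the same reward contributes no learning signal at all.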

Training Details

The model was trained with TRL 0.19.1, Transformers 4.57.6, and PyTorch 2.5.1. GRPO, central to its training, is intended to improve reasoning, particularly in complex domains such as mathematics. This reasoning-focused training is what distinguishes it from general-purpose instruction-tuned models.
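To make the training objective more concrete, the sketch below computes a per-token GRPO loss in the form described in the DeepSeekMath paper: a PPO-style clipped importance-ratio term weighted by the group-relative advantage, minus a KL penalty toward a frozen reference model. The function names are hypothetical, and clip_eps and beta are illustrative placeholders rather than this model's training hyperparameters; TRL's own implementation may differ in details such as how losses are normalized.

```python
import math


def grpo_token_loss(logp_new, logp_old, logp_ref, advantage,
                    clip_eps=0.2, beta=0.04):
    """GRPO loss (to be minimized) for a single completion token.

    Hypothetical helper; clip_eps and beta are placeholder values.
    logp_new:  log-prob of the token under the policy being optimized
    logp_old:  log-prob under the policy that sampled the group
    logp_ref:  log-prob under the frozen reference model
    advantage: the completion's group-relative advantage
    """
    # PPO-style clipped surrogate on the importance ratio.
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    # KL estimator from DeepSeekMath: pi_ref/pi - log(pi_ref/pi) - 1 (>= 0).
    log_r = logp_ref - logp_new
    kl = math.exp(log_r) - log_r - 1.0
    # The objective maximizes surrogate - beta * kl; negate it for a loss.
    return -(surrogate - beta * kl)


def grpo_completion_loss(new, old, ref, advantage, **kwargs):
    """Average the token losses over one completion (the 1/|o_i| factor)."""
    losses = [grpo_token_loss(n, o, r, advantage, **kwargs)
              for n, o, r in zip(new, old, ref)]
    return sum(losses) / len(losses)


# Hypothetical 3-token completion with advantage +1.0 (a correct answer).
new = [-0.5, -1.0, -0.2]
old = [-0.6, -1.0, -0.9]
ref = [-0.5, -1.2, -0.2]
print(grpo_completion_loss(new, old, ref, advantage=1.0))
```

Averaging these completion losses over the G completions in a group gives the remaining 1/G factor of the objective. The clipping keeps a single update from moving the policy too far from the sampling policy, while the KL term keeps it close to the original instruct model.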
