Name: cjiao/goldengoose-gumbel_combined_grpoc_tau0.50-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

cjiao/goldengoose-gumbel_combined_grpoc_tau0.50-25grp is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It supports a context length of 32,768 tokens, which makes it suitable for longer inputs.

Key Training Details

This model's distinctiveness stems from its training procedure, which uses GRPO (Group Relative Policy Optimization). GRPO was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). It samples a group of completions for each prompt and scores each completion relative to the rest of its group, which removes the need for a separate value model. Its use here suggests an optimization focus on reasoning, particularly in mathematical contexts.
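As a rough illustration of the group-relative scoring at the heart of GRPO, the sketch below normalizes the rewards of several completions sampled for one prompt. It is not this model's training code: the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for the example, and real implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation and eps are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of a single prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average receive a positive advantage and the others a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.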

Training was conducted using TRL (Transformer Reinforcement Learning), a library for post-training transformer language models. GRPO in TRL optimizes the model against reward signals, which can come from programmatic reward functions or a reward model and do not necessarily involve human feedback.
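For orientation, a GRPO run in TRL is typically wired up from a reward function and a `GRPOTrainer`. The sketch below is a guess at the general shape of such a setup, not the configuration used for this model: the reward function `last_number_reward`, the `answer` dataset column, the `output_dir` value, and the helper `build_trainer` are all hypothetical, and the actual rewards, dataset, and hyperparameters are not stated here.

```python
import re


def last_number_reward(completions, answer, **kwargs):
    """Hypothetical reward: 1.0 if the last number in a completion equals the
    reference answer, else 0.0. Assumes plain-string completions and an
    `answer` column in the dataset (both assumptions of this sketch)."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards


def build_trainer(train_dataset):
    """Assemble a GRPOTrainer; requires the `trl` package to be installed.

    `train_dataset` is assumed to provide `prompt` and `answer` columns.
    """
    from trl import GRPOConfig, GRPOTrainer  # imported lazily on purpose

    args = GRPOConfig(output_dir="grpo-sketch")  # placeholder output path
    return GRPOTrainer(
        model="Qwen/Qwen2.5-1.5B-Instruct",  # base model named in this listing
        reward_funcs=last_number_reward,
        args=args,
        train_dataset=train_dataset,
    )


# The reward function can be exercised on its own, without TRL:
scores = last_number_reward(["The total is 42", "I think 7"], answer=[42, 8])
print(scores)
```

Calling `build_trainer(dataset).train()` would start training under these assumptions. The reward function is kept separate so it can be checked in isolation before any GPU time is spent.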

Potential Use Cases

Given its fine-tuning with GRPO, this model is likely well-suited for applications requiring:

Advanced reasoning tasks
Mathematical problem-solving
Complex question answering
Generating coherent and logically structured text
