Name: cjiao/goldengoose-gumbel_combined_indoc_tau0.10-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

cjiao/goldengoose-gumbel_combined_indoc_tau0.10-25grp is a 1.5 billion parameter language model fine-tuned from the base model Qwen/Qwen2.5-1.5B-Instruct. It supports a context length of 32,768 tokens, making it suitable for processing longer inputs.

Key Training Methodology

This model was trained using GRPO (Group Relative Policy Optimization). GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which suggests a focus on improving reasoning capabilities, potentially in mathematical or logical domains. Fine-tuning was implemented with the TRL library.
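The core idea of GRPO, as described in the cited paper, is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that normalization step. The function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; they are not taken from this model's training code or from TRL.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Illustrative helper (hypothetical name): each reward is centered on the
    group mean and scaled by the group standard deviation. `eps` is an
    assumed small constant that avoids division by zero when all rewards
    in the group are equal.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one prompt, two judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group average receive a positive advantage and the rest a negative one, so the policy update favors the better answers within each group.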

Potential Use Cases

Reasoning-intensive tasks: Given its GRPO training, the model may excel in tasks requiring structured reasoning.
Instruction following: As it's fine-tuned from an instruct model, it's designed to follow user instructions effectively.
Applications requiring longer context: The 32K context window supports more extensive conversational or document-based interactions.
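When working near the 32,768 token limit, note that the prompt and the generated tokens share the same window. The following is a minimal sketch of that budget arithmetic. The helper name is hypothetical, and it assumes the prompt has already been tokenized with the model's own tokenizer so that the token count is accurate.

```python
CONTEXT_LENGTH = 32_768  # context length stated in the model overview


def remaining_generation_budget(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt.

    Illustrative helper (hypothetical name): returns 0 when the prompt
    already fills or exceeds the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


# Example: a 30,000-token document leaves 2,768 tokens for the response.
budget = remaining_generation_budget(30_000)
```

A result of 0 means the input has to be shortened or split before the model can produce any output.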
