Name: cjiao/goldengoose-gumbel_combined_gmrel_tau0.10-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_combined_gmrel_tau0.10-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the Qwen/Qwen2.5-1.5B-Instruct base. Developed by cjiao, this model leverages the TRL (Transformer Reinforcement Learning) library for its fine-tuning process.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was trained using GRPO (Gumbel-Softmax Policy Optimization), a method highlighted in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). While the base model is general-purpose, the application of GRPO suggests an optimization for tasks that may benefit from enhanced reasoning or structured output generation, similar to its use in mathematical reasoning contexts.

Capabilities & Use Cases

This model is suitable for various text generation tasks, including answering questions, creative writing, and conversational AI, given its instruction-tuned nature. Its 32768-token context length allows for processing and generating longer sequences of text. The GRPO training could potentially make it more robust in tasks requiring logical coherence or adherence to specific patterns, distinguishing it from standard instruction-tuned models of similar size.

Overview

Model Overview

Key Differentiator: GRPO Training

Capabilities & Use Cases

Full Model Card (README)