Name: cjiao/goldengoose-high_div_rand_weighted-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-high_div_rand_weighted-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the base architecture of Qwen/Qwen2.5-1.5B-Instruct. This model was developed by cjiao and fine-tuned using the TRL framework.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned using GRPO (Gradient-based Reward Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This suggests an optimization towards enhanced reasoning capabilities, potentially making it more robust for tasks requiring logical thought processes.

Capabilities and Use Cases

This model is suitable for a variety of text generation tasks, leveraging its instruction-tuned nature and the benefits of GRPO training. Its 1.5 billion parameters and 32768-token context length make it a capable option for applications where a smaller, efficient model with improved reasoning is desired. Developers can use it for tasks such as answering complex questions, generating creative text, or engaging in conversational AI, especially where the underlying reasoning quality is important.

Technical Details

The model was trained with specific versions of key frameworks, including TRL 0.19.1, Transformers 4.57.6, Pytorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2. Further details on the training process can be explored via the associated Weights & Biases run.

Overview

Model Overview

Key Differentiator: GRPO Training

Capabilities and Use Cases

Technical Details

Full Model Card (README)