Name: cjiao/goldengoose-gumbel_combined_grpoc_tau1.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

This model, goldengoose-gumbel_combined_grpoc_tau1.00-25grp, is a 1.5 billion parameter instruction-tuned language model. It is a fine-tuned version of the Qwen/Qwen2.5-1.5B-Instruct base model, developed by cjiao.

Key Differentiator: GRPO Training

The model was trained using GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization), a method first introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach aims to improve the model's reasoning and generation capabilities, potentially extending beyond its original mathematical focus to general instruction following.

Training Details

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Training Framework: TRL (Transformer Reinforcement Learning) version 0.19.1
Methodology: GRPO, as detailed in the DeepSeekMath paper.

Use Cases

This model is suitable for various text generation tasks, particularly those requiring instruction following and coherent responses. Its fine-tuning with GRPO suggests potential strengths in tasks that benefit from enhanced reasoning, making it a candidate for applications where robust and logical outputs are desired.

Overview

Model Overview

Key Differentiator: GRPO Training

Training Details

Use Cases

Full Model Card (README)