Name: cjiao/goldengoose-gumbel-2.00-100 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel-2.00-100 is a 1.5 billion parameter instruction-tuned language model, building upon the foundation of Qwen/Qwen2.5-1.5B-Instruct. This model was developed by cjiao and fine-tuned using the TRL library.

Key Training Details

A significant aspect of this model's development is its training with GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", aims to enhance the model's mathematical reasoning abilities. The training process was tracked and can be visualized via Weights & Biases.

Intended Use Cases

Given its fine-tuning with the GRPO method, this model is particularly well-suited for:

Mathematical reasoning tasks: Leveraging the GRPO training for improved performance in complex calculations and logical deductions.
Instruction-following: As an instruction-tuned model, it can process and respond to user prompts effectively.

This model offers a compact yet capable option for applications requiring enhanced reasoning, especially within mathematical domains, building on the robust Qwen2.5 architecture.

Overview

Model Overview

Key Training Details

Intended Use Cases

Full Model Card (README)