Name: cjiao/goldengoose-gumbel_combined_indoc_tau2.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

cjiao/goldengoose-gumbel_combined_indoc_tau2.00-25grp is a 1.5 billion parameter instruction-tuned model, building upon the Qwen/Qwen2.5-1.5B-Instruct architecture. It has been fine-tuned using the TRL library.

Key Training Methodology

A distinguishing feature of this model is its training with GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is specifically designed to improve a model's mathematical reasoning abilities. The training process was tracked and can be visualized via Weights & Biases.

Capabilities and Use Cases

Given its fine-tuning with GRPO, this model is particularly well-suited for:

Mathematical Reasoning Tasks: Excelling in problems that require logical and mathematical deduction.
Instruction Following: As an instruction-tuned model, it can effectively follow user prompts for various tasks.
General Language Generation: Capable of generating coherent and contextually relevant text, leveraging its base Qwen2.5 architecture.

This model offers a compact yet capable solution for applications where enhanced reasoning, especially in quantitative domains, is beneficial.

Overview

Model Overview

Key Training Methodology

Capabilities and Use Cases

Full Model Card (README)