Name: cjiao/goldengoose-gumbel_combined_gmrel_tau1.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

This model, goldengoose-gumbel_combined_gmrel_tau1.00-25grp, is a 1.5 billion parameter language model developed by cjiao. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, leveraging the TRL (Transformer Reinforcement Learning) framework for its training.

Key Training Methodology

A significant differentiator for this model is its training procedure, which incorporates GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method was introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" and is designed to enhance a model's mathematical reasoning abilities.

Capabilities and Use Cases

Given its foundation in Qwen2.5-1.5B-Instruct and the application of GRPO, this model is particularly suited for:

Reasoning-intensive tasks: Benefiting from the GRPO training, it is expected to perform well in scenarios requiring logical deduction and problem-solving.
Instruction following: Inheriting capabilities from its instruction-tuned base model.
General text generation: For tasks where a compact yet capable model with enhanced reasoning is beneficial.

Technical Details

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Parameters: 1.5 Billion
Context Length: 32768 tokens
Training Framework: TRL (version 0.19.1)
Core Training Method: GRPO, as detailed in the DeepSeekMath paper.

Overview

Model Overview

Key Training Methodology

Capabilities and Use Cases

Technical Details

Full Model Card (README)