Name: cjiao/goldengoose-gumbel_combined_indoc_tau1.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_combined_indoc_tau1.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a context length of 32768 tokens, making it suitable for processing longer inputs.

Key Capabilities

Enhanced Mathematical Reasoning: This model was specifically trained using the GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization) method, as introduced in the DeepSeekMath paper. This training approach aims to improve the model's ability to handle mathematical and logical reasoning tasks.
Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts and generate relevant responses.
TRL Framework: The fine-tuning process was conducted using the TRL (Transformer Reinforcement Learning) library, indicating a focus on reinforcement learning techniques for performance optimization.

When to Use This Model

Mathematical Problem Solving: Ideal for applications requiring improved performance on mathematical reasoning and problem-solving, given its GRPO training.
Instruction-Based Tasks: Suitable for general instruction-following tasks where a 1.5B parameter model with a large context window is appropriate.
Research and Experimentation: Useful for researchers exploring the impact of GRPO and similar reinforcement learning techniques on smaller language models.

Overview

Model Overview

Key Capabilities

When to Use This Model

Full Model Card (README)