Name: cjiao/goldengoose-gumbel_combined_random-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_combined_random-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a substantial 32,768 token context length, making it suitable for processing longer inputs and generating more extensive responses.

Key Capabilities

Enhanced Reasoning: This model was specifically trained using the GRPO (Gumbel-softmax Reinforced Policy Optimization) method, as introduced in the DeepSeekMath paper. This training approach aims to push the limits of mathematical reasoning in open language models.
Instruction Following: As a fine-tuned instruction model, it is designed to understand and execute user prompts effectively, generating relevant and coherent text based on given instructions.

Training Details

The model's fine-tuning process utilized the TRL (Transformer Reinforcement Learning) library. The application of the GRPO method suggests a focus on improving the model's ability to handle complex logical and mathematical problems, differentiating it from standard instruction-tuned models.

Use Cases

This model is particularly well-suited for applications where robust reasoning, especially in mathematical or logical domains, is crucial. Its instruction-following capabilities also make it versatile for general text generation tasks where clarity and adherence to prompts are important.

Overview

Model Overview

Key Capabilities

Training Details

Use Cases

Full Model Card (README)