Name: cjiao/goldengoose-gumbel_combined_gradsim_tau2.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_combined_gradsim_tau2.00-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and trained using the Transformer Reinforcement Learning (TRL) framework.

Key Capabilities & Training

This model's primary differentiator is its training methodology, which incorporates GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). GRPO is a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This suggests the model is specifically enhanced for:

Mathematical Reasoning: The application of GRPO indicates a focus on improving the model's ability to understand and solve complex mathematical problems.
Instruction Following: As a fine-tuned version of an instruct model, it is designed to respond effectively to user prompts and instructions.

Technical Details

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Parameters: 1.5 billion
Context Length: 32768 tokens
Training Framework: TRL (Transformer Reinforcement Learning)
Key Training Method: GRPO, as detailed in the DeepSeekMath paper.

Potential Use Cases

Given its specialized training, this model is likely well-suited for applications requiring:

Solving mathematical problems or equations.
Generating explanations for mathematical concepts.
Reasoning-heavy tasks where logical deduction is crucial.
Instruction-based text generation in technical or analytical domains.

Overview

Model Overview

Key Capabilities & Training

Technical Details

Potential Use Cases

Full Model Card (README)