Name: cjiao/goldengoose-gumbel_combined_random_seed3-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-gumbel_combined_random_seed3-25grp is a 1.5 billion parameter instruction-tuned language model, building upon the Qwen/Qwen2.5-1.5B-Instruct base model. It features a context length of 32768 tokens, making it suitable for processing longer inputs.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned using GRPO (Gumbel-Softmax Policy Optimization), a method highlighted in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This training approach aims to enhance the model's reasoning capabilities, particularly in areas that benefit from structured thought processes.

Capabilities

Instruction Following: Fine-tuned to respond effectively to user instructions and questions.
Text Generation: Capable of generating coherent and contextually relevant text based on prompts.
Reasoning Focus: Benefits from GRPO training, which can contribute to improved logical and mathematical reasoning in its outputs.

Use Cases

This model is well-suited for applications requiring:

General-purpose conversational AI.
Question answering systems.
Text summarization and generation tasks where improved reasoning might be beneficial.

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework, with specific versions including TRL 0.19.1 and Transformers 4.57.6. Further details on the training run are available via Weights & Biases.

Overview

Model Overview

Key Differentiator: GRPO Training

Capabilities

Use Cases

Training Details

Full Model Card (README)