Name: cjiao/golden-goose-qwen2.5-1.5b-instruct-random API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/golden-goose-qwen2.5-1.5b-instruct-random is a 1.5 billion parameter instruction-tuned language model, building upon the base of Qwen/Qwen2.5-1.5B-Instruct. This model was fine-tuned using the TRL framework, a library for Transformer Reinforcement Learning.

Key Differentiator: GRPO Training

A significant aspect of this model's training is the application of GRPO (Gradient-based Reward Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is specifically designed to enhance a model's mathematical reasoning abilities. By incorporating GRPO, golden-goose-qwen2.5-1.5b-instruct-random aims to improve performance on complex reasoning tasks.

Capabilities & Use Cases

Enhanced Mathematical Reasoning: Due to its GRPO-based training, this model is particularly suited for tasks that involve mathematical problem-solving and logical deduction.
Instruction Following: As an instruction-tuned model, it is designed to accurately follow user prompts and generate relevant responses.
Long Context Processing: With a context length of 32768 tokens, it can handle and process extensive inputs, beneficial for detailed queries or multi-turn conversations.

Training Details

The model's training procedure utilized specific versions of popular frameworks:

TRL: 1.1.0
Transformers: 4.57.6
Pytorch: 2.10.0

This model is a strong candidate for applications requiring a compact yet capable model with a focus on improved reasoning, especially in quantitative domains.

Overview

Model Overview

Key Differentiator: GRPO Training

Capabilities & Use Cases

Training Details

Full Model Card (README)