Name: cjiao/goldengoose-low_div_rand_polar-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

cjiao/goldengoose-low_div_rand_polar-25grp is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It leverages a substantial 32768-token context length, making it suitable for processing longer inputs and maintaining context over extended interactions.

Key Differentiator: GRPO Training

This model's primary distinction lies in its training methodology. It was fine-tuned using GRPO (Gradient-based Reward Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This specialized training approach aims to significantly enhance the model's capabilities in mathematical reasoning and problem-solving.

Use Cases

Mathematical Reasoning: Ideal for applications requiring the model to understand, process, and generate responses related to mathematical problems or logical deductions.
Instruction Following: As an instruction-tuned model, it is designed to follow user prompts effectively across various tasks.
Long Context Processing: Its 32768-token context window allows for handling complex queries or documents that require extensive contextual understanding.

Technical Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework, with specific versions of libraries including TRL 0.19.1 and Transformers 4.57.6. The GRPO method, as detailed in the DeepSeekMath paper, is central to its specialized performance.

Overview

Model Overview

Key Differentiator: GRPO Training

Use Cases

Technical Details

Full Model Card (README)