Name: cjiao/goldengoose-divsweepv2_goose_n512_indorc_tau2.00_n7 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

This model, goldengoose-divsweepv2_goose_n512_indorc_tau2.00_n7, is a 1.5-billion-parameter instruction-tuned variant built on Qwen/Qwen2.5-1.5B-Instruct. It was fine-tuned by cjiao using the TRL library.

Key Training Methodology

The distinguishing feature of this model is its training procedure, which uses GRPO (Group Relative Policy Optimization). GRPO is a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", where it was applied to improve mathematical reasoning in language models. Instead of training a separate value model, it samples a group of responses per prompt and scores each response relative to the rest of its group. The training run was tracked with Weights & Biases, where it can be visualized.
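The group-relative scoring at the core of GRPO can be illustrated with a short sketch. This is not the training code used for this model; it only shows the outcome-reward normalization described in the DeepSeekMath paper, where each sampled response's reward is centered on the group mean and scaled by the group standard deviation. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). The eps term is an assumption that
    avoids division by zero when all rewards in the group are equal.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one math prompt, rewarded 1.0 if correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above their group's average receive a positive advantage and are reinforced, while below-average responses receive a negative one. When every response in a group gets the same reward, all advantages are zero and that group contributes no learning signal.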

Intended Use Cases

Because it was trained with GRPO, this model is aimed at the following uses. No benchmark results are given here, so evaluate it on your own tasks before relying on it:

Mathematical problem-solving: Tasks that require multi-step calculation, logical deduction, and an understanding of mathematical concepts.
Reasoning tasks: Applications that depend on logical inference and structured, step-by-step thinking.
Instruction following: Because it builds on an instruction-tuned base, it is expected to follow user prompts for specific tasks.
