Name: cjiao/goldengoose-top25_gradsim-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

The cjiao/goldengoose-top25_gradsim-25grp is a 1.5 billion parameter instruction-tuned language model, developed by cjiao. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, leveraging the TRL (Transformer Reinforcement Learning) framework for its training process.

Key Capabilities

Enhanced Reasoning: This model was specifically trained using the GRPO (Gradient-based Reward Policy Optimization) method, a technique highlighted in the DeepSeekMath paper. This training approach aims to significantly improve the model's mathematical and logical reasoning abilities.
Instruction Following: As an instruction-tuned model, it is designed to accurately follow user prompts and generate relevant responses.
Efficient Performance: With 1.5 billion parameters and a 32K context length, it offers a balance between performance and computational efficiency, making it suitable for various applications where larger models might be overkill.

Training Details

The model's training utilized the TRL library (version 0.19.1) and was conducted with PyTorch 2.5.1. The GRPO method, central to its fine-tuning, is designed to push the boundaries of mathematical reasoning in open language models.

When to Use This Model

This model is a strong candidate for applications requiring:

Mathematical Problem Solving: Its GRPO-based training makes it particularly adept at tasks involving mathematical reasoning.
Logical Deduction: The fine-tuning process aims to improve its ability to handle complex logical queries.
Instruction-based Generation: For scenarios where precise adherence to instructions is crucial, building on its Qwen2.5-Instruct foundation.

Overview

Model Overview

Key Capabilities

Training Details

When to Use This Model

Full Model Card (README)