Name: cjiao/goldengoose-high_div_rand_polar-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Overview

cjiao/goldengoose-high_div_rand_polar-25grp is an instruction-tuned language model built upon the Qwen2.5-1.5B-Instruct base model. It was developed by cjiao and fine-tuned using the TRL (Transformer Reinforcement Learning) library.

Key Training Details

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Training Method: Uses GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." Because GRPO originated in work on mathematical reasoning, its use here suggests an emphasis on improving reasoning capabilities, potentially in mathematical or logical domains.
Frameworks: Trained with TRL 0.19.1, Transformers 4.57.6, PyTorch 2.5.1, Datasets 4.8.4, and Tokenizers 0.22.2.
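
The core idea of GRPO, as described in the DeepSeekMath paper, is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below illustrates only that normalization step in plain Python. It is not the training code used for this model: the function name, the reward values, the epsilon, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per completion sampled for a single
    prompt. The name and the `eps` default are hypothetical, not taken
    from TRL or from this model's training setup.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of the same prompt
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean receive a positive advantage and are reinforced; those below the mean receive a negative one. A group in which every completion earns the same reward yields all-zero advantages and so contributes no learning signal.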

Capabilities

Instruction Following: Inherits instruction-following capabilities from its Qwen2.5-1.5B-Instruct base.
Enhanced Reasoning: Training with GRPO, a method that originated in mathematical-reasoning research, indicates a focus on improving reasoning.
Diverse Response Generation: The "high_div_rand_polar" part of the model's name suggests an optimization for generating highly diverse and randomized outputs, potentially with a focus on contrasting or polarized perspectives. This is inferred from the name alone.

Good For

Applications requiring varied and non-deterministic text generation.
Tasks where nuanced or contrasting viewpoints are beneficial.
Exploratory text generation and creative writing where diverse outputs are desired.
Use cases that could benefit from improved reasoning, especially if related to mathematical or logical problem-solving, given the GRPO method's origin.
