Name: cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: cjiao

Model Overview

This model, cjiao/goldengoose-gumbel_gradsim_tau2.00-25grp, is a 1.5 billion parameter language model derived from the Qwen/Qwen2.5-1.5B-Instruct architecture. It has been specifically fine-tuned using the TRL library.

Key Training Details

A notable aspect of this model's development is its training with GRPO (Gumbel-softmax Reinforcement Learning with Policy Optimization). This method was originally introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The application of GRPO suggests an optimization focus on improving reasoning and problem-solving abilities, particularly in mathematical contexts.

Technical Specifications

Base Model: Qwen/Qwen2.5-1.5B-Instruct
Parameter Count: 1.5 billion
Context Length: 32768 tokens
Training Frameworks: TRL (version 0.19.1), Transformers (version 4.57.6), Pytorch (version 2.5.1), Datasets (version 4.8.4), Tokenizers (version 0.22.2)

Potential Use Cases

Given its fine-tuning approach, this model is likely well-suited for applications requiring:

Enhanced logical reasoning.
Mathematical problem-solving.
Instruction-following tasks where precise and structured outputs are beneficial.

Overview

Model Overview

Key Training Details

Technical Specifications

Potential Use Cases

Full Model Card (README)