Name: narcolepticchicken/occ-grpo-baseline API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: narcolepticchicken

Model Overview

The narcolepticchicken/occ-grpo-baseline is an instruction-tuned language model based on the Qwen/Qwen2.5-3B-Instruct architecture. It has been fine-tuned using the TRL (Transformers Reinforcement Learning) framework.

Key Capabilities

Enhanced Mathematical Reasoning: This model's primary differentiator is its training with the GRPO (Gradient-based Reward Policy Optimization) method. GRPO, introduced in the "DeepSeekMath" paper, aims to push the limits of mathematical reasoning in open language models.
Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts effectively, similar to its base model, Qwen2.5-3B-Instruct.

Training Details

The model was trained using specific versions of popular frameworks:

TRL: 1.7.0
Transformers: 5.12.1
Pytorch: 2.12.1
Datasets: 5.0.0
Tokenizers: 0.22.2

Use Cases

This model is particularly well-suited for applications requiring robust mathematical problem-solving and reasoning tasks, benefiting from its specialized GRPO training. Developers can integrate it using the Hugging Face pipeline for text generation.

Overview

Model Overview

Key Capabilities

Training Details

Use Cases

Full Model Card (README)