Name: narcolepticchicken/occ-grpo-occ API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: narcolepticchicken

Overview

narcolepticchicken/occ-grpo-occ is a 3.1 billion parameter instruction-tuned language model, built upon the robust Qwen/Qwen2.5-3B-Instruct architecture. This model distinguishes itself through its specialized training methodology, employing GRPO (Gradient-based Reward Policy Optimization). The GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to enhance reasoning capabilities, particularly in complex domains.

Key Capabilities

Enhanced Reasoning: Fine-tuned with GRPO, suggesting an optimization for tasks that benefit from improved logical and analytical processing.
Instruction Following: As an instruction-tuned model, it is designed to accurately interpret and execute user prompts.
Qwen2.5 Base: Benefits from the strong foundational capabilities of the Qwen2.5-3B-Instruct model, including a 32768-token context length.

Training Details

The model was trained using the TRL (Transformers Reinforcement Learning) framework, specifically version 1.7.0. The application of GRPO indicates a focus on refining the model's policy based on reward signals, a technique often used to improve performance in specific, challenging tasks like mathematical reasoning.

Good For

Applications requiring a compact yet capable model for reasoning-intensive tasks.
Scenarios where the base Qwen2.5-3B-Instruct model's performance needs a boost in logical coherence or problem-solving.
Developers interested in exploring models fine-tuned with advanced reinforcement learning techniques like GRPO.

Overview

Overview

Key Capabilities

Training Details

Good For

Full Model Card (README)