Name: arun-ghontale/cppo-g16-p0875 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: arun-ghontale

Model Overview

arun-ghontale/cppo-g16-p0875 is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by arun-ghontale with the TRL framework and trained using GRPO (Group Relative Policy Optimization). GRPO is a reinforcement learning method introduced in the DeepSeekMath paper, which aims to push the limits of mathematical reasoning in open language models.
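In the DeepSeekMath formulation, GRPO drops the separate value (critic) model. Instead, it samples a group of completions for each prompt and normalizes each completion's reward against the mean and standard deviation of its own group. The sketch below shows only that normalization step. It does not show how this model or TRL implements it. The helper name `group_relative_advantages`, the use of population standard deviation, and the `eps` guard are illustrative assumptions, and TRL's own implementation may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Hypothetical helper: group-relative advantages for one prompt.

    `rewards` holds one scalar reward per sampled completion in the group.
    Each advantage is (reward - group mean) / (group std + eps).
    Population std and the eps value are assumptions for illustration.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one math prompt, rewarded 1 if correct, else 0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct completions receive positive advantages and incorrect ones receive negative advantages. If every completion in a group gets the same reward, all advantages are zero, so that group contributes no reward-driven learning signal.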

Key Capabilities

Enhanced Mathematical Reasoning: Trained with GRPO, a method intended to improve performance on tasks that require mathematical and logical problem-solving.
Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts effectively.
Qwen2.5 Base: Benefits from the robust architecture and capabilities of the Qwen2.5-1.5B-Instruct model, including a 32K context length.

Good For

Applications requiring a compact model with enhanced mathematical reasoning.
Tasks where instruction following and logical problem-solving are critical.
Experimentation with models fine-tuned using advanced reinforcement learning techniques like GRPO.
