arun-ghontale/cppo-g16-p0875

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kPublished:Apr 14, 2026Architecture:Transformer Warm

The arun-ghontale/cppo-g16-p0875 model is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by arun-ghontale, it leverages the GRPO method for training, which is designed to enhance mathematical reasoning capabilities. This model is particularly suited for tasks requiring improved logical and mathematical problem-solving, building upon its Qwen2.5 base with a 32K context length.

Loading preview...

Model Overview

arun-ghontale/cppo-g16-p0875 is a 1.5 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. It was developed by arun-ghontale using the TRL framework and incorporates the GRPO (Gradient-based Reward Policy Optimization) training method. This method, introduced in the DeepSeekMath paper, aims to push the limits of mathematical reasoning in open language models.

Key Capabilities

  • Enhanced Mathematical Reasoning: Trained with the GRPO method, suggesting improved performance on tasks requiring mathematical and logical problem-solving.
  • Instruction Following: As a fine-tuned instruction model, it is designed to follow user prompts effectively.
  • Qwen2.5 Base: Benefits from the robust architecture and capabilities of the Qwen2.5-1.5B-Instruct model, including a 32K context length.

Good For

  • Applications requiring a compact model with enhanced mathematical reasoning.
  • Tasks where instruction following and logical problem-solving are critical.
  • Experimentation with models fine-tuned using advanced reinforcement learning techniques like GRPO.