hector-gr/RLCR-v4-ks-highcov-batch-cold-math

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Tool Calling: Supported · Published: Mar 28, 2026 · Architecture: Transformer · Cold

The hector-gr/RLCR-v4-ks-highcov-batch-cold-math model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B. Developed by hector-gr, it was trained with GRPO, the method introduced in the DeepSeekMath paper, to improve its mathematical reasoning. It has a context length of 32768 tokens and targets complex mathematical and reasoning tasks.


Overview

This model, hector-gr/RLCR-v4-ks-highcov-batch-cold-math, is a 7.6 billion parameter language model fine-tuned from the Qwen/Qwen2.5-7B base model. It was developed by hector-gr using the TRL framework and trained with GRPO (Group Relative Policy Optimization). GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" as a reinforcement learning method for improving mathematical reasoning.
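As a rough illustration of the idea behind GRPO, each completion sampled for a prompt is scored relative to the other completions sampled for that same prompt, rather than against a learned value function. The sketch below shows only that normalization step. The function name, the `eps` value, and the use of the population standard deviation are assumptions for illustration, not details taken from this model's training setup.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an assumed small constant that avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one math prompt: two judged correct (1.0), two incorrect (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average get a positive advantage and are reinforced; those below get a negative one. If all completions in a group receive the same reward, every advantage is zero and that group contributes no learning signal.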

Key Capabilities

  • Enhanced Mathematical Reasoning: Trained with GRPO to improve performance on mathematical tasks.
  • Large Context Window: Supports a context length of 32768 tokens, allowing longer and more complex inputs.
  • Qwen2.5 Architecture: Built on the Qwen2.5 architecture through the Qwen/Qwen2.5-7B base model.
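In a decoder-only model the context window is shared between the prompt and the generated tokens, so a long problem statement leaves less room for the solution. The helper below is a minimal sketch of that budget check. The names `CONTEXT_LENGTH`, `max_new_tokens`, and `reserve` are hypothetical and not part of any library.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def max_new_tokens(prompt_tokens, context_length=CONTEXT_LENGTH, reserve=0):
    """Return how many tokens can still be generated after the prompt.

    `reserve` is an optional safety margin (hypothetical parameter),
    for example for special tokens added by a tokenizer.
    """
    if prompt_tokens < 0 or reserve < 0:
        raise ValueError("token counts must be non-negative")
    remaining = context_length - prompt_tokens - reserve
    return max(remaining, 0)


# A 30000-token prompt leaves 2768 tokens for the generated solution.
budget = max_new_tokens(30000)
```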

Training Details

The model was trained with the TRL library (version 0.16.0.dev0) on PyTorch 2.5.1. The use of GRPO suggests the training targeted multi-step mathematical problems and logical reasoning.

When to Use

This model is best suited to applications that need mathematical problem-solving and step-by-step reasoning, in particular those that benefit from the GRPO-based training approach described in the DeepSeekMath paper.
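A minimal sketch of how the model might be queried with the Hugging Face `transformers` library follows. It assumes the weights are published on the Hugging Face Hub under the model ID above and that `transformers`, `torch`, and `accelerate` (needed for `device_map="auto"`) are installed. The prompt wording in `build_prompt` is hypothetical, since this page does not specify a prompt format.

```python
MODEL_ID = "hector-gr/RLCR-v4-ks-highcov-batch-cold-math"


def build_prompt(problem: str) -> str:
    """Wrap a math problem in a plain-text prompt (hypothetical wording)."""
    return (
        "Solve the following problem step by step.\n\n"
        f"Problem: {problem.strip()}\n\n"
        "Solution:"
    )


def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Generate a solution; downloads the model on first use."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(problem), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `solve("What is the sum of the first 100 positive integers?")` would return the generated text. Running the full-precision model locally requires enough GPU memory for a 7.6B-parameter model.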