Overview
This model, hector-gr/RLCR-v4-ks-highcov-batch-cold-math, is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-7B base model. It was developed by hector-gr with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in open language models.
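To illustrate the group-relative idea behind GRPO: rewards for a group of completions sampled from the same prompt are normalized against the group's mean and standard deviation to form advantages, removing the need for a separate value model. The sketch below is based on the DeepSeekMath description, not on code from this repository:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: normalize each completion's reward against
    the mean and standard deviation of its group (the set of completions
    sampled for one prompt)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, scored 1.0 if correct.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, which is what steers the policy update toward better reasoning traces.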
Key Capabilities
- Enhanced Mathematical Reasoning: Leverages the GRPO method for improved performance on mathematical tasks.
- Large Context Window: Supports a context length of 32768 tokens, allowing it to process long, complex inputs such as multi-step derivations.
- Qwen2.5 Architecture: Benefits from the robust architecture of the Qwen2.5 series.
Training Details
The model was trained with the TRL library (version 0.16.0.dev0) on PyTorch 2.5.1. The use of GRPO indicates that training focused on optimizing the model's ability to handle intricate mathematical problems and logical reasoning.
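A setup like the one described above can be sketched with TRL's `GRPOTrainer`. This is a hypothetical outline, not the card's actual recipe: the reward function and dataset are illustrative placeholders, and the real training almost certainly used a task-specific reward and math dataset.

```python
# Sketch of a TRL GRPO fine-tuning setup (illustrative only).

def boxed_answer_reward(completions, **kwargs):
    """Toy reward: 1.0 when the completion contains a \\boxed{...} answer.
    A real math reward would check the answer against a reference."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

if __name__ == "__main__":
    # Heavy dependencies are imported only when actually training.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B",  # base model named in this card
        reward_funcs=boxed_answer_reward,
        args=GRPOConfig(output_dir="grpo-math", max_completion_length=1024),
        # Placeholder dataset with a "prompt" column; swap in a math dataset.
        train_dataset=load_dataset("trl-lib/tldr", split="train"),
    )
    trainer.train()
```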
When to Use
This model is particularly well-suited to applications that demand strong mathematical problem-solving and step-by-step reasoning, especially workloads where the DeepSeekMath-style reinforcement-learning approach to mathematical reasoning pays off.
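For such use cases, the model can be loaded with the standard `transformers` API. This is a sketch assuming the checkpoint is publicly available on the Hub; the `solve` helper and the example prompt are illustrative, not part of the card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "hector-gr/RLCR-v4-ks-highcov-batch-cold-math"

def solve(prompt: str, max_new_tokens: int = 512) -> str:
    """Generate a completion for a math prompt with the fine-tuned model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    print(solve("Solve for x: 3x + 7 = 22. Show your reasoning."))
```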