hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-5x-math

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Apr 6, 2026 · Architecture: Transformer · Cold

The hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-5x-math model is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B by hector-gr. Its training procedure uses the GRPO reinforcement learning method introduced in the DeepSeekMath paper and is geared toward mathematical reasoning. The model supports a 32768-token context length, making it suitable for long, multi-step problems.


Model Overview

This model, hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-5x-math, is a 7.6-billion-parameter language model built upon Qwen/Qwen2.5-7B. It has been fine-tuned using the TRL framework with the GRPO (Group Relative Policy Optimization) method.
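Since the card lists no usage snippet, here is a minimal loading sketch assuming the checkpoint follows the standard Hugging Face `transformers` layout inherited from Qwen2.5-7B; the prompt and generation settings are illustrative, not from the card.

```python
MODEL_ID = "hector-gr/RLCR-v4-ks-uniqueness-cov0-entropy100-noece-noaurc-scaletrue-cold-5x-math"
MAX_CONTEXT = 32768  # context length stated on the model card

def main():
    # transformers is imported locally so the constants above remain
    # importable even without the library installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # place layers on available GPUs/CPU
    )

    # Illustrative math prompt; any text up to MAX_CONTEXT tokens works.
    prompt = "Solve for x: 3x + 7 = 22. Show your reasoning."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

This downloads roughly 7.6B parameters of weights on first run, so a GPU with sufficient memory (or CPU offload via `device_map`) is assumed.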

Key Capabilities

  • Mathematical Reasoning: The model's training procedure, inspired by the DeepSeekMath paper, suggests a strong focus on enhancing mathematical problem-solving abilities.
  • Reinforcement Learning Fine-tuning: Trained with GRPO, a reinforcement learning method that scores groups of sampled completions against a reward signal instead of relying on a learned value model.
  • Extended Context Window: Supports a substantial context length of 32768 tokens, allowing for processing and understanding longer and more complex inputs.

Training Details

The model's training leveraged the TRL framework (version 0.16.0.dev0) and was conducted using PyTorch 2.5.1. The GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), was central to its fine-tuning process.
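The card does not publish the training script, but a GRPO run in TRL (the framework named above) generally follows the shape sketched here; the dataset, reward function, and hyperparameters are illustrative placeholders, not the card's actual recipe.

```python
def contains_answer_reward(completions, ground_truth, **kwargs):
    """Toy reward: 1.0 if the completion contains the reference answer, else 0.0.

    GRPO compares rewards within each group of sampled completions,
    so even a coarse binary signal like this can drive learning.
    """
    return [1.0 if gt in c else 0.0 for c, gt in zip(completions, ground_truth)]

def main():
    # Heavy imports kept local; requires trl >= 0.14 and a GPU in practice.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Illustrative math dataset; GRPOTrainer expects a "prompt" column.
    dataset = load_dataset("openai/gsm8k", "main", split="train")
    dataset = dataset.rename_columns({"question": "prompt", "answer": "ground_truth"})

    config = GRPOConfig(
        output_dir="grpo-math-sketch",
        num_generations=8,        # completions sampled per prompt (the "group")
        max_completion_length=512,
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B",  # the stated base model
        args=config,
        train_dataset=dataset,
        reward_funcs=contains_answer_reward,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```

The group-relative step is what distinguishes GRPO from PPO: per-prompt reward baselines come from the sampled group itself, removing the need for a separate critic network.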

Good For

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning and computation.
  • Research in RL Fine-tuning: Provides a practical example of applying GRPO to a large language model.
  • Complex Query Handling: Its large context window makes it suitable for tasks involving extensive textual information.