Model Overview
hector-gr/RLCR-v4-ks-uniqueness-buf5k-hotpot is a 7.6 billion parameter language model fine-tuned from the Qwen/Qwen2.5-7B base model. It was developed by hector-gr and trained with the Transformer Reinforcement Learning (TRL) framework using the GRPO (Group Relative Policy Optimization) method.
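A minimal inference sketch using the standard Hugging Face transformers text-generation API; the prompt, dtype, and device placement below are illustrative assumptions rather than documented settings for this checkpoint.

```python
# Minimal inference sketch (standard transformers API; prompt and settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hector-gr/RLCR-v4-ks-uniqueness-buf5k-hotpot"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights on a single GPU
    device_map="auto",
)

prompt = "Question: Which city hosted the 2012 Summer Olympics?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```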
Key Capabilities
- Enhanced Mathematical Reasoning: The model's training with the GRPO method, as introduced in the DeepSeekMath paper, suggests a focus on improving mathematical reasoning abilities.
- Qwen2.5-7B Foundation: Benefits from the strong base architecture of Qwen2.5-7B, providing a solid foundation for general language understanding and generation.
- Extended Context Window: Supports a context length of 32768 tokens, allowing longer, more complex inputs and outputs to be processed (see the sketch after this list for a quick way to confirm the configured window).
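A small sketch, under the assumption that the context window is exposed through the model config's max_position_embeddings field, as with other Qwen2.5-based checkpoints; the long input here is a placeholder.

```python
# Confirm the configured context window and keep inputs within it (illustrative sketch).
from transformers import AutoConfig, AutoTokenizer

model_id = "hector-gr/RLCR-v4-ks-uniqueness-buf5k-hotpot"

config = AutoConfig.from_pretrained(model_id)
print("max_position_embeddings:", config.max_position_embeddings)  # expected: 32768

tokenizer = AutoTokenizer.from_pretrained(model_id)
long_document = "passage 1 ...\npassage 2 ...\n"  # placeholder for a long, multi-passage input
inputs = tokenizer(
    long_document,
    truncation=True,
    max_length=config.max_position_embeddings,  # stay within the 32k window
    return_tensors="pt",
)
print("tokenized length:", inputs["input_ids"].shape[-1])
```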
Training Details
The training procedure used TRL 0.16.0.dev0, Transformers 4.48.3, PyTorch 2.5.1, Datasets 4.0.0, and Tokenizers 0.21.1. The use of GRPO points to reinforcement-learning fine-tuning against task-specific reward signals, likely aimed at complex problem-solving.
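The exact reward function and training data for this model are not documented here. The following is an illustrative sketch of a GRPO run with TRL's GRPOTrainer; the conciseness reward and the trl-lib/tldr prompt dataset are placeholders taken from the TRL documentation, not the actual RLCR setup.

```python
# Illustrative GRPO fine-tuning sketch with TRL's GRPOTrainer.
# Reward function and dataset are placeholders, not the actual RLCR recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def reward_conciseness(completions, **kwargs):
    # Placeholder reward: prefer completions close to 200 characters.
    return [-abs(200 - len(c)) / 200.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # example prompt dataset from the TRL docs

training_args = GRPOConfig(
    output_dir="qwen2.5-7b-grpo-demo",
    per_device_train_batch_size=4,
    num_generations=4,        # completions sampled per prompt for group-relative advantages
    max_completion_length=256,
    logging_steps=10,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B",  # base model named in this card
    reward_funcs=reward_conciseness,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO scores several sampled completions per prompt and computes advantages relative to the group, which is why `num_generations` must divide the effective batch size.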
Good For
- Applications requiring advanced mathematical reasoning.
- Tasks benefiting from a large context window.
- Research and development on reinforcement learning from human feedback (RLHF) and related fine-tuning methods such as GRPO.