Name: hector-gr/RLCR-v4-ks-batch-frontier-combo-hotpot API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: hector-gr

Model Overview

hector-gr/RLCR-v4-ks-batch-frontier-combo-hotpot is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-7B base model. It was trained with the TRL (Transformer Reinforcement Learning) library.

Key Capabilities

Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. GRPO scores each sampled completion relative to the other completions drawn for the same prompt, and is intended to improve mathematical and multi-step reasoning.
Extended Context: Supports a context length of 32768 tokens, which allows longer inputs and helps maintain coherence across extended dialogues or documents.
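
To make the group-relative idea behind GRPO concrete, here is a minimal sketch of the advantage computation for one prompt. It is not the training code used for this model: the function name `group_relative_advantages`, the example rewards, the `eps` value, and the use of the population standard deviation are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps to avoid division
    by zero when every completion received the same reward).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.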

Training Details

Fine-tuning used TRL 0.16.0.dev0, Transformers 4.48.3, PyTorch 2.5.1, Datasets 4.0.0, and Tokenizers 0.21.1. GRPO, the core training method, targets tasks that require logical and mathematical inference.
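
If you want to compare a local environment against the versions listed above, a small check along these lines may help. This is a sketch, not part of the model card: the `PINNED` mapping and the `compare_versions` helper are hypothetical names, the PyPI distribution names (for example `torch` for PyTorch) are assumptions, and stripping a local build suffix such as `+cu121` before comparing is a design choice made here.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the training details; distribution names are assumed.
PINNED = {
    "trl": "0.16.0.dev0",
    "transformers": "4.48.3",
    "torch": "2.5.1",
    "datasets": "4.0.0",
    "tokenizers": "0.21.1",
}


def compare_versions(pinned, lookup=version):
    """Return {name: (wanted, found, matches)} for each pinned package.

    `lookup` defaults to importlib.metadata.version and can be swapped
    for a stub. A package that is not installed is reported as None.
    """
    report = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu121" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


for name, (wanted, found, ok) in compare_versions(PINNED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A mismatch does not necessarily prevent inference; the listed versions describe the training environment.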

Use Cases

This model is particularly well-suited for applications that demand strong reasoning abilities, such as mathematical problem-solving, logical deduction, and complex question-answering where deep contextual understanding is critical.
