Name: zhaohq/RLCR-1.5B-hotpot-rac-lr5e6-accW1 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

zhaohq/RLCR-1.5B-hotpot-rac-lr5e6-accW1 is a 1.5 billion parameter language model fine-tuned from the Qwen/Qwen2.5-1.5B base model. It was developed by zhaohq and trained with the TRL library using GRPO (Group Relative Policy Optimization). GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", where it was used to improve mathematical reasoning through reinforcement learning.
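
The core idea behind GRPO can be illustrated with a short sketch: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group, so no separate value model is needed. The snippet below is a minimal illustration of that normalization step only, not this model's training code; the function name, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other completions of the same prompt.

    `eps` (an illustrative choice) avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average receive a positive advantage and are reinforced; those below average receive a negative one. When all completions in a group score the same, every advantage is zero and that group contributes no learning signal.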

Key Capabilities

Reasoning-Oriented Fine-Tuning: Trained with GRPO, a reinforcement learning method originally proposed to improve mathematical reasoning in language models.
Qwen2.5-1.5B Base: Inherits the architecture and pre-training of the Qwen/Qwen2.5-1.5B model.
TRL Framework: Trained with the TRL (Transformer Reinforcement Learning) library, meaning the fine-tuning used reinforcement learning rather than supervised learning alone.

Good For

Applications requiring improved reasoning abilities, potentially in areas like complex question answering or logical inference.
Researchers and developers interested in exploring models fine-tuned with advanced reinforcement learning techniques like GRPO.
Use cases where a small, efficient model (1.5B parameters) tuned for reasoning is preferred over larger, more general-purpose LLMs.
