Name: zhaohq/RLCR-1.5B-hotpot-rac API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

RLCR-1.5B-hotpot-rac Overview

This model, developed by zhaohq, is a 1.5-billion-parameter language model built on the Qwen2.5-1.5B architecture. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning in large language models.
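To make the GRPO step concrete, here is a minimal sketch of the group-relative advantage that the DeepSeekMath paper describes for outcome rewards: several completions are sampled for one prompt, and each reward is normalized by the mean and standard deviation of its group. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for illustration; this is not taken from the training code for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for completions sampled from the same prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is also plausible
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, two judged correct (1.0) and two not (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to supply a baseline.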

Key Capabilities

Enhanced Reasoning: Optimized for complex reasoning tasks, particularly those involving mathematical concepts.
Fine-tuned Performance: Leverages the robust base of Qwen2.5-1.5B with targeted fine-tuning for specific reasoning challenges.
Extended Context: Supports a context length of 32,768 tokens, allowing it to process longer and more intricate inputs.
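The 32768-token limit covers the prompt and the generated output together, so it can help to check the budget before sending a request. The sketch below assumes the token counts have already been obtained from the model's tokenizer; the function names are hypothetical and not part of any published API.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def remaining_tokens(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens are left for generation after the prompt."""
    return max(context_length - prompt_tokens, 0)


def fits_context(prompt_tokens, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Check that the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_new_tokens <= context_length


# A 30000-token prompt leaves room for at most 2768 generated tokens.
budget = remaining_tokens(30000)
ok = fits_context(30000, 2048)
```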

Good for

Applications requiring strong mathematical problem-solving abilities.
Research and development in advanced reasoning for LLMs.
Tasks where understanding and generating logical, step-by-step solutions are crucial.
