zhaohq/RLCR-1.5B-hotpot-rac

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Published: May 15, 2026 | Architecture: Transformer | Warm

The zhaohq/RLCR-1.5B-hotpot-rac model is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-1.5B, with a 32768-token context length. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning. The model is intended primarily for reasoning tasks, particularly those where mathematical problem-solving is critical.


RLCR-1.5B-hotpot-rac Overview

This model, developed by zhaohq, is a 1.5-billion-parameter language model fine-tuned from Qwen2.5-1.5B. Its distinguishing feature is training with GRPO (Group Relative Policy Optimization), a reinforcement learning technique introduced in the DeepSeekMath paper to improve mathematical reasoning in large language models.
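To make the training method more concrete: GRPO samples a group of completions for each prompt and scores every completion relative to the rest of its group, rather than against a learned value model. The following is a minimal sketch of that group-relative advantage step only. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not details taken from this model's training setup.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of completions sampled for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    Assumption: population std and a small eps to avoid division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four completions of the same prompt.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one; when every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal.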

Key Capabilities

  • Enhanced Reasoning: Optimized for multi-step reasoning tasks, particularly those involving mathematical concepts.
  • Fine-tuned Performance: Builds on the Qwen2.5-1.5B base model with fine-tuning targeted at reasoning tasks.
  • Extended Context: Supports a context length of 32768 tokens, allowing it to process longer and more intricate inputs.
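A short sketch of how the 32768-token context length might be respected when generating text. It assumes the repository loads with the standard Hugging Face `transformers` classes `AutoTokenizer` and `AutoModelForCausalLM`, which this page does not confirm; the helper names `max_new_tokens_for` and `generate` are hypothetical.

```python
MODEL_ID = "zhaohq/RLCR-1.5B-hotpot-rac"
CTX_LENGTH = 32768  # context length stated on this page


def max_new_tokens_for(prompt_tokens, ctx_length=CTX_LENGTH):
    """Return how many tokens can still be generated after a prompt of the given length."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(ctx_length - prompt_tokens, 0)


def generate(prompt, requested_new_tokens=512):
    """Generate a completion, capping new tokens so prompt + output fit in the context."""
    # Heavy imports are kept local so the budget helper works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed on this page.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    budget = max_new_tokens_for(prompt_len)
    if budget == 0:
        raise ValueError("prompt already fills the context window")

    output = model.generate(**inputs, max_new_tokens=min(requested_new_tokens, budget))
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate("...")` downloads the model weights on first use, so the budget helper is the only part that runs offline.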

Good for

  • Applications requiring strong mathematical problem-solving abilities.
  • Research and development in advanced reasoning for LLMs.
  • Tasks where understanding and generating logical, step-by-step solutions are crucial.