ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-1.2k-dsr-sub

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Aug 27, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-1.2k-dsr-sub is a 1.5-billion-parameter language model based on Qwen2.5-Math, released by ypwang61. It is fine-tuned with Reinforcement Learning with Verifiable Rewards (RLVR) following the One-Shot RLVR recipe, which shows that mathematical reasoning in LLMs can be substantially improved with as little as a single training example. The model targets complex mathematical reasoning tasks and supports a context length of 32,768 tokens.


Model Overview

ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-1.2k-dsr-sub is a 1.5-billion-parameter model built on the Qwen2.5-Math architecture. Developed by ypwang61, it comes out of the One-Shot RLVR line of work, which improves reasoning in large language models through Reinforcement Learning with Verifiable Rewards (RLVR); the "1.2k-dsr-sub" suffix refers to the roughly 1.2k-example DeepScaleR subset (DSR-sub) used as training data in that work.
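
Since this is a standard Hugging Face checkpoint, it should load through the usual transformers API. The snippet below is a minimal sketch: the repo id comes from this page, while the prompt, generation settings, and the assumption that a plain (non-chat-template) prompt works well are illustrative, not taken from the model card.

```python
# Minimal loading/inference sketch (assumes transformers and torch are
# installed; device_map="auto" additionally requires accelerate).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-1.2k-dsr-sub"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, matching the quantization listed above
    device_map="auto",
)

# Illustrative math prompt; the checkpoint may instead expect the
# Qwen2.5 chat template for best results.
prompt = "Solve for x: 3x + 5 = 20. Show your reasoning step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```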

Key Capabilities

  • Enhanced Mathematical Reasoning: Fine-tuned to excel at mathematical reasoning tasks through the One-Shot RLVR training methodology.
  • Reinforcement Learning with Verifiable Rewards (RLVR): Trained with RLVR using as little as a single training example, as detailed in the associated research paper (a toy reward function illustrating the idea follows this list).
  • Long Context: Supports a context window of 32,768 tokens, enough for long and complex problem statements.
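
To make the RLVR idea concrete, the sketch below shows a toy verifiable-reward function of the kind used in RLVR pipelines: the reward is 1 when the model's final answer matches the reference answer and 0 otherwise. The \boxed{...} answer convention and exact-string matching are illustrative assumptions, not the paper's actual grader, which would normalize mathematical expressions more carefully.

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Toy RLVR-style reward: 1.0 if the completion's final \\boxed{...}
    answer exactly matches the reference, else 0.0. Real graders normalize
    expressions more robustly; this exact matcher is an assumption."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == reference_answer.strip() else 0.0

# A completion ending in \boxed{5} scored against reference "5" earns 1.0.
print(verifiable_reward(r"... so x = \boxed{5}", "5"))  # 1.0
```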

What Makes This Model Different?

This model stands out for its application of one-shot Reinforcement Learning with Verifiable Rewards (RLVR). Whereas conventional fine-tuning relies on large datasets, the One-Shot RLVR work demonstrates that reasoning performance can be improved substantially with as little as a single training example, making the approach highly data-efficient for specialized tasks such as mathematical problem-solving. The result is a compact 1.5B-parameter model focused on robust mathematical reasoning.
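
At a high level, one-shot RLVR extracts a learning signal from a single example by sampling many rollouts for it, scoring each with the verifiable reward, and normalizing rewards within the sampled group, in the style of GRPO-type policy optimization common in recent RLVR pipelines. The sketch below is purely conceptual; the paper's actual recipe (loss terms, clipping, any entropy regularization) is more involved and may differ.

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each rollout's reward against the
    mean/std of its sampling group. With a binary verifiable reward, many
    rollouts of one training example still yield a usable policy-gradient
    signal. Conceptual sketch, not the paper's exact implementation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # all rollouts equally right/wrong: no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Example: 8 rollouts of the single training example, 3 graded correct.
print(group_relative_advantages([1, 1, 1, 0, 0, 0, 0, 0]))
```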

Good For

  • Applications requiring strong mathematical reasoning.
  • Research into efficient fine-tuning methods for LLMs.
  • Scenarios where computational resources are limited but advanced reasoning is needed.

For more technical details, refer to the associated paper and the code repository.