ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: May 17, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights
ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1 is a 1.5-billion-parameter language model based on the Qwen2.5 architecture, with a 32,768-token context length. Developed by ypwang61, it targets mathematical reasoning tasks. The model is fine-tuned with Reinforcement Learning with Verifiable Rewards (RLVR) in a one-shot setup, meaning it was trained on a single example (denoted pi1), making it a notable case of effective complex problem-solving learned from minimal training data.
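The checkpoint can be loaded with the Hugging Face `transformers` library. The snippet below is a minimal sketch, assuming the weights are hosted on the Hub under the repo ID above and that `torch` and `transformers` are installed; the prompt template is an illustrative guess, not necessarily the one used during RLVR training.

```python
MODEL_ID = "ypwang61/One-Shot-RLVR-Qwen2.5-Math-1.5B-pi1"


def build_prompt(question: str) -> str:
    # Plain instruction-style prompt. The exact template the authors
    # used during training is an assumption here.
    return f"Question: {question}\nPlease reason step by step.\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_prompt() works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the BF16 quant listed above
        device_map="auto",
    )
    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate_answer("What is 17 * 23?")` downloads the ~1.5B-parameter weights on first use and returns the model's step-by-step solution as a string.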