Lucien520/Qwen2.5-1.5B-Open-R1-GRPO
Lucien520/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model fine-tuned with the GRPO method, which is designed to enhance mathematical reasoning. Built on the Qwen2.5 architecture and following the DeepSeekMath research, it is optimized for robust mathematical problem-solving and suited to applications where strong numerical and logical reasoning is critical. The model supports a context length of 131072 tokens.
Model Overview
Lucien520/Qwen2.5-1.5B-Open-R1-GRPO is a 1.5-billion-parameter language model based on the Qwen2.5 architecture. This model has been fine-tuned using the GRPO (Group Relative Policy Optimization) method, as introduced in the research paper DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.
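The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt and score each one against the group's mean reward instead of a learned value model. A minimal sketch of the group-relative advantage computation (a toy illustration only, not the actual training code behind this checkpoint):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each completion's reward against the group mean and
    standard deviation, as in GRPO's group-relative baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group
    if std == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four completions sampled for one math problem, scored
# 1.0 for a correct final answer and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# Correct completions receive positive advantage, incorrect ones negative.
```

These per-completion advantages then weight the policy-gradient update, so the model is pushed toward completions that outscore their own sampling group.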
Key Characteristics
- Parameter Count: 1.5 billion parameters.
- Training Method: Utilizes GRPO, a technique aimed at improving mathematical reasoning.
- Frameworks: Trained with TRL (Transformer Reinforcement Learning) version 0.18.0, Transformers 4.52.3, PyTorch 2.6.0, Datasets 4.4.1, and Tokenizers 0.21.4.
- Context Length: Supports a substantial context length of 131072 tokens.
Intended Use Cases
This model is particularly well-suited for applications that demand strong mathematical and logical reasoning. Its fine-tuning with GRPO suggests an optimization for tasks such as:
- Solving mathematical problems.
- Generating logical explanations for numerical concepts.
- Assisting in scientific or engineering calculations where reasoning is paramount.
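For inference on tasks like these, prompts would typically be rendered with the model's chat template (via `tokenizer.apply_chat_template` in Transformers). Below is a minimal sketch of the underlying ChatML-style layout used by Qwen2.5-family models, assuming this fine-tune keeps the base model's template:

```python
def to_chatml(messages):
    """Render a list of {"role", "content"} messages in the ChatML-style
    format used by Qwen2.5, ending with an open assistant turn so the
    model's generation continues from there."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful math assistant."},
    {"role": "user", "content": "What is 17 * 23? Reason step by step."},
])
```

In practice, prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` so the exact template shipped with the checkpoint is used rather than this hand-written approximation.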