Name: zhaohq/PureRL-1.5B-v13C-lam010 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v13C-lam010 is a 1.5-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model. It was trained with the TRL (Transformer Reinforcement Learning) library.

Key Differentiator: GRPO Training

The model was trained using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO samples a group of responses for each prompt and scores each response relative to the others in its group, so it does not need a separate value model. Combined with the math-focused base model, this suggests that the training targeted mathematical reasoning and problem solving.
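The core idea behind GRPO, scoring each sampled response relative to the other responses for the same prompt, can be illustrated with a minimal sketch. This is not the training code used for this model: the function name `group_relative_advantages`, the `eps` parameter, and the example rewards are hypothetical, and the sketch covers only the advantage normalization step, not the clipped policy-gradient objective or the KL penalty.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` is an illustrative safeguard against division by zero when
    every response in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled answers to one math problem,
# rewarded 1.0 if the final answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, and a group in which every answer earns the same reward contributes no learning signal.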

Capabilities & Use Cases

Enhanced Mathematical Reasoning: Due to its GRPO training on a math-focused base model, PureRL-1.5B-v13C-lam010 is particularly well-suited for tasks that require complex mathematical understanding and logical deduction.
Instruction Following: As a fine-tuned model, it is designed to follow instructions effectively, making it suitable for various generative AI applications.
Long Context Processing: With a context length of 32768 tokens, it can process long inputs, which is useful for multi-step reasoning problems or detailed queries.
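As a rough illustration of what the 32768-token context length means in practice, the sketch below computes how many tokens remain for generation once a prompt has been tokenized. It assumes the context window is shared between the prompt and the generated tokens, as is typical for decoder-only models; the helper name `max_new_tokens_allowed` is hypothetical, and the prompt token count would come from the model's tokenizer.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def max_new_tokens_allowed(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt.

    Assumes prompt and generated tokens share one context window.
    Returns 0 when the prompt alone already fills or exceeds the window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_length - prompt_tokens)


# Hypothetical example: a 30000-token prompt leaves room for a
# step-by-step solution of at most 2768 tokens.
remaining = max_new_tokens_allowed(30000)
```

A check of this kind matters most for long multi-step problems, where the worked solution can be much longer than the question itself.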

When to Use This Model

Consider using PureRL-1.5B-v13C-lam010 if your application involves:

Solving mathematical problems or equations.
Generating logical explanations or proofs.
Tasks requiring robust reasoning capabilities, especially in quantitative domains.
Applications where a smaller, efficient model with specialized mathematical prowess is preferred over larger, general-purpose models.
