Name: zhaohq/PureRL-1.5B-v6d2-lam01-identity-maskon-acc05 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v6d2-lam01-identity-maskon-acc05 is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-Math-1.5B using the TRL (Transformer Reinforcement Learning) library.

Key Differentiator

This model's primary distinction is its training method. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO samples several completions per prompt and scores each one relative to the others in its group, which removes the need for a separate value model. The choice of method, together with the math-focused base model, suggests the fine-tuning targets mathematical reasoning.
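To make the group-relative idea concrete, the sketch below computes GRPO-style advantages for one prompt: each completion's reward is normalized by the mean and standard deviation of its group. This is an illustration of the technique as described in the DeepSeekMath paper, not this model's actual training code. The function name, the example rewards, the use of the population standard deviation, and the eps value are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration: advantage_i = (r_i - mean) / (std + eps).
    eps (an assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one math problem, reward 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.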

Technical Specifications

Base Model: Qwen/Qwen2.5-Math-1.5B
Parameter Count: 1.5 billion
Context Length: 32768 tokens
Training Framework: TRL (version 0.16.0.dev0)
Training Method: GRPO

Potential Use Cases

Given its GRPO training and math-focused base model, this model is likely well suited to:

Mathematical problem-solving
Reasoning tasks requiring logical deduction
Applications where enhanced numerical understanding is critical
