Name: zhaohq/PureRL-1.5B-v6d4-lam01-sigmoid-maskoff-acc05 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

The PureRL-1.5B-v6d4-lam01-sigmoid-maskoff-acc05 is a 1.5 billion parameter language model developed by zhaohq. It is a fine-tuned variant of the Qwen/Qwen2.5-Math-1.5B base model, inheriting its 32768 token context length.

Key Capabilities & Training

This model's primary differentiator lies in its training methodology. It was fine-tuned using GRPO (Generalized Reinforcement Learning with Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This specialized training aims to significantly improve the model's performance on mathematical reasoning tasks.

Technical Details

Base Model: Qwen/Qwen2.5-Math-1.5B
Parameter Count: 1.5 Billion
Context Length: 32768 tokens
Training Framework: TRL (Transformer Reinforcement Learning)
Training Method: GRPO, focused on mathematical reasoning enhancement.

Recommended Use Cases

This model is particularly well-suited for applications requiring robust mathematical problem-solving and reasoning. Its GRPO-based fine-tuning suggests improved capabilities in handling complex mathematical queries and generating accurate solutions, making it a strong candidate for educational tools, scientific research assistance, or any domain where precise mathematical understanding is critical.

Overview

Model Overview

Key Capabilities & Training

Technical Details

Recommended Use Cases

Full Model Card (README)