Name: zhaohq/PureRL-1.5B-v6g-A-lam01-sigmoid-maskoff API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

PureRL-1.5B-v6g-A-lam01-sigmoid-maskoff is a 1.5-billion-parameter language model developed by zhaohq. It is fine-tuned from the Qwen/Qwen2.5-Math-1.5B base model and is listed with a 32768-token context length.

Key Capabilities

Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to improve its mathematical problem-solving abilities.
Fine-tuned with TRL: Training was conducted using the TRL library, a framework for Transformer Reinforcement Learning.
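
The core idea behind GRPO can be illustrated with a small sketch. For each prompt, several completions are sampled and scored, and each completion's advantage is its reward normalized against the other completions in the same group, so no separate value model is needed. The code below is a minimal illustration of that normalization step only, not the training code used for this model; the function name, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration: each advantage is
    (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one math problem, scored 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# When every completion gets the same reward, there is no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative, while a group in which every completion scores the same yields advantages of zero.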

Good For

Applications requiring strong mathematical reasoning and logical deduction.
Tasks where a smaller, specialized model with a long context window is beneficial for mathematical or complex problem-solving.
