Name: zhaohq/PureRL-1.5B-v6d5-lam01-sigmoid-maskon-acc10 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

PureRL-1.5B-v6d5-lam01-sigmoid-maskon-acc10 is a 1.5-billion-parameter language model fine-tuned by zhaohq from the Qwen/Qwen2.5-Math-1.5B base model. It was trained with the TRL framework.

Key Capabilities

Mathematical Reasoning: This model is specifically enhanced for mathematical tasks, building on its Qwen2.5-Math foundation.
GRPO Training: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO scores each sampled completion relative to the other completions drawn for the same prompt, which removes the need for a separate value (critic) model. This training approach aims to improve performance on complex mathematical problem solving.
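
To illustrate the "group relative" part of GRPO, here is a minimal sketch of how a group of sampled answers to one prompt can be turned into advantages, following the mean/std normalization described in the DeepSeekMath paper. This is not this model's actual training code: the function names, the binary correctness reward, eps=1e-4 and clip_eps=0.2 are illustrative assumptions, and the KL penalty against a reference model that GRPO also uses is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Hypothetical helper: normalize each reward against its group.

    A_i = (r_i - mean(r)) / (std(r) + eps). Population std is used here;
    other implementations may use the sample std instead. eps avoids
    division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Hypothetical helper: PPO-style clipped objective for one token.

    ratio is pi_new(token) / pi_old(token); GRPO plugs the group-relative
    advantage into this clipped form.
    """
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)


# Assumed example: 4 sampled answers to one math prompt,
# rewarded 1.0 if the final answer is correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, without training a critic; a group in which every answer earns the same reward contributes no learning signal.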

Training Details

Base Model: Fine-tuned from Qwen/Qwen2.5-Math-1.5B.
Frameworks: Trained with TRL (version 0.16.0.dev0), Transformers (4.48.3), PyTorch (2.5.1), Datasets (4.0.0), and Tokenizers (0.21.1).

Good For

Applications requiring strong mathematical reasoning abilities.
Research and development in reinforcement learning for language models, particularly those exploring GRPO.
