Name: zhaohq/PureRL-1.5B-v13A-lam002 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v13A-lam002 is a 1.5-billion-parameter language model fine-tuned from Qwen/Qwen2.5-Math-1.5B. It was trained with the TRL (Transformer Reinforcement Learning) library using GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning in large language models.

Key Characteristics

Base Model: Fine-tuned from Qwen/Qwen2.5-Math-1.5B.
Training Method: GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning.
Framework: Trained with Hugging Face's TRL library.
Context Length: 32768 tokens.
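
The core idea behind GRPO can be illustrated with a small sketch: several completions are sampled for the same prompt, each is scored, and every score is normalized against its own group rather than against a learned value function. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; this is not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the group sampled for one prompt.

    Hypothetical helper: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    The population standard deviation is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt: two judged correct (1.0), two incorrect (0.0)
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean receive a positive advantage and those below it a negative one; when all completions in a group score the same, every advantage is zero and the group contributes no learning signal.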

Potential Use Cases

Mathematical Problem Solving: Aimed at applications that need step-by-step mathematical reasoning and computation.
Logical Deduction: May suit tasks that benefit from structured, multi-step reasoning.
Research and Development: Can serve as a base for further experimentation with reinforcement learning techniques in language models, particularly for specialized domains.
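
For math problem solving or reinforcement learning experiments, a common building block is a checker that pulls the final answer out of a completion and compares it with a reference. The sketch below assumes the model wraps its final answer in `\boxed{...}`, a convention this listing does not confirm; the helper names `extract_boxed` and `exact_match_rewards` are hypothetical.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Assumes the final answer is wrapped in \\boxed{}; braces are matched
    by depth so nested LaTeX such as \\frac{1}{2} is kept whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # the closing brace was never found


def exact_match_rewards(completions, answers):
    """Score 1.0 when the boxed answer equals the reference string, else 0.0."""
    return [
        1.0 if extract_boxed(c) == a.strip() else 0.0
        for c, a in zip(completions, answers)
    ]


rewards = exact_match_rewards(
    ["So x = 3. The answer is \\boxed{3}.", "I think it is 4."],
    ["3", "3"],
)
```

String equality is a deliberately strict comparison: mathematically equivalent answers written differently (for example `0.5` and `\frac{1}{2}`) would score 0.0, so a real evaluation would likely need a symbolic or numeric comparison instead.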
