Name: zhaohq/PureRL-1.5B-v7-s2-async-l2-maskon-afew API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-async-l2-maskon-afew is a 1.5-billion-parameter language model fine-tuned from the zhaohq/PureRL-1.5B-v7-stage1-A-fewshot base model using the TRL (Transformer Reinforcement Learning) library.

Key Capabilities

Mathematical Reasoning: This model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. The training targets improved performance on complex mathematical reasoning problems.
Extended Context: It supports a context length of 32768 tokens, enough for long inputs such as lengthy problem statements or extended multi-turn dialogues.
Instruction Following: As a fine-tuned model, it is optimized for following instructions, making it suitable for various prompt-based applications.
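In a typical decoder-only setup, the 32768-token context length is a single budget shared by the prompt and the generated tokens. The following is a minimal sketch of that arithmetic, assuming a shared budget (the card does not state how the limit is split) and assuming the caller has already counted prompt tokens with the model's own tokenizer. `completion_budget` is a hypothetical helper name, not part of any published API.

```python
CONTEXT_WINDOW = 32768  # context length stated on this card


def completion_budget(prompt_tokens: int, requested: int,
                      window: int = CONTEXT_WINDOW) -> int:
    """Return how many new tokens can be generated without exceeding the window.

    Hypothetical helper. Assumes prompt and completion share one window;
    prompt_tokens must come from the model's own tokenizer.
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = window - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; the window holds only {window}"
        )
    # Cap the requested length at whatever room is left after the prompt.
    return min(requested, remaining)


# A 30000-token prompt leaves room for at most 2768 generated tokens.
print(completion_budget(30000, 4096))  # 2768
```

Raising an error on an over-long prompt, rather than silently returning zero, makes truncation an explicit decision for the caller.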

Training Details

The model was trained with GRPO, a method designed to push the boundaries of mathematical reasoning in open language models. The training run used TRL 0.16.0.dev0, Transformers 4.48.3, PyTorch 2.5.1, Datasets 4.0.0, and Tokenizers 0.21.1. Further details on the training run can be viewed in Weights & Biases.
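As described in DeepSeekMath, GRPO's central idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value baseline. This sketch covers only that group-relative advantage step, under stated assumptions. It is not the training code for this model, it omits the clipped policy-ratio and KL terms of the full objective, and TRL's own implementation may differ in details such as the epsilon value or the standard-deviation estimator. `correctness_reward` is a hypothetical exact-match reward used for illustration; the card does not say which reward functions were used.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against its own sampling group.

    Every completion in the group answers the same prompt; its advantage is
    its reward minus the group mean, divided by the group standard deviation.
    eps keeps the division finite when all rewards in the group are equal.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not TRL's exact choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def correctness_reward(predicted: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 on an exact final-answer match, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


# Four sampled answers to one problem whose reference answer is "42".
rewards = [correctness_reward(p, "42") for p in ["42", "41", "40", "42"]]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

Correct answers receive positive advantages and incorrect ones negative advantages. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.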

Good For

Applications requiring strong mathematical reasoning.
Tasks benefiting from a model trained with advanced reinforcement learning techniques.
Scenarios where a balance between model size (1.5B parameters) and specialized reasoning capabilities is desired.
