Name: zhaohq/PureRL-1.5B-v7-s2-margin-maskon-afew API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

The zhaohq/PureRL-1.5B-v7-s2-margin-maskon-afew is a 1.5 billion parameter language model, building upon the zhaohq/PureRL-1.5B-v7-stage1-A-fewshot base. This model has been specifically fine-tuned using the TRL framework, a library for Transformer Reinforcement Learning.

Key Differentiator: GRPO Training

A significant aspect of this model's development is its training with GRPO (Generalized Reinforcement Learning with Policy Optimization). This method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", is designed to enhance mathematical reasoning abilities in language models. This suggests the model is particularly adept at handling complex mathematical problems and logical deductions.

Technical Specifications

Parameter Count: 1.5 Billion
Context Length: 32768 tokens
Training Frameworks: TRL (0.16.0.dev0), Transformers (4.48.3), Pytorch (2.5.1+cu124), Datasets (4.0.0), Tokenizers (0.21.1)

Use Cases

Given its specialized training with GRPO, this model is well-suited for applications requiring:

Mathematical Reasoning: Solving complex math problems, generating mathematical explanations, or assisting in scientific computations.
Logical Deduction: Tasks that benefit from structured reasoning and problem-solving.
Advanced NLP: Scenarios where a strong understanding of numerical and logical relationships is crucial.

Overview

Overview

Key Differentiator: GRPO Training

Technical Specifications

Use Cases

Full Model Card (README)