Name: zhaohq/PureRL-1.5B-v7-s2-l2-maskon-afew API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-l2-maskon-afew is a 1.5 billion parameter language model, building upon the zhaohq/PureRL-1.5B-v7-stage1-A-fewshot base. It was developed using the TRL (Transformer Reinforcement Learning) framework, indicating a focus on reinforcement learning from human feedback or similar optimization techniques.

Key Training Details

A notable aspect of this model's training is the application of GRPO (Generalized Reinforcement Learning with Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," suggests that this model may possess enhanced capabilities in areas requiring logical and mathematical reasoning, despite its general-purpose fine-tuning.

Usage and Capabilities

With a substantial context length of 32768 tokens, the model is well-suited for generating extended and contextually rich text. Its fine-tuning process aims to improve its ability to respond to diverse prompts, as demonstrated by the quick start example for open-ended questions. The model is designed for text generation tasks, offering a balance between parameter size and performance for various applications.

Overview

Model Overview

Key Training Details

Usage and Capabilities

Full Model Card (README)