Name: zhaohq/PureRL-1.5B-v7-s2-margin-maskon API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/PureRL-1.5B-v7-s2-margin-maskon is a 1.5-billion-parameter language model developed by zhaohq. It was fine-tuned with the TRL (Transformer Reinforcement Learning) library using GRPO (Group Relative Policy Optimization), the method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
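To illustrate the core idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for the same prompt, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. This is a minimal illustration of the outcome-reward formulation described in the DeepSeekMath paper, not the training code used for this model. The function name, the `eps` value, and the choice of population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for completions sampled from ONE prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of the same prompt
# (1.0 = correct answer, 0.0 = incorrect answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions that score above the group mean get a positive advantage and are reinforced; those below get a negative one. Because the baseline comes from the group itself, no separate value model is needed.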

Key Characteristics

Parameter Count: 1.5 billion parameters, offering a balance between performance and computational efficiency.
Context Length: Supports a context window of 32768 tokens, allowing it to process and generate long sequences of text.
Training Method: Trained with GRPO, which was introduced in the DeepSeekMath research on mathematical reasoning. This suggests a focus on reasoning tasks such as mathematics or logical problem-solving.
Frameworks: Trained with TRL (version 0.16.0.dev0), Transformers (4.48.3), PyTorch (2.5.1), Datasets (4.0.0), and Tokenizers (0.21.1).
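As a practical note on the 32768-token context window, the sketch below clamps a requested generation length so that prompt and output together fit within it. It assumes the window covers prompt tokens plus generated tokens, which is the usual convention for decoder-only models, and that the prompt's token count has already been obtained from the model's tokenizer. The function and constant names are hypothetical.

```python
CONTEXT_LENGTH = 32768  # context window stated in this listing


def max_new_tokens_for(prompt_tokens, requested, context_length=CONTEXT_LENGTH):
    """Clamp a requested generation length to what the window can hold.

    prompt_tokens: number of tokens in the prompt (from a tokenizer).
    requested: number of new tokens the caller would like to generate.
    """
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone fills or exceeds the context window")
    return min(requested, context_length - prompt_tokens)


# A 30000-token prompt leaves room for at most 2768 new tokens.
budget = max_new_tokens_for(prompt_tokens=30000, requested=4096)
```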

Potential Use Cases

Complex Question Answering: Its GRPO training suggests an aptitude for questions that require multi-step reasoning.
Content Generation: Capable of generating detailed, contextually rich responses.
Research and Development: Serves as a base for further experimentation with reinforcement learning techniques in language models, particularly for tasks benefiting from improved reasoning.
