Model Overview
mimoidochi/OpenRS-GRPO-S is a 1.5-billion-parameter language model, fine-tuned by mimoidochi from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. It supports a context length of 32768 tokens, allowing it to process and generate long sequences of text. The fine-tuning uses GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training, conducted with the TRL framework on the knoveleng/open-rs dataset, aims to strengthen the model's complex-reasoning capabilities.
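As a sketch of how such a model could be loaded for inference with the Hugging Face transformers library: only the repository id and context length come from this card; the prompt, generation settings, and the `build_prompt` helper are illustrative assumptions.

```python
# Illustrative inference sketch; only MODEL_ID and MAX_CONTEXT come from the
# model card, everything else (prompt, settings) is an assumption.
MODEL_ID = "mimoidochi/OpenRS-GRPO-S"
MAX_CONTEXT = 32768  # context length stated in the card

def build_prompt(tokenizer, question):
    # Apply the tokenizer's chat template so the base model's expected
    # conversation format is used; add_generation_prompt opens the
    # assistant turn for generation.
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

if __name__ == "__main__":
    # Heavyweight import kept under the guard; downloading the 1.5B model
    # happens only when the script is run directly.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    prompt = build_prompt(tokenizer, "What is 17 * 24?")
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Reasoning-distilled models typically emit an explicit chain of thought before the final answer, so a generous `max_new_tokens` budget is usually worthwhile.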
Key Capabilities
- Mathematical Reasoning: Leverages the GRPO training method, which is designed to improve performance on mathematical and logical reasoning tasks.
- Extended Context: Supports a 32768 token context window, beneficial for understanding and generating coherent responses over long inputs.
- Fine-tuned for Specific Data: Trained on the knoveleng/open-rs dataset, suggesting potential strengths in areas related to the dataset's content.
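The GRPO fine-tuning described above can be sketched with TRL's `GRPOTrainer`. This is a minimal illustration under stated assumptions, not the author's actual training script: the reward function, hyperparameters, and output directory are all invented for the example.

```python
# Minimal GRPO fine-tuning sketch with TRL; the reward function and
# hyperparameters are illustrative assumptions, not the card author's setup.

def format_reward(completions, **kwargs):
    # Toy verifiable reward: 1.0 if the completion contains a \boxed{}
    # final answer (a common convention in math-reasoning training).
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

if __name__ == "__main__":
    # Heavyweight deps kept under the guard so the module imports cheaply.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("knoveleng/open-rs", split="train")
    args = GRPOConfig(
        output_dir="OpenRS-GRPO-S-sketch",  # assumed name
        num_generations=8,  # completions sampled per prompt for the group
    )
    trainer = GRPOTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        reward_funcs=format_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO scores a group of sampled completions per prompt and uses each completion's group-relative advantage as the policy-gradient signal, which avoids training a separate value model.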
Good For
- Applications requiring strong mathematical or logical reasoning.
- Tasks benefiting from a large context window for detailed analysis or generation.
- Research and development in reinforcement learning for language models, particularly policy-optimization methods such as GRPO.