mimoidochi/OpenRS-GRPO-1
mimoidochi/OpenRS-GRPO-1 is a 1.5-billion-parameter language model fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B. It was trained with the GRPO method on the open-rs dataset to strengthen mathematical reasoning, and its 32768-token context length makes it suitable for applications demanding detailed contextual understanding and logical inference.
Overview
mimoidochi/OpenRS-GRPO-1 is a 1.5-billion-parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model on the knoveleng/open-rs dataset using the TRL framework.
Key Capabilities
- Enhanced Reasoning: This model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, which focuses on improving mathematical and general reasoning abilities.
- Contextual Understanding: Supports a substantial context length of 32768 tokens, allowing for processing and generating longer, more coherent texts.
- Fine-tuned Performance: Leverages a strong base model and specialized fine-tuning to deliver focused performance on reasoning-intensive tasks.
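The core idea behind GRPO is to score each sampled completion relative to the other completions drawn for the same prompt, using a group statistic in place of a learned value baseline. A minimal sketch of that advantage computation (illustrative only; the actual trainer applies these advantages inside a clipped policy-gradient objective):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each completion's reward against its group:
    advantage_i = (r_i - mean(group)) / std(group).
    GRPO uses this group-relative score instead of a value network."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored equally; no preference signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Rewards for four completions sampled for the same prompt:
# two correct (1.0) and two incorrect (0.0).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
# → [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's average get positive advantages, those below it get negative ones, so the policy is nudged toward the better responses within each sampled group.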
Training Details
The model's training procedure involved the GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using the TRL library.
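For reference, GRPO fine-tuning of this kind can be set up with TRL's `GRPOTrainer`. The sketch below is an illustration, not the author's actual training script: the length-based reward is a toy stand-in for whatever reward the real run used, and hyperparameters are omitted.

```python
def reward_concise(completions, **kwargs):
    """Toy reward: prefer completions near 200 characters.
    A real reasoning run would score mathematical correctness instead."""
    return [-abs(len(c) - 200) / 200.0 for c in completions]

def train():
    # Heavy imports are kept inside the function so the reward above
    # can be exercised without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("knoveleng/open-rs", split="train")
    args = GRPOConfig(output_dir="OpenRS-GRPO-1", num_generations=8)
    trainer = GRPOTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        reward_funcs=reward_concise,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

`num_generations` controls the group size: how many completions are sampled per prompt and scored against each other, as in the advantage computation GRPO is built on.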
When to Use This Model
This model is particularly well-suited for applications requiring strong logical inference and mathematical reasoning, benefiting from its GRPO-based training. It can be a good choice for tasks where understanding complex relationships and generating reasoned responses are critical.
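A minimal inference sketch using the standard Hugging Face transformers API (the prompt and generation settings below are illustrative assumptions, not the author's recommended configuration):

```python
MODEL_ID = "mimoidochi/OpenRS-GRPO-1"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    # Imported here so the helper can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

# Example (downloads the model weights on first call):
# print(generate("What is the sum of the first 100 positive integers?"))
```

Because the model inherits DeepSeek-R1-Distill-Qwen-1.5B's chat template, `apply_chat_template` formats the conversation as the model expects; leave `max_new_tokens` generous, since reasoning models often produce long intermediate derivations before the final answer.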