mimoidochi/OpenRS-GRPO-S-2
mimoidochi/OpenRS-GRPO-S-2 is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32K context length. It was trained on the knoveleng/open-rs dataset using the GRPO method, which is designed to enhance mathematical reasoning, making it well suited for tasks that require robust reasoning, particularly in mathematical contexts.
Model Overview
mimoidochi/OpenRS-GRPO-S-2 is a 1.5 billion parameter language model built upon deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It has been fine-tuned on the knoveleng/open-rs dataset, which is geared towards mathematical reasoning tasks.
Key Capabilities
- Enhanced Reasoning: The model was trained using the GRPO (Group Relative Policy Optimization) method, as introduced in the DeepSeekMath paper, indicating a focus on improving reasoning abilities.
- Mathematical Proficiency: Given its training with GRPO, this model is particularly suited for tasks that involve mathematical reasoning and problem-solving.
- Extended Context: It supports a context length of 32,768 tokens, allowing it to process and generate longer sequences of text.
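The capabilities above can be exercised with a standard `transformers` inference loop. The sketch below is illustrative, not from the model card: the prompt template, sampling settings, and generation length are assumptions, and only the model id comes from this page.

```python
# Hypothetical inference sketch for OpenRS-GRPO-S-2 using the Hugging Face
# `transformers` library. The prompt format and sampling parameters below
# are assumptions for illustration, not documented settings.

def build_prompt(question: str) -> str:
    """Wrap a math question in a simple step-by-step instruction.
    The exact prompt template is an assumption, not from the card."""
    return f"Please reason step by step.\nQuestion: {question}\nAnswer:"

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mimoidochi/OpenRS-GRPO-S-2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer(build_prompt("What is 12 * 13?"), return_tensors="pt")
    inputs = inputs.to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
    # Decode only the newly generated tokens, not the prompt.
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

The long 32K context means much larger prompts (full problem sets, long documents) fit without truncation, at the cost of memory during generation.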
Training Details
The model's fine-tuning process utilized the TRL library. The GRPO method, a key component of its training, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
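A run of this kind could be reproduced along the lines of the sketch below, assuming TRL's `GRPOTrainer`/`GRPOConfig` API. The toy reward function is a placeholder for illustration; the reward actually used to train this model is not documented on this page.

```python
# Minimal sketch of GRPO fine-tuning with the TRL library. Base model and
# dataset ids come from the card; the reward function and config values
# are illustrative assumptions.

def format_reward(completions, **kwargs):
    """Toy reward: 1.0 if the completion contains a boxed final answer,
    else 0.0. A placeholder, not the reward used for this model."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

if __name__ == "__main__":
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("knoveleng/open-rs", split="train")

    # GRPO samples a group of completions per prompt and optimizes the
    # policy against group-relative advantages (arXiv:2402.03300).
    args = GRPOConfig(output_dir="openrs-grpo", num_generations=8)
    trainer = GRPOTrainer(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        reward_funcs=format_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```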
Ideal Use Cases
This model is a strong candidate for applications requiring:
- Complex reasoning tasks.
- Mathematical problem-solving and generation.
- Processing long documents or conversations where extended context is beneficial.