mimoidochi/OpenRS-GRPO
mimoidochi/OpenRS-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32768-token context length. It was trained using GRPO (Group Relative Policy Optimization), a method originally introduced for mathematical reasoning tasks. The model was fine-tuned on the knoveleng/open-rs dataset, making it suitable for conversational AI and question-answering applications.
Model Overview
mimoidochi/OpenRS-GRPO builds on the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B architecture and was fine-tuned on the knoveleng/open-rs dataset using the TRL library.
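A minimal loading sketch using the standard Hugging Face transformers API; the dtype and device settings are illustrative assumptions, not documented requirements:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mimoidochi/OpenRS-GRPO"

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # assumption: use the dtype stored in the checkpoint
    device_map="auto",   # assumption: requires the accelerate package
)
```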
Key Training Methodology
A distinguishing feature of this model is its training with GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), was designed to improve reasoning capabilities. The training process used TRL 0.14.0, Transformers 4.49.0, PyTorch 2.5.1, Datasets 4.5.0, and Tokenizers 0.21.4.
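The actual training script is not reproduced on this card. The sketch below shows the general shape of GRPO fine-tuning with TRL's GRPOTrainer (available as of TRL 0.14.0); the reward function, hyperparameters, and the assumption that the dataset exposes a "prompt" column are illustrative placeholders, not this model's configuration:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Dataset named on this card; GRPOTrainer expects a "prompt" column,
# so a real script may first need to map the dataset's fields.
dataset = load_dataset("knoveleng/open-rs", split="train")

# Placeholder reward: GRPO samples a group of completions per prompt and
# optimizes the policy against their relative rewards. A real setup would
# score answer correctness rather than completion length.
def reward_fn(completions, **kwargs):
    return [-abs(len(completion) - 200) for completion in completions]

training_args = GRPOConfig(output_dir="OpenRS-GRPO", logging_steps=10)
trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=reward_fn,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```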
Use Cases
Given its fine-tuning on the open-rs dataset, OpenRS-GRPO is particularly well-suited for the following (see the usage sketch after this list):
- Conversational AI: Generating coherent and contextually relevant responses in dialogue systems.
- Question Answering: Providing detailed answers to user queries.
- General Text Generation: Creating human-like text based on prompts, leveraging its reasoning-oriented training.
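A minimal question-answering sketch, assuming the chat template inherited from the DeepSeek-R1 distilled base model; the prompt and sampling parameters are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mimoidochi/OpenRS-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Format a single-turn question with the model's chat template.
messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings are illustrative; reasoning models are often run with a
# moderate temperature rather than greedy decoding.
output_ids = model.generate(input_ids, max_new_tokens=1024, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```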