johnjeanc/OpenRS-GRPO
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: May 9, 2025 · Architecture: Transformer

johnjeanc/OpenRS-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, with a 32,768-token context length. It was trained using GRPO, a reinforcement learning method designed to enhance mathematical reasoning, making it suited to tasks that require robust step-by-step reasoning.


Model Overview

johnjeanc/OpenRS-GRPO is a 1.5 billion parameter language model built on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. Its 32,768-token context length makes it suitable for processing long inputs such as multi-step problem statements.

Key Differentiator: GRPO Training

This model's primary distinction lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This specialized training aims to enhance the model's capabilities in complex reasoning tasks, particularly those with a mathematical or logical underpinning.
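The core idea of GRPO is to sample a group of completions per prompt and score each one relative to its own group, replacing the separate value (critic) network used by PPO. A minimal sketch of that group-relative advantage (the function name and reward values here are illustrative; real implementations also handle batching, KL regularization, and clipping):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: each sampled completion is scored
    against the mean and std of its own group, so no critic is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    if std == 0:
        return [0.0 for _ in rewards]  # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]

# Example: 4 completions sampled for one prompt; reward 1.0 = correct answer.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

Completions that beat their group average get positive advantage and are reinforced; below-average ones are penalized, which is what pushes the policy toward better reasoning traces.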

Use Cases

Given its GRPO-enhanced training, OpenRS-GRPO is particularly well-suited for:

  • Reasoning-intensive applications: Tasks that benefit from improved logical deduction and problem-solving.
  • Mathematical problem-solving: GRPO was developed specifically to strengthen mathematical reasoning, so math word problems and step-by-step derivations are a natural fit.
  • General text generation: As a fine-tuned language model, it can handle a variety of text generation tasks, with an emphasis on coherent and logical outputs due to its training.
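In practice, DeepSeek-R1-distilled models typically emit their chain of thought inside `<think>...</think>` tags before the final answer, so downstream code usually separates the two. A small helper along these lines (the tag convention is an assumption carried over from the base model; verify against the model's actual outputs):

```python
import re

def split_reasoning(text):
    """Split a completion into (reasoning, answer).

    Assumes the DeepSeek-R1 convention of wrapping chain-of-thought in
    <think>...</think>; if no tags are present, the whole text is the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is elementary addition.</think>The answer is 4."
)
print(answer)  # -> The answer is 4.
```

Keeping the reasoning trace around is often useful for debugging wrong answers, while only the final span is shown to end users.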

Technical Details

The model was trained using the TRL (Transformer Reinforcement Learning) library. The base model, DeepSeek-R1-Distill-Qwen-1.5B, provides a strong foundation, which is then specialized through the GRPO fine-tuning process.
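As a sketch of how such a run is wired up: TRL's GRPOTrainer accepts one or more reward functions that score sampled completions. The reward below (checking for a `\boxed{...}` final answer, a common convention in math fine-tuning) is illustrative, as are the commented-out config values and dataset; consult the model's actual training script for the real setup.

```python
# Illustrative reward function of the kind passed to TRL's GRPOTrainer:
# completions that state a final answer in \boxed{...} score 1.0, else 0.0.
def format_reward(completions, **kwargs):
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

# Hypothetical trainer wiring (requires `trl` and a GPU; values are guesses):
# from trl import GRPOConfig, GRPOTrainer
# trainer = GRPOTrainer(
#     model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
#     reward_funcs=format_reward,
#     args=GRPOConfig(output_dir="OpenRS-GRPO", num_generations=8),
#     train_dataset=train_dataset,  # prompts in a "prompt" column
# )
# trainer.train()

print(format_reward(["The answer is \\boxed{4}.", "I am not sure."]))  # -> [1.0, 0.0]
```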