od2961/Qwen2.5-1.5B-OpenR1-GRAIL

Hosted on Hugging Face

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Aug 20, 2025 · Architecture: Transformer

od2961/Qwen2.5-1.5B-OpenR1-GRAIL is a 1.5 billion parameter language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained using the TRL framework on the od2961/grail-wage dataset, incorporating the GRPO method for enhanced mathematical reasoning. This model is specialized for tasks requiring advanced mathematical problem-solving capabilities.


Model Overview

od2961/Qwen2.5-1.5B-OpenR1-GRAIL is a 1.5 billion parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct. It was fine-tuned with the TRL framework on the od2961/grail-wage dataset. A key aspect of its training is the use of GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This training methodology aims to enhance the model's capabilities in mathematical reasoning.
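The core idea of GRPO, as described in the DeepSeekMath paper, is to skip a learned value model and instead score each sampled completion against the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage computation (the function name is illustrative, not part of any library):

```python
# Sketch of GRPO's group-relative advantage (arXiv:2402.03300):
# for one prompt, a group of completions is sampled and scored, and each
# completion's advantage is its reward normalized against the group.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each completion: (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

These advantages then weight the policy-gradient update in place of a critic's value estimates, which is what makes the method comparatively cheap for a 1.5B model.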

Key Capabilities

  • Mathematical Reasoning: Optimized for tasks requiring advanced mathematical problem-solving, building upon the GRPO training method.
  • Instruction Following: Inherits instruction-following capabilities from its base model, Qwen2.5-1.5B-Instruct.
  • Efficient Inference: As a 1.5 billion parameter model, it offers a balance between performance and computational efficiency.

Use Cases

This model is particularly well-suited for applications where robust mathematical reasoning is a primary requirement. Developers can integrate it into systems needing to process and respond to complex mathematical queries or problems, benefiting from its specialized fine-tuning.
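A minimal usage sketch for such an integration, assuming the standard Hugging Face transformers chat API; the helper names, system prompt, and generation settings below are illustrative assumptions, not values documented for this model:

```python
# Hedged sketch: querying od2961/Qwen2.5-1.5B-OpenR1-GRAIL on a math problem
# via the transformers chat-template API. Prompt wording and settings are
# assumptions for illustration.

def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in the chat-message format Qwen2.5-Instruct expects."""
    return [
        {"role": "system", "content": "You are a careful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]

def solve(problem: str, max_new_tokens: int = 512) -> str:
    # Imported lazily so build_messages stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "od2961/Qwen2.5-1.5B-OpenR1-GRAIL"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

At 1.5B parameters in BF16, the model fits comfortably on a single consumer GPU, which keeps this kind of integration inexpensive to run.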