Blancy/Qwen3-1.7B-Open-R1-GRPO


Model Overview

Blancy/Qwen3-1.7B-Open-R1-GRPO is a 1.7-billion-parameter language model fine-tuned from the Qwen/Qwen3-1.7B base model. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, to strengthen its mathematical reasoning. Training used the Blancy/1ktestfrom10kwithdifficultyclasses dataset, with the goal of improving performance on difficult analytical problems.

Key Capabilities

  • Enhanced Mathematical Reasoning: Trained with GRPO, a method shown to push the limits of mathematical problem-solving in open language models.
  • Large Context Window: Supports a context length of 40,960 tokens, allowing it to process long inputs and generate long-form responses.
  • Fine-tuned Performance: Optimized through fine-tuning on a curated, difficulty-labeled dataset, aiming for improved accuracy in its specialized domain.
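The 40,960-token window is a hard budget shared between the prompt and the generated output. As a minimal sketch (the helper name and the token counts are illustrative, not part of the model card), a caller can check whether a request fits before invoking generation:

```python
CONTEXT_LENGTH = 40960  # context window stated on the model card


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """Return True if the prompt plus the requested generation budget
    fits inside the model's context window."""
    return prompt_tokens + max_new_tokens <= context_length


print(fits_in_context(38000, 2000))  # 40000 <= 40960 → True
print(fits_in_context(39500, 2000))  # 41500 > 40960 → False
```

Requests that fail this check should either truncate the prompt or lower `max_new_tokens`, since the model cannot attend past the window.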

When to Use This Model

This model is particularly well-suited for applications that demand strong analytical and mathematical reasoning. Its large context window also makes it effective for tasks involving long, detailed texts where tracking intricate relationships and dependencies matters. Developers can integrate it for text generation via the Hugging Face transformers library.
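A minimal usage sketch with transformers is below. The model ID comes from this card; the sample question, `max_new_tokens` value, and helper function names are illustrative assumptions, and the heavy model download only happens when the script is run directly.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Blancy/Qwen3-1.7B-Open-R1-GRPO"


def build_messages(question: str) -> list:
    """Wrap a user question in the chat-message format expected by
    the tokenizer's apply_chat_template."""
    return [{"role": "user", "content": question}]


def generate_answer(question: str, max_new_tokens: int = 512) -> str:
    """Load the model, format the prompt with the chat template,
    and decode only the newly generated tokens."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    prompt = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens so only the model's answer is returned.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(generate_answer("If 3x + 5 = 20, what is x?"))
```

Decoding only the tokens after the prompt avoids echoing the chat template back to the caller.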