yarin-shaked/Qwen3-Codeforces-GRPO

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:0.8BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Oct 21, 2025Architecture:Transformer Warm

The yarin-shaked/Qwen3-Codeforces-GRPO is a 0.8 billion parameter language model, fine-tuned from Qwen/Qwen3-0.6B. It specializes in mathematical reasoning and problem-solving, having been trained on the open-r1/codeforces dataset using the GRPO method. This model is particularly suited for tasks requiring advanced mathematical and logical inference, leveraging its specialized training for competitive programming contexts.

Loading preview...

Model Overview

yarin-shaked/Qwen3-Codeforces-GRPO is a specialized language model with 0.8 billion parameters and a 32768-token context length. It is a fine-tuned variant of the Qwen/Qwen3-0.6B base model, specifically optimized for mathematical reasoning and problem-solving tasks.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model's core strength lies in its ability to handle complex mathematical problems, derived from its training on the open-r1/codeforces dataset.
  • GRPO Training Method: It utilizes the GRPO (Gradient-based Reward Policy Optimization) method, as introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which is designed to improve mathematical reasoning capabilities.
  • Codeforces Dataset Specialization: Training on the Codeforces dataset makes it particularly adept at understanding and generating solutions for competitive programming challenges.

Good For

  • Competitive Programming: Ideal for tasks related to competitive programming, including problem analysis and solution generation.
  • Mathematical Problem Solving: Excels in scenarios requiring advanced mathematical and logical inference.
  • Research in Reasoning Models: Useful for researchers exploring the application of GRPO and similar methods to enhance LLM reasoning.