Name: yarin-shaked/Qwen3-Codeforces-GRPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: yarin-shaked

Model Overview

yarin-shaked/Qwen3-Codeforces-GRPO is a specialized language model with 0.8 billion parameters and a 32,768-token context length. It is a fine-tuned variant of the Qwen/Qwen3-0.6B base model, trained with GRPO on the open-r1/codeforces dataset and aimed at competitive-programming and mathematical problem-solving tasks.

Key Capabilities

Enhanced Mathematical Reasoning: The model targets the math-heavy, multi-step reasoning that Codeforces problems demand, learned from training on the open-r1/codeforces dataset.
GRPO Training Method: It was trained with GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300) to improve mathematical reasoning. Unlike PPO, GRPO needs no separate value (critic) model; it estimates the baseline from the rewards of a group of outputs sampled for the same prompt.
Codeforces Dataset Specialization: Training on the Codeforces dataset makes it particularly adept at understanding and generating solutions for competitive programming challenges.
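The central idea of GRPO, as described in the DeepSeekMath paper, is to replace PPO's learned value baseline with a group statistic: for each prompt the policy samples several outputs, scores them, and normalizes each reward by the group's mean and standard deviation to obtain its advantage. The sketch below shows only that advantage computation under outcome supervision. It is not this model's training code; the pass/fail rewards in the example are hypothetical, and the use of the population standard deviation plus a small `eps` is an assumption (implementations differ on both).

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Return GRPO-style advantages for one group of sampled outputs.

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; `eps` (an assumed value) avoids division
    by zero when every output in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("a group must contain at least one reward")
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards: 1.0 if a sampled solution passed all tests, else 0.0.
    group = [1.0, 0.0, 0.0, 1.0]
    print([round(a, 3) for a in group_relative_advantages(group)])  # → [1.0, -1.0, -1.0, 1.0]
```

Outputs that beat their group's average receive positive advantages and those below it receive negative ones. When every output in a group scores the same, all advantages are zero, so that group contributes no learning signal.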

Good For

Competitive Programming: Ideal for tasks related to competitive programming, including problem analysis and solution generation.
Mathematical Problem Solving: Suited to problems that require multi-step mathematical and logical inference.
Research in Reasoning Models: Useful for researchers exploring the application of GRPO and similar methods to enhance LLM reasoning.
