CM/Qwen2.5-1.5B-Open-R1-Code-GRPO

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quantization: BF16 | Context Length: 32k | Published: Feb 21, 2025 | Architecture: Transformer | Status: Warm

CM/Qwen2.5-1.5B-Open-R1-Code-GRPO is a 1.5 billion parameter language model developed by CM and fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It is optimized for code generation and problem-solving using GRPO (Group Relative Policy Optimization) training. The model targets verifiable coding tasks, making it suitable for applications that require reliable code output within its 32,768-token context length.


Model Overview

CM/Qwen2.5-1.5B-Open-R1-Code-GRPO is a 1.5 billion parameter language model, fine-tuned by CM from the base Qwen/Qwen2.5-1.5B-Instruct architecture. It is designed for code-related tasks, specifically trained on the open-r1/verifiable-coding-problems-python-10k dataset.

Key Capabilities

  • Code Generation: Specialized in generating Python code for verifiable problems (see the loading sketch after this list).
  • GRPO Training: Utilizes the GRPO (Group Relative Policy Optimization) method, as introduced in the DeepSeekMath paper, to enhance its reasoning and problem-solving abilities in a coding context.
  • Context Length: Supports a substantial context window of 32768 tokens, allowing for processing longer code snippets or problem descriptions.
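As a concrete starting point, the sketch below shows how a model with this repo id could be loaded and prompted through the Hugging Face transformers library. The repo id and base-model lineage come from this card; the prompt, sampling settings, and token budget are illustrative assumptions rather than documented defaults.

```python
# Minimal inference sketch (assumed usage; prompt and sampling settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CM/Qwen2.5-1.5B-Open-R1-Code-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The base model is instruction-tuned, so prompts go through the chat template.
messages = [
    {"role": "user", "content": "Write a Python function that returns the n-th Fibonacci number."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```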

Training Details

The model was trained using the TRL (Transformer Reinforcement Learning) framework. The GRPO method, a key aspect of its training, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
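The card does not publish the actual training script, so the following is only a minimal sketch of how a GRPO fine-tune of Qwen/Qwen2.5-1.5B-Instruct on open-r1/verifiable-coding-problems-python-10k might be set up with TRL's GRPOTrainer. The reward function, hyperparameters, and the assumption that the dataset exposes a plain-text prompt column are all illustrative, not the authors' configuration.

```python
# Illustrative GRPO fine-tuning sketch with TRL; not the authors' actual script.
# Reward function, hyperparameters, and dataset handling are assumptions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("open-r1/verifiable-coding-problems-python-10k", split="train")

def reward_contains_code_block(completions, **kwargs):
    # Toy reward: favor completions containing a fenced Python code block.
    # A real setup for verifiable problems would execute the code against test cases.
    return [1.0 if "```python" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="Qwen2.5-1.5B-Open-R1-Code-GRPO",
    num_generations=8,            # completions sampled per prompt for group-relative advantages
    max_completion_length=1024,
    per_device_train_batch_size=8,
    learning_rate=1e-6,
    bf16=True,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",   # base model named on this card
    reward_funcs=reward_contains_code_block,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```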

Good For

  • Automated Code Generation: Generating Python code solutions for defined problems.
  • Coding Assistance: Aiding developers by providing code suggestions or completing functions.
  • Educational Tools: Creating verifiable coding exercises or solutions.