chansung/Qwen2.5-1.5B-Open-R1-Code-GRPO
TEXT GENERATIONConcurrency Cost:1Model Size:1.5BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Mar 25, 2025Architecture:Transformer Warm
chansung/Qwen2.5-1.5B-Open-R1-Code-GRPO is a 1.5 billion parameter language model, fine-tuned by chansung from Qwen/Qwen2.5-1.5B-Instruct. It specializes in code generation and mathematical reasoning, having been trained on the verifiable-coding-problems dataset using the GRPO method. This model is optimized for tasks requiring robust logical and mathematical problem-solving capabilities within a 32768 token context length.
Loading preview...
Model Overview
This model, chansung/Qwen2.5-1.5B-Open-R1-Code-GRPO, is a 1.5 billion parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct. It has been specifically fine-tuned by chansung using the TRL library on the chansung/verifiable-coding-problems dataset.
Key Capabilities
- Enhanced Code Generation: Specialized training on a verifiable coding problems dataset significantly improves its ability to generate and understand code.
- Mathematical Reasoning: The model incorporates the GRPO (Gradient-based Reasoning Policy Optimization) method, as introduced in the DeepSeekMath paper, which is designed to push the limits of mathematical reasoning in language models.
- Instruction Following: Retains the instruction-following capabilities of its base Qwen2.5-1.5B-Instruct model.
Good For
- Programming Assistance: Ideal for tasks involving code generation, completion, and problem-solving in programming contexts.
- Mathematical Problem Solving: Suitable for applications requiring logical deduction and mathematical reasoning.
- Research and Development: Provides a compact yet powerful base for further experimentation in code and math-centric AI applications, leveraging the GRPO training approach.