Name: chansung/Qwen2.5-1.5B-Open-R1-Code-GRPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: chansung

Model Overview

This model, chansung/Qwen2.5-1.5B-Open-R1-Code-GRPO, is a 1.5 billion parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct. It has been specifically fine-tuned by chansung using the TRL library on the chansung/verifiable-coding-problems dataset.

Key Capabilities

Enhanced Code Generation: Specialized training on a verifiable coding problems dataset significantly improves its ability to generate and understand code.
Mathematical Reasoning: The model incorporates the GRPO (Gradient-based Reasoning Policy Optimization) method, as introduced in the DeepSeekMath paper, which is designed to push the limits of mathematical reasoning in language models.
Instruction Following: Retains the instruction-following capabilities of its base Qwen2.5-1.5B-Instruct model.

Good For

Programming Assistance: Ideal for tasks involving code generation, completion, and problem-solving in programming contexts.
Mathematical Problem Solving: Suitable for applications requiring logical deduction and mathematical reasoning.
Research and Development: Provides a compact yet powerful base for further experimentation in code and math-centric AI applications, leveraging the GRPO training approach.

Overview

Model Overview

Key Capabilities

Good For

Full Model Card (README)