Model Overview
Phonsiri/gemma-2-2b-Distillation-gemma-2-27b-it, also known as Gemma-2-2B Reasoning Edition (GRPO), is a specialized 2.6 billion parameter model built upon Google's gemma-2-2b-it. Developed by Phonsiri and CYP777, this model is engineered for structured mathematical reasoning and logic tasks, explicitly showing its work through a chain-of-thought process.
Key Capabilities
- Step-by-step Reasoning: Unlike typical instruction-tuned models, this model is trained to output detailed reasoning steps within <reasoning> tags before providing a final answer in <answer> tags or \boxed{} format.
- Enhanced Mathematical & Logic Problem Solving: Its training methodology, including reinforcement learning (GRPO) and knowledge distillation from the larger google/gemma-2-27b-it, significantly boosts its performance on analytical problems.
- Structured Output: Adheres to a specific XML-like output format for reasoning and answers, making its outputs parseable and verifiable.
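Because the output format is fixed, responses can be consumed programmatically. Below is a minimal sketch of extracting the reasoning trace and final answer from a response string; the sample text is illustrative, not actual model output.

```python
import re

def parse_response(text: str) -> dict:
    """Extract the reasoning trace and final answer from a model response.

    Looks for <reasoning>...</reasoning> and <answer>...</answer> blocks,
    falling back to a \\boxed{...} expression if no <answer> tag is present.
    """
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    if answer is None:
        answer = re.search(r"\\boxed\{(.*?)\}", text)
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else None,
        "answer": answer.group(1).strip() if answer else None,
    }

# Illustrative response following the documented format (not real model output).
sample = (
    "<reasoning>2 + 2 means adding two and two, "
    "which gives four.</reasoning>\n<answer>4</answer>"
)
parsed = parse_response(sample)
print(parsed["answer"])  # → 4
```

The fallback on \boxed{} mirrors the two answer formats the model is trained to emit, so a downstream verifier can check either style with the same helper.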
Training Methodology
The model underwent a two-stage training process:
- Supervised Fine-Tuning (SFT): Initial fine-tuning on open-r1/OpenR1-Math-220k to teach the model the reasoning output syntax.
- GRPO (Group Relative Policy Optimization): Subsequent RL training with a custom reward system that incentivizes both correct mathematical answers and strict adherence to the <reasoning> XML formatting.
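A reward system of this kind can be sketched as below. This is a hypothetical illustration of combining a correctness reward with a format-adherence reward, not the authors' actual reward code; the function names and weights are assumptions.

```python
import re

# Completion must be exactly one <reasoning> block followed by one <answer> block.
FORMAT_PATTERN = re.compile(
    r"^<reasoning>.+?</reasoning>\s*<answer>.+?</answer>$", re.DOTALL
)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the required XML layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0

def correctness_reward(completion: str, gold_answer: str) -> float:
    """2.0 if the extracted <answer> matches the reference answer, else 0.0."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 2.0 if m and m.group(1).strip() == gold_answer else 0.0

def total_reward(completion: str, gold_answer: str) -> float:
    # GRPO turns these scalar rewards into group-relative advantages by
    # comparing completions sampled for the same prompt.
    return format_reward(completion) + correctness_reward(completion, gold_answer)

good = "<reasoning>3 * 7 = 21</reasoning>\n<answer>21</answer>"
bad = "The answer is 21."
print(total_reward(good, "21"))  # → 3.0
print(total_reward(bad, "21"))   # → 0.0
```

Weighting correctness above formatting (2.0 vs. 1.0 here) reflects the card's stated goal: the answer must be right, but a well-formed reasoning trace is still rewarded on its own.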
Ideal Use Cases
This model is particularly well-suited for applications requiring:
- Transparent and verifiable solutions to mathematical problems.
- Educational tools that demonstrate problem-solving steps.
- Automated systems needing logical deduction and analytical processing.