NotoriousH2/gemma-3-1b-it-Math-GRPO
NotoriousH2/gemma-3-1b-it-Math-GRPO is a 1-billion-parameter, Gemma-based, instruction-tuned language model optimized specifically for Korean mathematical reasoning. It was trained with a three-stage pipeline — SFT, RS-SFT, and GRPO — targeting improved performance on mathematical problem solving. The model achieves approximately 46.2% on the Korean GSM8K benchmark, demonstrating its specialized capability in mathematical tasks, and its 32,768-token context window supports complex problem understanding.
NotoriousH2/gemma-3-1b-it-Math-GRPO Overview
This model is a 1-billion-parameter, Gemma-based, instruction-tuned language model developed by NotoriousH2, specifically engineered for Korean mathematical reasoning. It leverages a three-stage training pipeline: Supervised Fine-Tuning (SFT), Rejection Sampling SFT (RS-SFT), and Group Relative Policy Optimization (GRPO).
Key Capabilities & Performance
- Specialized Math Reasoning: Optimized for solving mathematical problems in Korean.
- Benchmark Performance: Achieves approximately 46.2% on the Korean GSM8K evaluation (264 problems) and ~16.5% on the Korean MATH benchmark (577 problems).
- Advanced Training: Utilizes a GRPO stage, though the README notes that for this 1B model, GRPO did not provide significant improvement over the RS-SFT baseline, suggesting the model's capacity was already near optimal with SFT+RS-SFT.
- Context Length: Features a substantial 32,768-token context window, beneficial for handling longer mathematical problems and complex instructions.
Training Methodology Highlights
- SFT → RS-SFT → GRPO Pipeline: A multi-stage approach to enhance instruction following and reasoning.
- Data Strategy: GRPO stage uses only prompts from 6,871 unique Korean GSM8K training problems, with the model generating its own solutions for reward calculation.
- DPO Analysis: The developers conducted extensive analysis of DPO (Direct Preference Optimization) failures, concluding that the 1B model lacked the capacity to discern the subtle differences between correct and incorrect solutions that DPO needs in order to be effective.
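The GRPO stage described above samples several solutions per prompt, scores each with a verifiable correctness reward, and normalizes rewards within the group to obtain advantages. The sketch below illustrates that core computation; the exact-match reward and function names are assumptions for illustration, not the card's published training code.

```python
# Sketch of GRPO's group-relative advantage computation.
# The binary exact-match reward is an assumed stand-in for the
# actual reward used in training.
from statistics import mean, pstdev

def correctness_reward(generated_answer: str, gold_answer: str) -> float:
    """1.0 if the model's final answer matches the gold answer, else 0.0."""
    return 1.0 if generated_answer.strip() == gold_answer.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO normalizes each sample's reward against its own group:
    advantage_i = (r_i - mean(group)) / std(group)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:  # all samples equally good or bad -> no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled solutions to one Korean GSM8K prompt, gold answer "42"
samples = ["42", "41", "42", "7"]
rewards = [correctness_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Because the reward is group-relative, a prompt where every sample is wrong (or every sample is right) yields zero advantage everywhere — one reason a capacity-limited 1B model can see little gain from this stage, consistent with the card's observation about GRPO versus the RS-SFT baseline.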
Good For
- Korean Mathematical Problem Solving: Ideal for applications requiring a compact model to perform arithmetic and reasoning tasks in Korean.
- Research into RLHF for Smaller Models: Provides insights into the limitations and effectiveness of advanced RL techniques like GRPO on 1B parameter models.