Name: youngzhong/SOD-GRPO_teacher-4B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: youngzhong

Model Overview

SOD-GRPO_teacher-4B is a 4-billion-parameter agentic reasoning model developed by youngzhong and built on the Qwen3-4B base model. It is trained with Group Relative Policy Optimization (GRPO), a reinforcement learning method, to strengthen its agentic reasoning. Its primary role is to act as the teacher in the SOD (Step-wise On-policy Distillation) framework, distilling knowledge into smaller student models such as SOD-0.6B and SOD-1.7B.
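For readers unfamiliar with GRPO, the sketch below illustrates the group-relative advantage as the method is commonly described: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group. This is an illustration only, not the author's training code; the function name, the `eps` term, the use of the population standard deviation, and the example rewards are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    `rewards` holds one scalar reward per response sampled for the
    same prompt. `eps` (an assumption of this sketch) avoids division
    by zero when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled responses: two correct, two incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat their group's mean receive a positive advantage and the rest a negative one, so no separately learned value model is needed to judge a response.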

Key Capabilities & Purpose

Agentic Reasoning: Optimized for complex reasoning tasks, particularly those involving tool integration.
Teacher Model for Distillation: Serves as the high-performing source for distilling smaller, more efficient student models using the SOD method.
Reported Benchmark Performance: Scores 67.60 on AIME 2024, 60.42 on AIME 2025, 55.19 on GPQA-Diamond, and 63.13 on LiveCodeBench-v6, for an average of 61.59 across the four benchmarks.
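The reported average can be checked against the four benchmark scores listed above. The short sketch below assumes the average is an unweighted mean, which the listing does not state explicitly; the variable names are illustrative.

```python
# Scores as reported in this listing.
scores = {
    "AIME 2024": 67.60,
    "AIME 2025": 60.42,
    "GPQA-Diamond": 55.19,
    "LiveCodeBench-v6": 63.13,
}

# Unweighted mean across the four benchmarks (an assumption of this sketch).
average = sum(scores.values()) / len(scores)
print(f"{average:.3f}")  # approximately 61.585
```

The unweighted mean is about 61.585, which agrees with the reported 61.59 to within rounding.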

When to Use This Model

This model is ideal for researchers and developers focused on:

Developing Smaller Agentic Models: If your goal is to create compact yet capable agentic models, SOD-GRPO_teacher-4B provides a robust teacher for distillation.
Research in Agentic Reasoning & Distillation: It's a valuable resource for exploring advanced techniques like GRPO and SOD for improving LLM agents.
Benchmarking Agentic Performance: Its reported performance on demanding math, science, and code tasks makes it a strong baseline for comparison.
