G1-CoT-SFT-3B: Graph Reasoning with Reinforcement Learning
PKU-ML/G1-CoT-SFT-3B is a 3.09-billion-parameter causal language model in the G1 series from PKU-ML. It is built on the Qwen2.5-Instruct architecture and was trained with supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO) for reinforcement learning.
Key Capabilities and Differentiators
- Exceptional Graph Reasoning: G1 models achieve up to a 46% improvement over baselines on the Erdős benchmark for graph reasoning tasks. The 3B variant notably surpasses Qwen2.5-72B-Instruct in this domain.
- Strong Generalization: The model exhibits zero-shot generalization to new graph tasks, showing improved performance on other graph reasoning benchmarks (GraphWiz, GraphArena) and real-world graphs (Cora, PubMed).
- Preserved General Reasoning: Despite its specialization, G1-CoT-SFT-3B maintains strong performance on general reasoning benchmarks such as GSM8K, MATH, and MMLU-Pro, ensuring versatility.
- Architecture and Context: Based on Qwen2.5-Instruct, it supports a full context length of 32,768 tokens.
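Since the model inherits the Qwen2.5-Instruct architecture and chat format, it can be queried with the standard Hugging Face `transformers` API. Below is a minimal sketch; the example graph question and generation settings are illustrative, not taken from the model card.

```python
# Sketch: querying G1-CoT-SFT-3B with Hugging Face transformers.
# The chat-template flow is the standard Qwen2.5-Instruct usage pattern;
# the sample question below is a made-up illustration of a graph task.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "PKU-ML/G1-CoT-SFT-3B"

def build_messages(question: str) -> list[dict]:
    """Wrap a graph-reasoning question in the chat format the tokenizer expects."""
    return [{"role": "user", "content": question}]

def answer(question: str, max_new_tokens: int = 512) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the conversation with the model's chat template, then generate.
    text = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    q = ("Given an undirected graph with edges (0, 1), (1, 2), (2, 3), "
         "is node 0 connected to node 3?")
    print(answer(q))
```

The model's chain-of-thought output can be long, so leave ample headroom in `max_new_tokens`; the 32,768-token context leaves plenty of room for large edge lists.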
Good For
- Applications requiring advanced graph reasoning and analysis.
- Tasks involving complex graph structures where generalization to unseen data is crucial.
- Use cases where a smaller model (3.09B parameters) can deliver specialized performance comparable to much larger general-purpose models in graph-related domains.
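For use cases like these, the graph must first be serialized into text. A common convention on graph-reasoning benchmarks is a numbered-node edge list; the helper below is a sketch of that convention, and its exact wording is an assumption rather than the model's required format.

```python
# Sketch: serializing a graph as an edge-list prompt for a graph-reasoning
# model. The phrasing is illustrative, not a prescribed input format.
def graph_to_prompt(n_nodes: int, edges: list[tuple[int, int]], question: str) -> str:
    """Render an undirected graph and a question as a single text prompt."""
    edge_str = ", ".join(f"({u}, {v})" for u, v in edges)
    return (
        f"You are given an undirected graph with {n_nodes} nodes "
        f"(numbered 0 to {n_nodes - 1}) and edges: {edge_str}.\n"
        f"{question}"
    )

# Example: a 4-node path graph with a connectivity question.
prompt = graph_to_prompt(
    4, [(0, 1), (1, 2), (2, 3)], "Is node 0 connected to node 3?"
)
```

Keeping the serialization deterministic (fixed node numbering, sorted edges) makes results easier to compare across runs and benchmarks.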
For more in-depth details, refer to the official paper and GitHub repository.