G1-CoT-SFT-3B: Graph Reasoning with Reinforcement Learning
PKU-ML/G1-CoT-SFT-3B is a 3.09-billion-parameter causal language model in the G1 series from PKU-ML. It is built on the Qwen2.5-Instruct architecture and was trained with supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO) for reinforcement learning.
Key Capabilities and Differentiators
- Exceptional Graph Reasoning: G1 models achieve up to a 46% improvement over baselines on the Erdős benchmark for graph reasoning tasks. The 3B variant notably surpasses Qwen2.5-72B-Instruct in this domain.
- Strong Generalization: The model exhibits zero-shot generalization to new graph tasks, showing improved performance on other graph reasoning benchmarks (GraphWiz, GraphArena) and real-world graphs (Cora, PubMed).
- Preserved General Reasoning: Despite its specialization, G1-CoT-SFT-3B maintains strong performance on general reasoning benchmarks such as GSM8K, MATH, and MMLU-Pro, ensuring versatility.
- Architecture and Context: Based on Qwen2.5-Instruct, it supports a full context length of 32,768 tokens.
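Since the model inherits the Qwen2.5-Instruct architecture and chat format, it can be queried with the standard Hugging Face `transformers` API. Below is a minimal sketch; the example graph question and generation settings are illustrative, not taken from the model card.

```python
# Sketch: querying G1-CoT-SFT-3B with Hugging Face transformers.
# The chat-template flow is the standard Qwen2.5-Instruct usage pattern;
# the sample question below is a made-up illustration of a graph task.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "PKU-ML/G1-CoT-SFT-3B"

def build_messages(question: str) -> list[dict]:
    """Wrap a graph-reasoning question in the chat format the tokenizer expects."""
    return [{"role": "user", "content": question}]

def answer(question: str, max_new_tokens: int = 512) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Render the conversation with the model's chat template, then generate.
    text = tokenizer.apply_chat_template(
        build_messages(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    q = ("Given an undirected graph with edges (0, 1), (1, 2), (2, 3), "
         "is node 0 connected to node 3?")
    print(answer(q))
```

The model's chain-of-thought output can be long, so leave ample headroom in `max_new_tokens`; the 32,768-token context leaves plenty of room for large edge lists.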
Good For
- Applications requiring advanced graph reasoning and analysis.
- Tasks involving complex graph structures where generalization to unseen data is crucial.
- Use cases where a smaller model (3.09B parameters) can deliver specialized performance comparable to much larger general-purpose models in graph-related domains.
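For use cases like these, the graph must first be serialized into text. A common convention on graph-reasoning benchmarks is a numbered-node edge list; the helper below is a sketch of that convention, and its exact wording is an assumption rather than the model's required format.

```python
# Sketch: serializing a graph as an edge-list prompt for a graph-reasoning
# model. The phrasing is illustrative, not a prescribed input format.
def graph_to_prompt(n_nodes: int, edges: list[tuple[int, int]], question: str) -> str:
    """Render an undirected graph and a question as a single text prompt."""
    edge_str = ", ".join(f"({u}, {v})" for u, v in edges)
    return (
        f"You are given an undirected graph with {n_nodes} nodes "
        f"(numbered 0 to {n_nodes - 1}) and edges: {edge_str}.\n"
        f"{question}"
    )

# Example: a 4-node path graph with a connectivity question.
prompt = graph_to_prompt(
    4, [(0, 1), (1, 2), (2, 3)], "Is node 0 connected to node 3?"
)
```

Keeping the serialization deterministic (fixed node numbering, sorted edges) makes results easier to compare across runs and benchmarks.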
For more in-depth details, refer to the official paper and GitHub repository.