G1-3B: Graph Reasoning LLM
G1-3B is a 3.09-billion-parameter causal language model from PKU-ML, built on the Qwen2.5-Instruct architecture. It was trained with Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) using Group Relative Policy Optimization (GRPO) to strengthen its graph reasoning capabilities. The model supports a full context length of 32,768 tokens.
Key Capabilities & Differentiators
- Exceptional Graph Reasoning: Achieves up to a 46% improvement over baselines on the Erdős benchmark; the 3B variant outperforms the much larger Qwen2.5-72B-Instruct on these tasks.
- Strong Generalization: Demonstrates zero-shot generalization to unseen graph tasks, improving performance on benchmarks such as GraphWiz and GraphArena as well as on real-world graphs such as Cora and PubMed.
- Preserved General Reasoning: Maintains strong performance on general reasoning benchmarks including GSM8K, MATH, and MMLU-Pro, showing that graph specialization does not compromise core LLM abilities.
Use Cases
- Graph-related Problem Solving: Ideal for applications requiring complex reasoning over graph structures.
- Graph Reasoning Research: Useful for researchers exploring the intersection of LLMs and graph reasoning.
- Educational Tools: Can be integrated into tools for teaching or solving graph theory problems.
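As a minimal sketch of the first use case, the snippet below serializes a small graph problem into a natural-language prompt of the kind a graph reasoning LLM consumes, and computes a reference answer with BFS so the model's response can be checked. The `build_prompt` helper and its prompt wording are illustrative assumptions, not G1-3B's actual training format.

```python
from collections import deque

def build_prompt(edges, source, target):
    """Serialize a shortest-path task as a natural-language prompt.
    NOTE: the wording is a hypothetical example, not the exact
    format G1-3B was trained on."""
    edge_str = ", ".join(f"({u}, {v})" for u, v in edges)
    return (
        f"You are given an undirected graph with edges: {edge_str}. "
        f"What is the length of the shortest path from node {source} "
        f"to node {target}?"
    )

def shortest_path_length(edges, source, target):
    """Reference answer via BFS, used to verify the model's output."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nbr in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return -1  # target unreachable from source

edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 3)]
prompt = build_prompt(edges, 0, 3)
answer = shortest_path_length(edges, 0, 3)  # 2, via 0 -> 4 -> 3
```

Pairing a generated prompt with a programmatically computed ground truth like this is also how benchmarks such as Erdős score model outputs at scale.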