G1-Zero-3B: Graph Reasoning LLM
PKU-ML/G1-Zero-3B is a 3.09-billion-parameter causal language model in the G1 series, developed by PKU-ML. It is built on the Qwen2.5-Instruct architecture and fine-tuned with reinforcement learning via Group Relative Policy Optimization (GRPO), specifically targeting graph reasoning tasks. The model supports a context length of 32,768 tokens for input and up to 8,192 tokens of generation.
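A minimal sketch of how a graph task could be posed to the model as a chat prompt. The edge-list encoding and task wording below are illustrative assumptions, not the model's official prompt format; consult the model card on Hugging Face for the exact expected input:

```python
# Sketch: formatting a graph problem as a chat message for G1-Zero-3B.
# The edge-list encoding and question phrasing are assumptions for illustration.

def graph_prompt(edges, question):
    """Render an undirected edge list and a question as one user message."""
    edge_text = ", ".join(f"({u}, {v})" for u, v in edges)
    return f"You are given an undirected graph with edges: {edge_text}. {question}"

messages = [
    {"role": "user",
     "content": graph_prompt([(0, 1), (1, 2), (2, 3)],
                             "Is there a path from node 0 to node 3?")}
]

# With Hugging Face transformers (not run here), these messages would be
# tokenized via tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# and the answer decoded from model.generate(...), staying within the
# 32,768-token input limit noted above.
print(messages[0]["content"])
```

Keeping prompt construction in a small helper like this makes it easy to swap in whatever canonical format the model card specifies later.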
Key Capabilities & Differentiators
- Exceptional Graph Reasoning: Achieves up to a 46% improvement over baselines on the Erdős benchmark, with the 3B variant outperforming the much larger Qwen2.5-72B-Instruct on these tasks.
- Strong Generalization: Generalizes zero-shot to unseen graph tasks, improving performance on other benchmarks such as GraphWiz and GraphArena, as well as on real-world graphs (Cora, PubMed).
- Preserves General Reasoning: Crucially, G1-Zero-3B maintains strong performance on general benchmarks such as GSM8K, MATH, and MMLU-Pro, so the graph-focused fine-tuning does not come at the cost of broader reasoning ability.
Should you use this for your use case?
- Use if: Your application involves complex graph-related reasoning, such as analyzing network structures, solving graph theory problems, or tasks requiring understanding relationships within interconnected data. Its specialized training makes it highly effective for these scenarios.
- Consider alternatives if: Your primary use case is general-purpose text generation or creative writing, where graph reasoning is not a core requirement; a general-purpose instruct model may be a better fit for those needs.
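When integrating the model into a pipeline, it can help to sanity-check its answers on graph tasks where ground truth is cheap to compute. A hedged sketch of one such check, a plain BFS connectivity test in pure Python (no model calls; the harness design is an assumption, not part of the G1 release):

```python
from collections import deque

def reachable(edges, src, dst):
    """Ground-truth BFS connectivity check, usable to verify model answers."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)  # undirected: add both directions
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return src == dst  # handles the trivial src == dst case with no edges

# Example: path 0-1-2-3 exists; node 4 is disconnected.
edges = [(0, 1), (1, 2), (2, 3)]
print(reachable(edges, 0, 3))  # True
print(reachable(edges, 0, 4))  # False
```

Comparing the model's yes/no answer against such a reference on a held-out sample is a cheap way to estimate its reliability on your own graph distribution.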