PKU-ML/G1-3B

Text generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: May 31, 2025 · License: apache-2.0 · Architecture: Transformer

PKU-ML/G1-3B is a 3.09 billion parameter causal language model based on the Qwen2.5-Instruct architecture, developed by PKU-ML. It is fine-tuned with reinforcement learning using Group Relative Policy Optimization (GRPO) to excel at graph reasoning tasks. The model shows significant gains on the Erdős graph reasoning benchmark, generalizes well to unseen graph tasks, and preserves general reasoning abilities.


G1-3B: Graph Reasoning LLM

G1-3B is a 3.09 billion parameter causal language model from PKU-ML, built upon the Qwen2.5-Instruct architecture. It has been specifically trained using Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) with Group Relative Policy Optimization (GRPO) to enhance its capabilities in graph reasoning tasks. The model supports a full context length of 32,768 tokens.
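Since the model follows the Qwen2.5-Instruct architecture, it should load with the standard Hugging Face `transformers` chat workflow. The sketch below is a minimal, hedged example assuming the checkpoint is published under the `PKU-ML/G1-3B` repo id and uses a standard chat template; `build_graph_prompt` is an illustrative helper, not part of the model's API.

```python
# Hedged usage sketch for G1-3B via Hugging Face transformers.
# Assumption: the checkpoint lives at "PKU-ML/G1-3B" and, being
# Qwen2.5-Instruct based, ships a standard chat template.

def build_graph_prompt(edges, question):
    """Illustrative helper: serialize an edge list into a plain-text prompt."""
    edge_list = ", ".join(f"({u}, {v})" for u, v in edges)
    return f"Given an undirected graph with edges {edge_list}. {question}"

def generate(prompt, model_id="PKU-ML/G1-3B", max_new_tokens=512):
    # Imports kept local so build_graph_prompt works without torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="bfloat16", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

prompt = build_graph_prompt(
    [(0, 1), (1, 2), (2, 0)], "Does this graph contain a cycle?"
)
```

With the full 32,768-token context, fairly large edge lists can be serialized directly into the prompt this way.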

Key Capabilities & Differentiators

  • Exceptional Graph Reasoning: Achieves up to 46% improvement over baselines on the Erdős benchmark, with the 3B variant outperforming Qwen2.5-72B-Instruct on these tasks.
  • Strong Generalization: Demonstrates zero-shot generalization to unseen graph tasks, improving performance on benchmarks like GraphWiz and GraphArena, and real-world graphs such as Cora and PubMed.
  • Preserved General Reasoning: Maintains strong performance on general reasoning benchmarks including GSM8K, MATH, and MMLU-Pro, ensuring versatility without compromising core LLM abilities.

Use Cases

  • Graph-related Problem Solving: Ideal for applications requiring complex reasoning over graph structures.
  • Research in Graph Neural Networks: Useful for researchers exploring the intersection of LLMs and graph reasoning.
  • Educational Tools: Can be integrated into tools for teaching or solving graph theory problems.
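For problem-solving and educational settings, model answers on small instances can be cross-checked against an exact solver. Below is a minimal stdlib sketch (the function name is illustrative, not from the G1 codebase) for shortest-path length, a typical graph reasoning task:

```python
from collections import deque

def shortest_path_length(edges, src, dst):
    """Exact BFS shortest-path length on an unweighted, undirected graph.
    Returns -1 if dst is unreachable from src."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return -1

# e.g. shortest_path_length([(0, 1), (1, 2), (2, 3), (0, 3)], 0, 2) -> 2
```

Comparing the model's free-text answer against such ground truth mirrors how benchmarks like Erdős score graph tasks with verifiable answers.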