LoGos-7B: Specialized Go Game Reasoning LLM

LoGos-7B is a large language model for Go game reasoning and analysis, developed by Yichuan Ma and team. Built on the Qwen2.5-7B architecture, it combines professional Go knowledge with long chain-of-thought (CoT) reasoning.

Key Capabilities

Go Game Analysis: Designed to analyze current board states, predict optimal next moves, and provide detailed strategic reasoning.
Advanced Reasoning: Trained with a mixed approach that combines a cold-start stage with Group Relative Policy Optimization (GRPO) reinforcement learning, so that reasoning skills acquired from long CoT data transfer to Go tasks.
Strategic Prediction: Capable of evaluating multiple possible next steps, deducing subsequent variations, and selecting the most appropriate move with accompanying win rate predictions.
Interactive Output: Generates detailed, thoughtful, and engaging responses in a professional yet interactive style, suitable for simulating a Go professional's analysis.
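
The Strategic Prediction behavior described above (weigh several candidate moves, each with a deduced variation and a win rate, then pick one) can be illustrated with a minimal sketch. This is not the model's internal procedure or its output format; the Candidate class, its fields, and pick_move are hypothetical names introduced only for illustration, and the win rates are made-up numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """One candidate next move (hypothetical structure, not LoGos-7B's format)."""
    move: str                                       # board coordinate, e.g. "Q16"
    win_rate: float                                 # predicted win rate in [0, 1]
    variation: list = field(default_factory=list)   # follow-up moves considered


def pick_move(candidates):
    """Return the candidate with the highest predicted win rate."""
    if not candidates:
        raise ValueError("at least one candidate move is required")
    return max(candidates, key=lambda c: c.win_rate)


# Illustrative values only.
candidates = [
    Candidate("Q16", 0.52, ["Q16", "D4", "Q4"]),
    Candidate("D4", 0.55, ["D4", "Q4"]),
    Candidate("R14", 0.48),
]
best = pick_move(candidates)
print(best.move, best.win_rate)
```

In practice the candidates and their win rates would be parsed from the model's generated analysis; how that text is structured is not specified here.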

Training and Methodology

Training combines professional-level Go capability with the long CoT reasoning of LLMs, with the aim that LoGos-7B can apply complex reasoning patterns to strategic Go play. The project acknowledges verl for RL infrastructure, KataGo for Go evaluation tools, and Yike for professional Go datasets.
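
For readers unfamiliar with GRPO, the core idea is that several responses are sampled for the same prompt and each response's reward is normalized against the rest of its group, so no separate value model is needed. The sketch below shows only that normalization step. It is a generic illustration, not the LoGos training code: the function name, the eps value, the use of the population standard deviation, and the reward values are all assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for several responses to the SAME prompt.
    Returns (r - group mean) / (group std + eps) for each reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one Go position,
# e.g. 1.0 if the chosen move matched a reference move, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Responses that score above their group's average get a positive advantage and are reinforced; those below get a negative one. If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.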

Use Cases

LoGos-7B is suited to applications that need in-depth Go game analysis, strategic planning, or educational tools for Go players. It can be used to simulate professional Go commentary, analyze historical games, or help players understand complex board positions and strong move sequences.
