YichuanMa/LoGos-7B
YichuanMa/LoGos-7B is a specialized large language model built on Qwen2.5-7B and designed for Go game reasoning and analysis. It integrates professional Go knowledge with chain-of-thought reasoning through a mixed training approach that combines cold start and GRPO reinforcement learning. This 7-billion-parameter model predicts and analyzes moves in Go games, providing detailed reasoning and strategic insights. Its primary strength lies in transferring complex reasoning capabilities to Go tasks, making it suitable for advanced Go analysis.
LoGos-7B: Specialized Go Game Reasoning LLM
LoGos-7B is a specialized large language model developed by Yichuan Ma and team, focusing on Go game reasoning and analysis. Built on the Qwen2.5-7B architecture, this model uniquely integrates professional Go knowledge with advanced chain-of-thought (CoT) reasoning capabilities.
Key Capabilities
- Go Game Analysis: Designed to analyze current board states, predict optimal next moves, and provide detailed strategic reasoning.
- Advanced Reasoning: Utilizes a novel mixed training approach, combining cold start and Group Relative Policy Optimization (GRPO) reinforcement learning, to transfer reasoning acquired from long CoT data to Go tasks.
- Strategic Prediction: Capable of evaluating multiple possible next steps, deducing subsequent variations, and selecting the most appropriate move with accompanying win rate predictions.
- Interactive Output: Generates detailed, thoughtful, and engaging responses in a professional yet interactive style, suitable for simulating a Go professional's analysis.
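The capabilities above can be exercised with a short script. The following is a minimal sketch, assuming the checkpoint loads through the standard Hugging Face `transformers` causal-LM classes. The prompt layout in `build_prompt`, the helper names, and the move notation (such as `Q16`) are hypothetical illustrations; the format the model was actually trained on is not documented on this page and may differ.

```python
from typing import List

MODEL_ID = "YichuanMa/LoGos-7B"  # repository id as listed on this page


def build_prompt(moves: List[str]) -> str:
    """Render a move list as plain text.

    Hypothetical layout: the real training prompt format is not stated here.
    Assumes Black plays first and the players alternate (no handicap).
    """
    lines = []
    for i, move in enumerate(moves, start=1):
        color = "Black" if i % 2 == 1 else "White"
        lines.append(f"{i}. {color} {move}")
    next_color = "Black" if len(moves) % 2 == 0 else "White"
    return (
        "Here is a Go game so far:\n"
        + "\n".join(lines)
        + f"\nAnalyze the position and suggest the next move for {next_color}."
    )


def analyze(moves: List[str], max_new_tokens: int = 1024) -> str:
    """Generate an analysis for the given moves (downloads the model weights)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(build_prompt(moves), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `analyze(["Q16", "D4", "Q4"])` would ask for White's next move. Long CoT answers can be lengthy, so `max_new_tokens` may need to be raised.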
Training and Methodology
The training methodology integrates professional-level Go capabilities with the long CoT reasoning of LLMs, so that LoGos-7B can apply complex reasoning patterns to strategic Go gameplay. The project acknowledges contributions from verl for RL infrastructure, KataGo for Go evaluation tools, and Yike for professional Go datasets.
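To illustrate the "group relative" part of GRPO, the sketch below normalizes the rewards of several sampled answers to the same position against their own group's mean and standard deviation. It is a generic illustration of that idea, not the authors' training code. The function name, the placeholder rewards, and the use of the population standard deviation are assumptions; how the project actually scores moves is not described on this page.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its own group.

    Answers that beat the group average get a positive advantage,
    the rest a negative one. eps avoids division by zero when all
    rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Placeholder rewards for four sampled answers to one board position.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, no separate value network is needed to estimate how good an answer is relative to its alternatives.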
Use Cases
LoGos-7B is ideal for applications requiring deep Go game analysis, strategic planning, and educational tools for Go players. It can be used to simulate professional Go commentary, analyze historical games, or assist players in understanding complex board positions and optimal move sequences.