QuadConnect2.5-1.5B-v0.1.0b: Connect Four AI
Lyte/QuadConnect2.5-1.5B-v0.1.0b is a specialized 1.5-billion-parameter small language model (SLM) developed by Lyte. It is built on the Qwen 2.5 base model and trained with Group Relative Policy Optimization (GRPO) to play Connect Four. Training focuses on reading game states and choosing moves that win the game or block the opponent.
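To make "win or block" concrete, here is a minimal rule-based sketch of the check a Connect Four player needs on every turn. It is not the model's method (the model learns its behavior through GRPO); it only illustrates the target behavior. The board encoding (a 6x7 grid of single characters with "." for empty) and the names `drop`, `has_four`, and `win_or_block` are assumptions made for this example.

```python
from typing import List, Optional

ROWS, COLS = 6, 7
Board = List[List[str]]  # board[row][col]; row 0 is the top, "." marks an empty cell (assumed encoding)


def drop(board: Board, col: int, piece: str) -> Optional[Board]:
    """Return a copy of the board with `piece` dropped into `col`, or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):  # scan from the bottom row upward
        if board[row][col] == ".":
            new_board = [r[:] for r in board]
            new_board[row][col] = piece
            return new_board
    return None


def has_four(board: Board, piece: str) -> bool:
    """Treat every cell as a possible start of a horizontal, vertical, or diagonal line of four."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(
                    0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == piece
                    for rr, cc in cells
                ):
                    return True
    return False


def win_or_block(board: Board, me: str, opponent: str) -> Optional[int]:
    """Return a column that wins immediately, else one that blocks an immediate loss, else None."""
    for piece in (me, opponent):  # winning takes priority over blocking
        for col in range(COLS):
            after = drop(board, col, piece)
            if after is not None and has_four(after, piece):
                return col
    return None
```

With three "X" pieces in the bottom row at columns 0 to 2, `win_or_block` returns column 3 for either player: a win for "X" and a forced block for "O".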
Key Capabilities
- Connect Four Strategy: Specifically trained to analyze Connect Four board states and select a move.
- GRPO Training: Leverages Group Relative Policy Optimization for learning complex game strategies.
- XML Response Format: Designed to output moves and reasoning in a structured XML format, facilitating integration into game environments.
- Early Stage Development: Represents an early experimental version (v0.1.0b) with evolving reward functions, showing progressive accuracy improvements.
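A game environment has to pull the move out of the structured reply before applying it. The sketch below shows one way to do that. The tag names `reasoning` and `move` are assumptions for illustration (the exact schema is not specified here), so they are exposed as parameters. A regular expression is used rather than a strict XML parser because generated text is not guaranteed to be well-formed XML.

```python
import re
from typing import NamedTuple, Optional


class ParsedReply(NamedTuple):
    reasoning: str  # free-text explanation; empty if the tag is missing
    move: str       # raw move text; interpreting it is left to the caller


# Matches <name> ... </name> and captures the inner text without surrounding whitespace.
_TAG = r"<{name}>\s*(.*?)\s*</{name}>"


def parse_reply(
    text: str,
    reasoning_tag: str = "reasoning",  # assumed tag name
    move_tag: str = "move",            # assumed tag name
) -> Optional[ParsedReply]:
    """Extract reasoning and move from a model reply; return None if no usable move is found."""
    reasoning = re.search(_TAG.format(name=re.escape(reasoning_tag)), text, re.DOTALL)
    move = re.search(_TAG.format(name=re.escape(move_tag)), text, re.DOTALL)
    if move is None or not move.group(1):
        return None  # a missing or empty move tag is treated as an invalid reply
    return ParsedReply(reasoning.group(1) if reasoning else "", move.group(1))
```

Returning `None` for a malformed reply lets the caller decide whether to resample, fall back to a default move, or count the turn as a format failure.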
Performance Highlights
Evaluated on the Lyte/ConnectFour-T10 dataset at a temperature of 0.8, the model predicted the correct move 15.92% of the time, or roughly one position in six. The model's performance metrics are tracked across versions and temperature settings and show an increase in correct predictions.
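An accuracy figure of this kind is the fraction of positions where the predicted move matches the reference move. The sketch below assumes exact-match scoring against a single reference move per position, with unparseable replies counted as wrong. The actual scoring procedure is not described here, and `move_accuracy` is a hypothetical helper.

```python
from typing import Optional, Sequence


def move_accuracy(predicted: Sequence[Optional[str]], reference: Sequence[str]) -> float:
    """Fraction of positions where the predicted move equals the reference move.

    A prediction of None (for example, a reply with no parseable move) counts as incorrect.
    Comparison ignores case and surrounding whitespace.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on an empty evaluation set")
    correct = sum(
        p is not None and p.strip().lower() == r.strip().lower()
        for p, r in zip(predicted, reference)
    )
    return correct / len(reference)
```

For example, `move_accuracy(["a1", None, "C2", "d1"], ["a1", "b1", "c2", "e1"])` gives 0.5: two matches out of four, with the missing prediction counted as a miss.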
Ideal Use Cases
- Connect Four AI Development: Suited to researchers and developers exploring specialized game-playing AI using LLMs.
- Reinforcement Learning Studies: Provides a practical example of GRPO application in a constrained game environment.
- Educational Tool: Demonstrates how LLMs can be fine-tuned for specific strategic tasks beyond general language generation.
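For readers using the model as a GRPO example, the core idea is that several replies are sampled for the same board position and each reply's reward is normalized against the group's mean and standard deviation, so no separate value network is needed. The sketch below shows only that normalization step. The reward values are hypothetical, and the use of the population standard deviation with a small `eps` is an assumption made for this example, not a description of the training code used for this model.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its sampling group: (r - mean) / (std + eps).

    `rewards` holds one scalar reward per sampled reply to the same prompt.
    `eps` avoids division by zero when every reply receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (assumed choice)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four replies to one board position:
# 1.0 for a correct move, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Replies that beat the group average get a positive advantage and are reinforced; replies below it get a negative one. When all replies in a group score the same, every advantage is zero and that group contributes no learning signal.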