Lyte/QuadConnect2.5-0.5B-v0.0.9b
Lyte/QuadConnect2.5-0.5B-v0.0.9b is a 0.5-billion-parameter Small Language Model (SLM) developed by Lyte and built on the Qwen 2.5 base architecture. It is trained with Group Relative Policy Optimization (GRPO) to play Connect Four, with a focus on identifying winning moves, blocking the opponent, and controlling the board.
Overview
Lyte/QuadConnect2.5-0.5B-v0.0.9b is a specialized 0.5-billion-parameter Small Language Model (SLM) designed to play Connect Four. Developed by Lyte, it is built on the Qwen 2.5 base model and trained using Group Relative Policy Optimization (GRPO) on the Lyte/ConnectFour-T10 dataset. This is an early experimental version (v0.0.9b) whose reward functions are still evolving.
Key Capabilities
- Connect Four Strategy: The model is trained to identify winning moves, block the opponent's potential wins, and control the center of the Connect Four board.
- XML Response Format: It generates moves and reasoning in a structured XML format, detailing its thought process and chosen column.
- Performance: In evaluation, the model's best result was 14.03% accuracy at predicting the correct move on the validation split, reached at a temperature of 0.8.
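The structured output described above can be consumed with a small parser. The following is a minimal sketch, not the model's documented schema: the tag names `<reasoning>` and `<move>`, and the convention that the move is a column number from 1 to 7, are assumptions made for illustration. Check the model's actual prompt template before relying on them.

```python
import re
from typing import Optional, Tuple

NUM_COLUMNS = 7  # standard Connect Four board width

# Assumed tag names; the card only states that the output is structured XML.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
MOVE_RE = re.compile(r"<move>\s*(\d+)\s*</move>")


def parse_response(text: str) -> Tuple[Optional[str], Optional[int]]:
    """Extract the reasoning text and chosen column from a model response.

    Returns None for any part that is missing, and None for the column
    if it falls outside the assumed 1..NUM_COLUMNS range.
    """
    reasoning_match = REASONING_RE.search(text)
    move_match = MOVE_RE.search(text)

    reasoning = reasoning_match.group(1).strip() if reasoning_match else None
    column = int(move_match.group(1)) if move_match else None

    # Reject columns that do not exist on the board.
    if column is not None and not 1 <= column <= NUM_COLUMNS:
        column = None
    return reasoning, column
```

A regular expression is used here rather than a strict XML parser because small models sometimes emit extra text around the tags. A caller would typically fall back to a random legal move when the column comes back as `None`.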
Training Details
The model's training data was derived from the Leon-LLM/Connect-Four-Datasets-Collection, filtered to include only games with 10 or fewer turns. Training was conducted using TRL's GRPO framework. The model was evaluated at three temperature settings: 0.6, 0.8, and 1.0.
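To give a sense of what a reward function for this kind of training can look like, here is a hypothetical sketch. It is not the reward used for this model, which the card does not specify. It follows the general shape of a custom reward callable for TRL's GRPO trainer (one float returned per completion), and assumes plain-string completions, a `<move>` tag holding a column number, and a dataset column named `target_column`, all of which are illustrative.

```python
import re
from typing import List

# Assumed output tag; the real format may differ.
MOVE_RE = re.compile(r"<move>\s*(\d+)\s*</move>")


def move_accuracy_reward(
    completions: List[str],
    target_column: List[int],  # hypothetical dataset column with the correct move
    **kwargs,
) -> List[float]:
    """Score each completion: 1.0 for the correct move, 0.1 for a
    well-formed but wrong move, 0.0 when no move tag is found."""
    rewards = []
    for completion, target in zip(completions, target_column):
        match = MOVE_RE.search(completion)
        if match is None:
            rewards.append(0.0)  # unparseable output earns nothing
        elif int(match.group(1)) == target:
            rewards.append(1.0)  # correct column
        else:
            rewards.append(0.1)  # small credit for following the format
    return rewards
```

The small partial credit for a well-formed answer is one common design choice for encouraging a model to follow the output format early in training; the specific values here are arbitrary.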
Use Cases
- Connect Four AI: Can serve as an experimental AI player in Connect Four applications or simulations.
- Reinforcement Learning Research: Useful as a case study for applying GRPO to game-playing agents with small language models.