jspaulsen/halluci-mate-v1c
jspaulsen/halluci-mate-v1c is a 0.8-billion-parameter causal language model, fine-tuned with DPO by jspaulsen and built on the Qwen3-0.6B architecture. It uses a custom ~1,800-token UCI tokenizer and is specialized for chess move generation. Compared with its base model, it produces legal moves slightly more often and makes fewer tactical oversights. Because it is trained to prefer the moves Stockfish prefers, it suits chess AI applications that need compact, Stockfish-aligned move prediction.
Overview
jspaulsen/halluci-mate-v1c is a 0.8-billion-parameter DPO (Direct Preference Optimization) fine-tune of jspaulsen/halluci-mate-v1b. Like v1b, it uses the Qwen3-0.6B architecture with a custom ~1,800-token UCI tokenizer designed for chess. The goal of this fine-tune was to align the model's move choices with those of Stockfish, a strong chess engine, especially in positions where the base model played suboptimal moves.
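As background on what DPO optimizes, here is a minimal sketch of the standard sigmoid DPO loss for one preference pair, which is the default `loss_type` in TRL's `DPOTrainer`. In this setting the prompt would be the game so far, the chosen completion a Stockfish-preferred move, and the rejected completion the move v1b played; the reference model would be the frozen starting checkpoint (v1b). The `beta` value of 0.1 is TRL's default and an assumption here, since the card does not report it.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumption: TRL's default; the card does not report beta
) -> float:
    """Sigmoid DPO loss, -log(sigmoid(logits)), for a single preference pair.

    Each *_logp argument is the summed log-probability of a completion (here,
    a UCI move) given the same prompt, under either the policy being trained
    or the frozen reference model.
    """
    # How much more (in log space) each model favors the chosen move over the rejected one.
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    # -log(sigmoid(z)) == softplus(-z)
    return softplus(-logits)
```

When the policy matches the reference, the loss is log 2 (about 0.693); it falls as the policy shifts probability toward the chosen move, relative to the reference, and rises if it shifts toward the rejected move.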
Key Capabilities
- Improved Chess Move Quality: Trained to prefer Stockfish-endorsed moves, which reduces tactical oversights and blunders.
- Enhanced Move Legality: Produces legal moves slightly but consistently more often than v1b.
- Specialized Chess Tokenization: Uses a custom UCI tokenizer tailored to chess game states.
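The card does not document how the ~1,800-token vocabulary is built. One plausible construction, offered purely as an assumption, is one token per from-to square pair that a queen or knight could travel on an empty board: that set has exactly 1,792 entries, which leaves room within ~1,800 for a few promotion-suffix and special tokens. The sketch below enumerates it; `uci_from_to_moves`, `VOCAB`, and `encode` are hypothetical names, not the model's actual tokenizer API.

```python
FILES = "abcdefgh"
QUEEN_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def uci_from_to_moves() -> list[str]:
    """Every from-to pair reachable by a queen slide or knight jump on an empty board."""
    moves = []
    for f in range(8):
        for r in range(8):
            targets = []
            # Queen-style slides cover rook, bishop, king, pawn, and castling geometry.
            for df, dr in QUEEN_DIRS:
                nf, nr = f + df, r + dr
                while 0 <= nf < 8 and 0 <= nr < 8:
                    targets.append((nf, nr))
                    nf, nr = nf + df, nr + dr
            # Knight jumps are the only moves that leave those lines.
            for df, dr in KNIGHT_JUMPS:
                nf, nr = f + df, r + dr
                if 0 <= nf < 8 and 0 <= nr < 8:
                    targets.append((nf, nr))
            src = f"{FILES[f]}{r + 1}"
            moves.extend(f"{src}{FILES[tf]}{tr + 1}" for tf, tr in targets)
    return moves


# Hypothetical token ids: one id per from-to move string.
VOCAB = {move: idx for idx, move in enumerate(uci_from_to_moves())}


def encode(game: str) -> list[int]:
    """Map space-separated UCI moves to ids (promotions like 'e7e8q' are not covered here)."""
    return [VOCAB[move] for move in game.split()]
```

A real tokenizer would also need tokens for promotion suffixes and for special markers such as padding or end-of-game, which this sketch leaves out.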
Training Details
The model was trained with TRL's DPOTrainer on 12,871 preference pairs derived from roughly 11,000 games between v1b and Stockfish. To keep the training signal meaningful, the data was filtered to exclude blunders made in already-lost positions and repetitive moves. Training ran for 2 epochs with a learning rate of 1e-5 and an effective batch size of 64, and took about 7 minutes on mixed GPU hardware.
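The card describes the data filtering only at a high level. As a sketch, here is one way such preference pairs could be assembled in the `prompt`/`chosen`/`rejected` format that TRL's `DPOTrainer` accepts. The `Position` record, the -500 centipawn "already lost" cutoff, the repetition rule (the move undoes or replays one of the side's last two moves), and the space-separated UCI prompt are all hypothetical choices, not the author's pipeline.

```python
from dataclasses import dataclass

LOST_THRESHOLD_CP = -500  # hypothetical cutoff for "already lost", side-to-move perspective
OWN_MOVES_CHECKED = 2     # hypothetical: how many of the side's own recent moves count as repetition


@dataclass
class Position:
    history: list[str]   # UCI moves played so far, e.g. ["e2e4", "e7e5"]
    model_move: str      # move v1b chose
    stockfish_move: str  # move Stockfish preferred
    eval_cp: int         # Stockfish eval before the move, from the side to move's view


def is_repetitive(history: list[str], move: str) -> bool:
    """True if `move` undoes or replays one of this side's recent moves."""
    own_recent = history[-2::-2][:OWN_MOVES_CHECKED]  # side to move's own moves, newest first
    undoes_last = bool(own_recent) and move == own_recent[0][2:4] + own_recent[0][0:2]
    return undoes_last or move in own_recent


def build_pairs(positions: list[Position]) -> list[dict[str, str]]:
    pairs = []
    for pos in positions:
        if pos.model_move == pos.stockfish_move:
            continue  # the model already agreed with Stockfish: nothing to correct
        if pos.eval_cp <= LOST_THRESHOLD_CP:
            continue  # blunders in already-lost positions carry little signal
        if is_repetitive(pos.history, pos.model_move):
            continue  # skip shuffling and repeated moves
        pairs.append({
            "prompt": " ".join(pos.history),
            "chosen": pos.stockfish_move,
            "rejected": pos.model_move,
        })
    return pairs
```

At the stated scale, 12,871 pairs with an effective batch size of 64 come to about 202 optimizer updates per epoch (201 full batches plus one partial batch), or roughly 400 over the two epochs.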
Performance Highlights
Evaluated against Stockfish (skill level 5, depth 12) over 300 games, v1c improves on v1b by 0.33 percentage points in legal move rate and lowers the tactical oversight rate by 1.27 percentage points. The difference in overall score rate fell within the noise band for this limited number of games, but the per-move quality improvements are consistent with the DPO training objective.
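To see why a 300-game match leaves the score rate noisy, this sketch computes a score rate (win = 1, draw = 0.5, loss = 0) with an approximate 95% confidence interval from a normal approximation, plus a helper for percentage-point differences like those quoted above. The function names are hypothetical, and the counts used in the usage note are illustrative, not the card's results.

```python
import math
import statistics


def score_rate_ci(wins: int, draws: int, losses: int, z: float = 1.96) -> tuple[float, float]:
    """Return (score rate, half-width of an approximate 95% CI) from game results."""
    scores = [1.0] * wins + [0.5] * draws + [0.0] * losses
    mean = statistics.fmean(scores)
    # Standard error of the mean, using the sample variance of per-game scores.
    se = math.sqrt(statistics.variance(scores) / len(scores))
    return mean, z * se


def pp_change(new_rate: float, old_rate: float) -> float:
    """Difference between two rates in percentage points (not percent)."""
    return (new_rate - old_rate) * 100
```

For instance, `score_rate_ci(150, 0, 150)` gives a score of 0.5 with a half-width of about 0.057, i.e. roughly ±5.7 percentage points. Draws narrow the interval somewhat, and the difference between two independent matches is noisier still (about √2 times a single match's half-width). Per-move rates such as legality pool far more observations than games, which is one reason they can shift consistently while the score rate does not, although moves within a game are not independent.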
Good For
- Developing chess AI agents that require strategic and legal move generation.
- Research into preference-based learning for game AI.
- Applications needing a compact, specialized model for chess analysis or gameplay.
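Since the card reports legality as a rate rather than a guarantee, an agent built on this model would typically check each generated move against a real move generator (for example, the python-chess library's `board.legal_moves`) before playing it. Here is a minimal, library-free sketch, assuming the caller supplies ranked candidate moves sampled from the model and the set of legal moves in UCI form. `choose_move` is a hypothetical helper, and its fallback (the lexicographically smallest legal move) is a placeholder where a real agent might resample or ask an engine.

```python
from typing import Iterable


def choose_move(candidates: Iterable[str], legal_moves: set[str]) -> str:
    """Return the best-ranked legal candidate, else a deterministic legal fallback.

    `candidates` are UCI moves sampled from the model, best first; `legal_moves`
    should come from a real move generator.
    """
    if not legal_moves:
        raise ValueError("no legal moves: the game is over")
    for move in candidates:
        if move in legal_moves:
            return move  # first candidate that passes the legality check
    # Every candidate was illegal: fall back rather than submit an illegal move.
    return min(legal_moves)
```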