Overview
This model, clarkkitchen22/pokemon-red-commander-qwen3-4b, is a fine-tuned Qwen3-4B model designed to act as a strategic commander for autonomous Pokemon Red playthroughs. It leverages a 4 billion parameter architecture, merged in 16-bit, and was fine-tuned using QLoRA (4-bit NormalFloat) with Unsloth and Hugging Face TRL.
Key Capabilities
- Strategic Decision-Making: Analyzes game state to make optimal decisions regarding battles, team composition, route planning, and item usage.
- Gen 1 Pokemon Mechanics: Specifically trained on Gen 1 Pokemon Red/Blue data, including Pokedex knowledge, move knowledge, type matchups, and battle strategies.
- Instruction Following: Trained on 903 examples covering 12 categories of Pokemon Red knowledge, formatted as instruction-following conversations.
- Integration Ready: Intended to pair with a Game Boy emulator bridge, a RAG system for detailed knowledge retrieval, and a Telegram bot for monitoring.
Training Details
The model was trained for 3 epochs with a learning rate of 2e-4, achieving a final training loss of 0.22 and an evaluation loss of 0.3049. The dataset was sourced from PokeAPI, covering 151 Gen 1 Pokemon, 165 moves, and 225 type matchups.
Limitations
- Gen 1 Specific: Does not generalize to later Pokemon generations.
- Small Dataset: Trained on a relatively small dataset (903 examples), which may lead to hallucinations on edge cases.
- Specialized: Optimized for strategic game decisions, not general conversation.