Pokemon Showdown Agent v6 Overview
GoldenGrapeGentleman1/pokemon-showdown-agent-v6 is a specialized 4 billion parameter model, fine-tuned from Qwen/Qwen3-4B, designed to predict the next action in a Pokemon Showdown battle directly from raw replay logs. Unlike models requiring hand-written state summaries, this agent learns from messy, real-world battle data.
Key Capabilities
- Direct Action Prediction: Generates precise, short action commands (e.g.,
move [move-name], switch [pokemon-name]) suitable for agent pipelines. - Raw Log Processing: Learns directly from raw Pokemon Showdown replay logs, simplifying input requirements.
- AMD ROCm Optimized: Developed with AMD ROCm workflows in mind, recommending
bfloat16 for stable inference. - Chat-style Prompting: Utilizes a
system message to define the agent's side and a user message for the battle log prefix.
Training Details
The model was fine-tuned using LoRA SFT with Unsloth and TRL, leveraging the milkkarten/pokemon-showdown-replays-merged dataset. The training involved 100,000 train games and 10,000 test games, resulting in over 2.3 million training samples. It supports a full training context length of up to 4096 tokens.
Intended Use Cases
- Predicting next actions from raw Pokemon Showdown log prefixes.
- Building text-only battle agents or evaluation harnesses.
- Studying agent alignment from real replay trajectories.
Limitations
This model is a research checkpoint and not a complete battle engine. It may still produce illegal or strategically weak actions, and its reliability is sensitive to prompt wording. It does not include legality checks or full battle-state management.