GoldenGrapeGentleman1/pokemon-showdown-agent-v6 is a 4 billion parameter Qwen/Qwen3-4B fine-tune, developed by GoldenGrapeGentleman1, specifically designed for next-action prediction from raw Pokemon Showdown replay logs. This model excels at generating precise action commands like 'move Earthquake' or 'switch Corviknight' directly from battle log prefixes. It was fine-tuned using Unsloth and TRL, with a focus on AMD ROCm workflows, and supports a context length of up to 32768 tokens for inference.
Loading preview...
Pokemon Showdown Agent v6 Overview
GoldenGrapeGentleman1/pokemon-showdown-agent-v6 is a specialized 4 billion parameter model, fine-tuned from Qwen/Qwen3-4B, designed to predict the next action in a Pokemon Showdown battle directly from raw replay logs. Unlike models requiring hand-written state summaries, this agent learns from messy, real-world battle data.
Key Capabilities
- Direct Action Prediction: Generates precise, short action commands (e.g.,
move [move-name],switch [pokemon-name]) suitable for agent pipelines. - Raw Log Processing: Learns directly from raw Pokemon Showdown replay logs, simplifying input requirements.
- AMD ROCm Optimized: Developed with AMD ROCm workflows in mind, recommending
bfloat16for stable inference. - Chat-style Prompting: Utilizes a
systemmessage to define the agent's side and ausermessage for the battle log prefix.
Training Details
The model was fine-tuned using LoRA SFT with Unsloth and TRL, leveraging the milkkarten/pokemon-showdown-replays-merged dataset. The training involved 100,000 train games and 10,000 test games, resulting in over 2.3 million training samples. It supports a full training context length of up to 4096 tokens.
Intended Use Cases
- Predicting next actions from raw Pokemon Showdown log prefixes.
- Building text-only battle agents or evaluation harnesses.
- Studying agent alignment from real replay trajectories.
Limitations
This model is a research checkpoint and not a complete battle engine. It may still produce illegal or strategically weak actions, and its reliability is sensitive to prompt wording. It does not include legality checks or full battle-state management.